dodeeric / langchain-ai-assistant-with-hybrid-rag

This is an LLM chatbot built with LangChain, with a web interface built with Streamlit. It implements hybrid RAG (keyword and semantic search) and chat memory.
https://bmae-ragai-webapp.azurewebsites.net
GNU General Public License v3.0
8 stars, 1 fork

Ideally, the file storage (json_files, pdf_files, vertexai-sa-key) should be hosted on an external server (this is already done for the DB) #99

Closed dodeeric closed 3 days ago

dodeeric commented 6 days ago

When a PaaS service (Azure Web App, Streamlit Community Cloud) is restarted, the app files are wiped and a fresh clone of the git repo is made, so all files that were uploaded via the admin interface are lost.

dodeeric commented 6 days ago

Can you tell me how to install and configure nfs-kernel-server? https://chatgpt.com/share/9b85760a-a888-4dfa-95d4-b42d4b9b84b0

dodeeric commented 5 days ago

The code should be adapted to save files (JSON, PDF, service-account key) to an Azure storage account blob container.

As for the db, two options:

Use a blob container from Python:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

https://chatgpt.com/share/eac0c858-d045-4d00-ba0e-185a514c7c0d
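A minimal sketch of the blob-container option, assuming the `azure-storage-blob` package and a storage-account connection string. The helper names (`get_container_client`, `upload_file`, `download_file`) and the blob naming scheme are illustrative and do not come from the app's code.

```python
from pathlib import Path


def get_container_client(connection_string: str, container_name: str):
    """Build a ContainerClient (requires the azure-storage-blob package)."""
    # Imported lazily so the helpers below can be exercised without the SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    return service.get_container_client(container_name)


def upload_file(container_client, local_path: str, blob_name: str) -> None:
    """Upload a local file, overwriting any existing blob with that name."""
    with open(local_path, "rb") as handle:
        container_client.upload_blob(name=blob_name, data=handle, overwrite=True)


def download_file(container_client, blob_name: str, local_path: str) -> None:
    """Download a blob and write it to local_path, creating parent folders."""
    target = Path(local_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(container_client.download_blob(blob_name).readall())
```

With this shape, a file uploaded through the admin interface could be stored with something like `upload_file(client, "./json_files/page.json", "json_files/page.json")`, where the file name is only an example.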

dodeeric commented 5 days ago

Is there a Python library that can transparently replace a filesystem directory with an Azure blob container? https://chatgpt.com/share/24b2f62a-b7d1-4270-bc9a-6b1d5c4be7c1

dodeeric commented 4 days ago

elif choice == "Embed Pages in DB":

    JSON_FILES_DIR = "./json_files/"
    PDF_FILES_DIR = "./pdf_files/"

    # JSON files
    # TODO: replace os.listdir with an equivalent that lists the Azure blob container
    json_files = os.listdir(JSON_FILES_DIR)
    json_paths = [f"{JSON_FILES_DIR}{json_file}" for json_file in json_files]

    # PDF files
    # TODO: replace os.listdir with an equivalent that lists the Azure blob container
    pdf_files = os.listdir(PDF_FILES_DIR)
    pdf_paths = [f"{PDF_FILES_DIR}{pdf_file}" for pdf_file in pdf_files]

Listing files in the root of the blob container: files = fs.ls(f'/{AZURE_BLO
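One possible replacement for the two `os.listdir` calls, sketched here with the `azure-storage-blob` `ContainerClient.list_blobs` method rather than the `fs.ls` call quoted above. The function name `list_blob_paths` is hypothetical, and `container_client` is assumed to be a `ContainerClient` for the app's container.

```python
def list_blob_paths(container_client, prefix: str) -> list[str]:
    """Return the names of all blobs whose name starts with prefix."""
    # Blob storage is flat: "directories" are only name prefixes,
    # so listing "json_files/" means filtering on that prefix.
    return sorted(
        blob.name
        for blob in container_client.list_blobs(name_starts_with=prefix)
    )
```

Unlike `os.listdir`, this returns full blob names (prefix included) and also matches blobs in nested "subdirectories", so `json_paths = list_blob_paths(container_client, "json_files/")` would not need the path-joining loop.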

dodeeric commented 4 days ago

If the local FS has to be used:

1) Read the file from AZ: /container_name/json_files/file_name
2) Save it (overwriting if needed) to the local FS: /json_files/file_name
3) Execute the command (e.g. JSONLoader)

Open question: should json_path be /container_name/json_files/file_name or /json_files/file_name?

Solution: use the same path on both sides.

Local FS: /container_name/json_files/file_name
Remote AZ: /container_name/json_files/file_name

Create a function to sync/dump AZ: /container_name/json_files/file_name to FS: /container_name/json_files/file_name

==> utils.py: def dump_az_to_fs(az):
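A rough sketch of what such a dump function might look like, assuming an `azure-storage-blob` `ContainerClient`. The signature is a guess: the issue only gives `dump_az_to_fs(az)`, so the `prefix` and `local_root` parameters are additions made for illustration.

```python
from pathlib import Path


def dump_az_to_fs(container_client, prefix: str, local_root: str) -> list[str]:
    """Copy every blob under prefix to local_root, mirroring the blob names."""
    written = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        # Same relative path on both sides: <local_root>/json_files/file_name
        target = Path(local_root) / blob.name
        target.parent.mkdir(parents=True, exist_ok=True)
        # write_bytes overwrites any existing local copy.
        data = container_client.download_blob(blob.name).readall()
        target.write_bytes(data)
        written.append(str(target))
    return written
```

Calling `dump_az_to_fs(client, "json_files/", "./container_name")` before running a loader would then give the loader ordinary local paths of the form `./container_name/json_files/file_name`.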

dodeeric commented 4 days ago

blobfuse (v1) in a GitHub Codespace:

sudo apt-get update
sudo apt-get install blobfuse

nano blobfuse.cfg

accountName "bmaeragaisa"
accountKey "xxx"
containerName "bmae-ragai-blobcontainer"
authType Key

or should it be (without quotes):

accountName bmaeragaisa
accountKey xxx
containerName bmae-ragai-blobcontainer
authType Key

chmod 600 blobfuse.cfg

sudo mkdir -p /mnt/resource/blobfusetmp
sudo chown codespace:codespace /mnt/resource/blobfusetmp/

sudo blobfuse /home/codespace/mycontainer --tmp-path=/mnt/resource/blobfusetmp/ --config-file=./blobfuse.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --log-level=LOG_DEBUG

Unable to start blobfuse due to a lack of credentials. Please check the readme for valid auth setups. Unmounting blobfuse. Unmounted blobfuse successfully.

dodeeric commented 4 days ago

blobfuse2 on an Azure VM (Linux Ubuntu 22.04.4 LTS / Jammy)

blobfuse2/now 2.3.0 amd64 [installed,local] An user-space filesystem for interacting with Azure Storage

sudo apt-get install libfuse3-dev fuse3 blobfuse2

sudo mkdir -p /mnt/resource/blobfusetmp
sudo chown dodeeric:dodeeric /mnt/resource/blobfusetmp/

mkdir mycontainer

nano blobfuse.yaml

logging:
  type: syslog
  level: log_debug

components:

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240

file_cache:
  path: /mnt/resource/blobfusetmp
  timeout-sec: 120
  max-size-mb: 4096

attr_cache:
  timeout-sec: 7200

azstorage:
  type: block
  account-name: bmaeragaisa
  account-key: xxx
  endpoint: https://bmaeragaisa.blob.core.windows.net
  mode: key
  container: bmae-ragai-blobcontainer

chmod 600 blobfuse.yaml

sudo blobfuse2 mount ./mycontainer --config-file=./blobfuse.yaml

The blob container is mounted on the mycontainer directory, but is only accessible if you are root!

Logs are available in two files: /var/log/blobfuse*.log

df -a
blobfuse2 mount list

dodeeric commented 4 days ago

Mount the files directory on an Azure blob container (optional):

blobfuse2 on an Azure VM and in a GitHub Codespace (Linux Ubuntu 22.04.4 LTS / Jammy)

sudo wget https://packages.microsoft.com/config/ubuntu/22.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo apt-get update
sudo apt-get install libfuse3-dev fuse3 blobfuse2

mkdir blobfusetmp
mkdir files

nano blobfuse.yaml

logging:
  type: syslog
  level: log_debug

components:

libfuse:
  attribute-expiration-sec: 120
  entry-expiration-sec: 120
  negative-entry-expiration-sec: 240

file_cache:
  path: blobfusetmp
  timeout-sec: 120
  max-size-mb: 4096

attr_cache:
  timeout-sec: 7200

azstorage:
  type: block
  account-name: bmaeragaisa
  account-key: xxx
  endpoint: https://bmaeragaisa.blob.core.windows.net
  mode: key
  container: bmae-ragai-blobcontainer

chmod 600 blobfuse.yaml

blobfuse2 mount ./files --config-file=./blobfuse.yaml

Logs are available in two files: /var/log/blobfuse*.log

df -a
blobfuse2 mount list
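Because the mount is optional, the app could check at startup whether ./files is actually a mount point before using it. The sketch below uses only the standard library; the function name `resolve_files_dir` and the fallback to the current directory are assumptions, not part of the app.

```python
import os


def resolve_files_dir(mount_point: str = "./files", fallback: str = ".") -> str:
    """Return mount_point if a filesystem is mounted there, else fallback."""
    # os.path.ismount returns False for a plain directory, for a missing
    # path, and for a path the current user cannot stat.
    if os.path.ismount(mount_point):
        return mount_point
    return fallback
```

The app could then build its paths from the result, for example `os.path.join(resolve_files_dir(), "json_files")`.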

How to mount an Azure Blob Storage container on Linux with BlobFuse2 https://learn.microsoft.com/en-us/azure/storage/blobs/blobfuse2-how-to-deploy?tabs=Ubuntu

Blobfuse2 Installation https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Installation