gabrielchua / RAGxplorer

Open-source tool to visualise your RAG 🔮
MIT License
1.08k stars 95 forks

Create tutorials #29

Closed gabrielchua closed 8 months ago

gabrielchua commented 9 months ago

Jupyter Notebook style

alhridoy commented 9 months ago

Hi @gabrielchua, can you guide me a little on this? I would love to work on it.

gabrielchua commented 9 months ago

Hey @alhridoy,

Thanks for offering.

How about starting with this code snippet? You can make a starter notebook with some examples.

from ragxplorer import RAGxplorer

client = RAGxplorer(embedding_model="thenlper/gte-large") 

client.load_pdf("presentation.pdf", verbose=True)

client.visualize_query("What are the top revenue drivers for Microsoft?")

Then you can try changing the following:

  1. When initializing the RAGxplorer object, you can change the embedding_model argument to a different embedding model from HuggingFace (e.g. BAAI/bge-large-en) or OpenAI (e.g. the new text-embedding-3-small).

  2. The visualize_query method has the following arguments:

    • retrieval_method which can be: naive (default), HyDE or multi_qns
    • top_k which is an int (recommend 3 to 10), defaults to 5.
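The variations listed above can be sketched as a small loop for a starter notebook. This is a minimal sketch, not code from the repository: build_experiments and run_experiments are hypothetical helpers, the keyword names retrieval_method and top_k come from the list above, and the sketch assumes visualize_query accepts both as keyword arguments and that presentation.pdf exists locally. OpenAI embedding models additionally need an API key.

```python
from itertools import product

# Values taken from the comment above; extend the lists as needed.
EMBEDDING_MODELS = ["thenlper/gte-large", "BAAI/bge-large-en"]
RETRIEVAL_METHODS = ["naive", "HyDE", "multi_qns"]
TOP_K_VALUES = [3, 5, 10]


def build_experiments(methods=RETRIEVAL_METHODS, top_ks=TOP_K_VALUES):
    """Hypothetical helper: every (retrieval_method, top_k) combination."""
    return [
        {"retrieval_method": method, "top_k": k}
        for method, k in product(methods, top_ks)
    ]


def run_experiments(pdf_path, query, models=EMBEDDING_MODELS):
    """Hypothetical helper: load the PDF once per embedding model, then
    visualize the same query under each retrieval setting."""
    # Imported here so build_experiments works without ragxplorer installed.
    from ragxplorer import RAGxplorer

    for model in models:
        client = RAGxplorer(embedding_model=model)
        client.load_pdf(pdf_path, verbose=True)
        for settings in build_experiments():
            # Assumption: both arguments can be passed by keyword.
            client.visualize_query(query, **settings)
```

In a notebook, calling run_experiments("presentation.pdf", "What are the top revenue drivers for Microsoft?") would step through every combination. Loading the PDF once per model keeps the expensive embedding step out of the inner loop.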

Feel free to ping here if you run into any issues.

alhridoy commented 9 months ago

Hi @gabrielchua, when I write import plotly.graph_objs as go, the following error appears. I also installed plotly manually, but that does not fix the problem.

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[20], line 1
----> 1 from ragxplorer import RAGxplorer

File ~/Desktop/projects/RAGxplorer/ragxplorer/__init__.py:7
      1 """
      2 __init__.py
      3
      4 Initializes the ragxplorer package and exposes the main classes and functions.
      5 """
----> 7 from .ragxplorer import RAGxplorer
      9 __all__ = ['RAGxplorer']

File ~/Desktop/projects/RAGxplorer/ragxplorer/ragxplorer.py:19
     11 import pandas as pd
     13 from chromadb.utils.embedding_functions import (
     14     SentenceTransformerEmbeddingFunction,
     15     OpenAIEmbeddingFunction,
     16     HuggingFaceEmbeddingFunction
     17 )
---> 19 import plotly.graph_objs as go
     21 from .rag import (
     22     build_vector_database,
     23     get_doc_embeddings,
     24     get_docs,
     25     query_chroma
     26 )
     28 from .projections import (
     29     set_up_umap,
     30     get_projections,
     31     prepare_projections_df,
     32     plot_embeddings
     33 )

ModuleNotFoundError: No module named 'plotly'

Any idea?

gabrielchua commented 9 months ago

Hi @alhridoy

Are you using a virtual environment? Could you run pip freeze and provide the results?
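A quick check that can be run in the same notebook cell where the import fails: it reports which Python interpreter the kernel is using and whether that interpreter can find plotly. This is a minimal sketch using only the standard library; describe_environment is a hypothetical helper name, not part of ragxplorer. A common cause of this kind of error is a Jupyter kernel that runs a different interpreter from the one pip installed into, though that is only an assumption about this case.

```python
import importlib.util
import sys


def describe_environment(module_name="plotly"):
    """Hypothetical helper: report the running interpreter and whether
    it can locate the given top-level module."""
    spec = importlib.util.find_spec(module_name)
    return {
        # Path of the Python executable running this code (the kernel).
        "python": sys.executable,
        # True when running inside a venv-style virtual environment.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


print(describe_environment("plotly"))
```

If module_found is False while pip freeze lists plotly, the "python" path shown here is likely not the interpreter that pip belongs to.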

alhridoy commented 9 months ago

Hi @gabrielchua, thanks. Yes, I used a virtual environment. Here is the output of pip freeze.

absl-py==2.0.0 abstract_singleton==1.0.1 affine==2.4.0 aiofiles==23.2.1 aiohttp==3.8.4 aiosignal==1.3.1 anthropic==0.3.6 anyio==3.7.1 appdirs==1.4.4 appnope==0.1.3 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asciitree==0.3.3 asgiref==3.7.2 astor==0.8.1 asttokens==2.4.1 astunparse==1.6.3 async-generator==1.10 async-lru==2.0.4 async-timeout==4.0.2 asynctest==0.13.0 attr==0.3.2 attrs==23.1.0 auto_gpt_plugin_template==0.0.3 autoflake==2.1.1 autopep8==2.0.2 Babel==2.14.0 backcall==0.2.0 backoff==2.2.1 bcrypt==4.0.1 beautifulsoup4==4.12.2 bert4keras==0.11.4 black==23.3.0 bleach==6.1.0 blessed==1.20.0 blis==0.7.9 boltons==21.0.0 bracex==2.4 cachetools==5.3.0 camel-converter==3.0.3 catalogue==2.0.8 certifi==2023.11.17 cffi==1.15.1 cfgv==3.3.1 channels==4.0.0 chardet==5.1.0 charset-normalizer==3.1.0 chroma-hnswlib==0.7.3 chromadb==0.4.15 click==8.1.3 click-option-group==0.5.6 click-plugins==1.1.1 cligj==0.7.2 cloudpickle==2.2.1 colorama==0.4.6 coloredlogs==15.0.1 comm==0.2.1 confection==0.0.4 contourpy==1.1.1 coverage==7.2.3 crcmod==1.7 cryptography==40.0.2 cssselect==1.2.0 cymem==2.0.7 dataclasses-json==0.5.14 debugpy==1.8.0 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.14 dill==0.3.1.1 distlib==0.3.6 distro==1.8.0 Django==4.2.2 dnspython==2.3.0 docker==6.0.1 docopt==0.6.2 docutils==0.18.1 duckduckgo-search==2.8.6 earthengine-api==0.1.374 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl et-xmlfile==1.1.0 exceptiongroup==1.1.1 executing==2.0.0 expecttest==0.1.6 face==22.0.0 faiss-cpu==1.7.4 fastapi==0.104.1 fastavro==1.8.4 fasteners==0.19 fastjsonschema==2.18.1 filelock==3.12.0 fiona==1.9.5 fire==0.4.0 flake8==6.0.0 flatbuffers==23.5.26 fqdn==1.5.1 frozenlist==1.3.3 fsspec==2023.9.2 geopandas==0.14.0 ghp-import==2.1.0 git-python==1.0.3 gitdb==4.0.10 GitPython==3.1.31 glom==22.1.0 google-api-core==2.11.0 google-api-python-client==2.86.0 google-auth==2.17.3 
google-auth-httplib2==0.1.0 google-cloud-core==2.3.3 google-cloud-storage==2.11.0 google-crc32c==1.5.0 google-resumable-media==2.6.0 googleapis-common-protos==1.59.0 greenlet==3.0.0 grpcio==1.59.0 grpcio-tools==1.59.0 gTTS==2.3.1 h11==0.14.0 h2==4.1.0 h5py==3.8.0 hpack==4.0.0 httpcore==0.17.0 httplib2==0.22.0 httptools==0.6.1 httpx==0.24.0 huggingface-hub==0.16.4 humanfriendly==10.0 hyperframe==6.0.1 hypothesis==6.88.1 identify==2.5.22 idna==3.4 imagesize==1.4.1 immutabledict==3.0.0 importlib-metadata==6.8.0 importlib-resources==6.1.0 iniconfig==2.0.0 inquirer==3.1.3 ipykernel==6.29.0 ipython==8.20.0 ipywidgets==8.1.1 isodate==0.6.1 isoduration==20.11.0 isort==5.12.0 jedi==0.19.1 Jinja2==3.1.2 joblib==1.3.2 json5==0.9.14 jsonpointer==2.4 jsonschema==4.19.1 jsonschema-spec==0.2.4 jsonschema-specifications==2023.7.1 jupyter==1.0.0 jupyter-console==6.6.3 jupyter-events==0.9.0 jupyter-lsp==2.2.2 jupyter_client==8.6.0 jupyter_core==5.7.1 jupyter_server==2.12.5 jupyter_server_terminals==0.5.2 jupyterlab==4.0.11 jupyterlab-widgets==3.0.9 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.2 Keras==2.3.1 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.2 kubernetes==28.1.0 lancedb==0.1.16 langchain==0.0.231 langchainplus-sdk==0.0.20 langcodes==3.3.0 lazy-object-proxy==1.9.0 libcst==1.0.1 litellm==0.1.824 locket==1.0.0 loguru==0.6.0 lxml==4.9.2 Markdown==3.3.7 markdown-it-py==3.0.0 MarkupSafe==2.1.2 marshmallow==3.20.1 matplotlib-inline==0.1.6 mccabe==0.7.0 mdurl==0.1.2 meilisearch==0.21.0 mergedeep==1.3.4 -e git+https://github.com/geekan/metagpt@ee4d59cd396813be5e5fb674f9c7a40184ad86c9#egg=metagpt mistune==3.0.2 mkdocs==1.4.2 monotonic==1.6 more-itertools==10.1.0 mpmath==1.3.0 multidict==6.0.4 murmurhash==1.0.9 mypy-extensions==1.0.0 nbclient==0.9.0 nbconvert==7.14.2 nbformat==5.9.2 nest-asyncio==1.5.8 networkx==3.1 nltk==3.8.1 nodeenv==1.7.0 notebook==7.0.7 notebook_shim==0.2.3 numcodecs==0.12.0 numexpr==2.8.7 numpy==1.24.3 oauthlib==3.2.2 objsize==0.6.1 
onnxruntime==1.16.1 open-interpreter==0.1.7 openai==0.28.1 openapi-core==0.18.1 openapi-python-client==0.13.4 openapi-schema-pydantic==1.2.4 openapi-schema-validator==0.6.2 openapi-spec-validator==0.6.0 openpyxl==3.1.2 opentelemetry-api==1.20.0 opentelemetry-exporter-otlp-proto-common==1.20.0 opentelemetry-exporter-otlp-proto-grpc==1.20.0 opentelemetry-proto==1.20.0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 optree==0.9.2 orjson==3.9.8 outcome==1.2.0 overrides==7.4.0 packaging==23.1 pandas==2.0.3 pandocfilters==1.5.1 parse==1.19.1 parso==0.8.3 pathable==0.4.3 pathspec==0.11.1 pathy==0.10.1 peewee==3.17.0 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.5.0 pinecone-client==2.2.1 platformdirs==3.2.0 playsound==1.2.2 plotly==5.18.0 pluggy==1.0.0 portalocker==2.8.2 posthog==3.0.2 prance==23.6.21.0 pre-commit==3.2.2 preshed==3.0.8 prometheus-client==0.19.0 prompt-toolkit==3.0.43 proto-plus==1.22.3 protobuf==4.22.3 psutil==5.9.5 ptyprocess==0.7.0 pulsar-client==3.3.0 pure-eval==0.2.2 py==1.11.0

gabrielchua commented 9 months ago

Do you mind creating a new virtualenv and running only pip install ragxplorer and pip install jupyterlab?

sdspieg commented 9 months ago

I'd like to help too. Could somebody who's been working on this (@alhridoy?) share their ipynb so that we don't have to 'relearn' what they have already learned?

sdspieg commented 8 months ago

Thanks for adding the ipynb! Just a few points/requests:

Just suggestions!