SciPhi-AI / R2R

The Elasticsearch for RAG - R2R lets you build, scale, and manage user-facing Retrieval-Augmented Generation applications in production.
https://r2r-docs.sciphi.ai/
MIT License

ERROR: Could not consume arg: --config_name=local_ollama #495

Closed: fahdmirza closed this issue 1 week ago.

fahdmirza commented 2 months ago

I am trying to install R2R with Ollama locally, following this document: https://r2r-docs.sciphi.ai/cookbooks/local-rag

Could you confirm whether this document is up to date and correct? Even when I follow it to the letter, it gives errors.

Do we need to clone the git repo to get it working with Ollama?

I already have Ollama running on my system.
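Before debugging R2R itself, it can help to confirm that Ollama is reachable from the same machine. Below is a minimal sketch that uses only the standard library. It assumes Ollama's default address (http://localhost:11434) and its GET /api/tags endpoint, which lists locally pulled models. The helper names `ollama_is_running` and `local_models` are hypothetical and are not part of R2R or Ollama.

```python
import json
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default local address.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def ollama_is_running(base_url: str = DEFAULT_OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/api/tags answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or malformed URL: treat as "not running".
        return False


def local_models(base_url: str = DEFAULT_OLLAMA_URL, timeout: float = 2.0) -> list[str]:
    """List model names reported by /api/tags, or [] if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return []
    return [m.get("name", "") for m in payload.get("models", [])]


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
    print("Local models:", local_models())
```

If the first line prints False, R2R's Ollama-backed providers would presumably fail as well. That connection problem is worth fixing before looking at the quickstart flags.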

conda create -n r2r python=3.11 -y && conda activate r2r
pip install 'r2r[all]'
pip install 'r2r[local-embedding]'

mkdir R2R
cd R2R
touch local_ollama

Then I pasted the following config into the local_ollama file:

{
  "embedding": {
    "provider": "sentence-transformers",
    "base_model": "all-MiniLM-L6-v2",
    "base_dimension": 384,
    "batch_size": 32
  },
  "eval": {
    "provider": "local",
    "frequency": 0.0,
    "llm": {
      "provider": "litellm"
    }
  },
  "ingestion": {
    "excluded_parsers": {
      "gif": "default",
      "jpeg": "default",
      "jpg": "default",
      "png": "default",
      "svg": "default",
      "mp3": "default",
      "mp4": "default"
    }
  }
}
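Because this config is pasted by hand, one way to rule out a malformed file is to write it out and parse it back. Below is a minimal standard-library sketch. The helpers `write_config`, `load_config`, and `roundtrip_ok` are hypothetical. The thread does not confirm whether R2R reads a file named `local_ollama` from the current directory for `--config_name`, so this only checks that the JSON itself is valid. Note that all-MiniLM-L6-v2 produces 384-dimensional embeddings, which matches `base_dimension` here.

```python
import json
import tempfile
from pathlib import Path

# The same settings as the config pasted above, expressed as a Python dict.
LOCAL_OLLAMA_CONFIG = {
    "embedding": {
        "provider": "sentence-transformers",
        "base_model": "all-MiniLM-L6-v2",
        "base_dimension": 384,
        "batch_size": 32,
    },
    "eval": {"provider": "local", "frequency": 0.0, "llm": {"provider": "litellm"}},
    "ingestion": {
        "excluded_parsers": {
            ext: "default" for ext in ("gif", "jpeg", "jpg", "png", "svg", "mp3", "mp4")
        }
    },
}


def write_config(path: Path, config: dict) -> None:
    """Write the config as pretty-printed JSON (hypothetical helper)."""
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")


def load_config(path: Path) -> dict:
    """Parse the file; a json.JSONDecodeError reports the line/column of a paste error."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def roundtrip_ok(config: dict) -> bool:
    """Write config to a temporary file, read it back, and compare."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "local_ollama"
        write_config(path, config)
        return load_config(path) == config


if __name__ == "__main__":
    print("round-trip ok:", roundtrip_ok(LOCAL_OLLAMA_CONFIG))
```

To check the file actually created in the R2R directory, call `load_config(Path("local_ollama"))` from that directory. If the paste is broken, it raises an error that points at the offending position.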

Then I ran the following command:

python -m r2r.examples.quickstart ingest_as_files --no-media=true --config_name=local_ollama

extractions in t=5.57 seconds.
2024-06-21 07:39:39,108 r2r.pipes.embedding_pipe - INFO - Fragmented the input document ids into counts as shown: {UUID('f0c63aff-af59-50c9-81fc-2fe55004c771'): 17, UUID('c9bdbac7-0ea3-5c9e-b590-018bd09b127b'): 233, UUID('b722f1ec-b90e-5ed8-b7c8-c768e8b323cb'): 5, UUID('c996e617-88a4-5c65-ab1e-948344b18d27'): 3108, UUID('ba77307d-6c8a-549f-812a-3558697e2842'): 23, UUID('4a4fb848-fc03-5487-a7e5-33c9fdfb73cc'): 31, UUID('1a9d4d3b-bbe9-53b9-8149-67806bdf60f2'): 18, UUID('ef66e5dd-2130-5fd5-9bdd-aa7eff59fda5'): 11, UUID('c5abc0b7-b9e5-54d9-b3d3-fdb14af4d065'): 2094}
2024-06-21 07:39:40,005 Time taken to ingest files: 31.94 seconds
{'processed_documents': ["File 'got.txt' processed successfully.", "File 'aristotle.txt' processed successfully.", "File 'pg_essay_1.html' processed successfully.", "File 'pg_essay_2.html' processed successfully.", "File 'pg_essay_3.html' processed successfully.", "File 'pg_essay_4.html' processed successfully.", "File 'pg_essay_5.html' processed successfully.", "File 'lyft_2021.pdf' processed successfully.", "File 'uber_2021.pdf' processed successfully."], 'skipped_documents': []}
ERROR: Could not consume arg: --config_name=local_ollama
Usage: quickstart.py ingest_as_files --no-media=true -

emrgnt-cmplxty commented 2 months ago

Hey @fahdmirza, thanks for flagging. I will try to reproduce this; my guess is that the release has fallen slightly out of date.

emrgnt-cmplxty commented 2 months ago

Tested this today, and the issue is solved with the latest image. Can you confirm?

fahdmirza commented 2 months ago

Now I get this error:

(r2r) Ubuntu@0068-kci-prxmx10127:~/R2R$ python3 -m r2r.examples.quickstart ingest_as_files --no-media=true --config_name=local_ollama
2024-06-23 21:22:45,567 - INFO - r2r.core.providers.vector_db_provider - Initializing VectorDBProvider with config extra_fields={} provider='pgvector' collection_name='demo_vecs'.
2024-06-23 21:22:45,624 - INFO - r2r.core.providers.embedding_provider - Initializing EmbeddingProvider with config extra_fields={'text_splitter': {'type': 'recursive_character', 'chunk_size': 512, 'chunk_overlap': 20}} provider='ollama' base_model='mxbai-embed-large' base_dimension=1024 rerank_model=None rerank_dimension=None rerank_transformer_type=None batch_size=32.
2024-06-23 21:22:46,622 - INFO - r2r.core.providers.llm_provider - Initializing LLM provider with config: extra_fields={} provider='litellm'
R2RApp.init, config = <r2r.main.assembly.config.R2RConfig object at 0x7b950ee10e50>
ERROR: Could not consume arg: ingest_as_files
Usage: quickstart.py ingest_as_files -
  <group|command>
  available groups: USER_IDS | default_files | file_tuples | r2r_app | user_ids
  available commands: analytics | app_settings | delete | document_chunks | documents_overview | evaluate | ingest_documents | ingest_files | logs | rag | search | serve | update_documents | update_files | users_overview

For detailed information on this command, run: quickstart.py ingest_as_files - --help
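The usage text in this second error lists the commands that this version of quickstart.py accepts, and `ingest_as_files` is not among them. Below is a minimal standard-library sketch that parses that list out of the error output and suggests the nearest names with `difflib.get_close_matches`. The helper `suggest_command` is hypothetical. Whether the closest match is the intended replacement for `ingest_as_files` is an assumption that the thread does not confirm.

```python
import difflib
import re

# The "available commands" line from the error output quoted above.
USAGE_TEXT = (
    "available commands: analytics | app_settings | delete | document_chunks | "
    "documents_overview | evaluate | ingest_documents | ingest_files | logs | rag | "
    "search | serve | update_documents | update_files | users_overview"
)


def suggest_command(bad_command: str, usage_text: str, n: int = 3) -> list[str]:
    """Return up to n listed commands most similar to bad_command (hypothetical helper)."""
    match = re.search(r"available commands:\s*(.+)", usage_text)
    if match is None:
        return []
    # Commands are separated by " | " in the usage output.
    commands = [c.strip() for c in match.group(1).split("|") if c.strip()]
    return difflib.get_close_matches(bad_command, commands, n=n)


if __name__ == "__main__":
    print(suggest_command("ingest_as_files", USAGE_TEXT))
```

On the output above, the top suggestion is `ingest_files`. This sketch does not establish whether that command accepts the same `--no-media` and `--config_name` flags.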

emrgnt-cmplxty commented 2 months ago

Hey Fahd,

Can you confirm whether you are still seeing issues now that the latest Docker image has been published today?