Closed yoyoyo2025 closed 2 months ago
I wrote a new YAML config for retrieval:

    name: retrieval_augmentation_experiment
    cache: false
    output_path: .
    steps:
        - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
          inputs: train
          filename: mypath/rag/RAGFoundry-main/data/train.jsonl

        - _target_: ragfoundry.processing.dataset_loaders.loaders.LocalLoader
          inputs: test
          filename: mypath/rag/RAGFoundry-main/data/test.jsonl

        - _target_: ragfoundry.processing.local_steps.retrievers.haystack.HaystackRetriever
          inputs: [train, test]
          pipeline_or_yaml_path: mypath/rag/RAGFoundry-main/configs/external/haystack/qdrant.yaml
          docs_key: positive_passages
          query_key: text1
          retriever_index: train

        - _target_: ragfoundry.processing.local_steps.context.ContextHandler
          inputs: [test]
          docs_key: positive_passages

        - _target_: ragfoundry.processing.local_steps.prompter.TextPrompter
          inputs: test
          prompt_file: mypath/rag/RAGFoundry-main/configs/prompts/qa-short.txt
          output_key: prompt
          mapping:
              description: text1
              context: positive_passages

        - _target_: ragfoundry.processing.global_steps.output.OutputData
          inputs: test
          prefix: retrieval_augmentation_output

However, when I open http://127.0.0.1:6333/collections/train, it shows:

    {
      "result": {
        "status": "green",
        "optimizer_status": "ok",
        "indexed_vectors_count": 0,
        "points_count": 0,
        "segments_count": 8,
        "config": {
          "params": {
            "vectors": { "size": 768, "distance": "Dot", "on_disk": false },
            "shard_number": 1,
            "replication_factor": 1,
            "write_consistency_factor": 1,
            "on_disk_payload": true
          },
          "hnsw_config": {
            "m": 16,
            "ef_construct": 100,
            "full_scan_threshold": 10000,
            "max_indexing_threads": 0,
            "on_disk": false
          },
          "optimizer_config": {
            "deleted_threshold": 0.2,
            "vacuum_min_vector_number": 1000,
            "default_segment_number": 0,
            "max_segment_size": null,
            "memmap_threshold": null,
            "indexing_threshold": 20000,
            "flush_interval_sec": 5,
            "max_optimization_threads": null
          },
          "wal_config": { "wal_capacity_mb": 32, "wal_segments_ahead": 0 },
          "quantization_config": null
        },
        "payload_schema": {}
      },
      "status": "ok",
      "time": 0.000126005
    }

I've noticed that there appears to be no data stored in my Qdrant instance, and I'm unsure of the cause. Could you provide some guidance on how to troubleshoot this issue? Thank you very much.
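
For anyone wanting to run the same check from a script instead of the browser, here is a minimal sketch using only the Python standard library. It reads the same `/collections/<name>` endpoint shown above and looks at `points_count`. The function names and the `sample` string are illustrative, not part of RAGFoundry or Qdrant.

```python
import json
import urllib.request


def collection_is_empty(response_text: str) -> bool:
    """Return True if a Qdrant collection-info response reports zero points."""
    info = json.loads(response_text)
    result = info.get("result") or {}
    # A missing or null points_count is treated as empty.
    return (result.get("points_count") or 0) == 0


def fetch_collection_info(base_url: str, collection: str) -> str:
    """GET <base_url>/collections/<collection> and return the raw JSON text."""
    with urllib.request.urlopen(f"{base_url}/collections/{collection}") as resp:
        return resp.read().decode("utf-8")


# Trimmed copy of the response above, so the check runs without a server.
sample = '{"result": {"status": "green", "points_count": 0}, "status": "ok"}'
print(collection_is_empty(sample))

# Against a live instance (assumes Qdrant is listening on this address):
# print(collection_is_empty(fetch_collection_info("http://127.0.0.1:6333", "train")))
```

If this reports an empty collection after the pipeline has run, nothing was ever written to the `train` collection.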
Hi, Qdrant is a database, so you need to index some data into it before the retriever can return anything. See my comment regarding indexing a corpus.
If you want a simpler example, without the use of a corpus, see the Pubmed Tutorial.
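
As a rough illustration of what indexing could look like, here is a sketch that reads passages from a JSONL file and writes them to Qdrant with Haystack 2.x. It is not the RAGFoundry indexing script, and several things are assumptions: the helper names `load_texts` and `index_texts`, the choice of `text_key`, and that `embedding_dim=768` with `similarity="dot_product"` matches your `qdrant.yaml` (taken from the collection info above, size 768 and distance "Dot"). The `model` argument is left without a default on purpose: it should be the same embedding model the text embedder in `qdrant.yaml` uses, otherwise query and document vectors will not be comparable.

```python
import json
from typing import Iterable, List


def load_texts(lines: Iterable[str], text_key: str) -> List[str]:
    """Parse JSONL lines and collect the non-empty strings stored under text_key."""
    texts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = json.loads(line).get(text_key)
        if isinstance(value, str) and value.strip():
            texts.append(value)
    return texts


def index_texts(texts: List[str], model: str,
                url: str = "http://127.0.0.1:6333", index: str = "train") -> None:
    """Embed texts and write them to Qdrant.

    Needs Haystack 2.x, the qdrant-haystack integration, and a running server.
    """
    # Imported lazily so load_texts works without these packages installed.
    from haystack import Document, Pipeline
    from haystack.components.embedders import SentenceTransformersDocumentEmbedder
    from haystack.components.writers import DocumentWriter
    from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

    # Assumption: these values mirror the existing collection (size 768, "Dot").
    store = QdrantDocumentStore(url=url, index=index,
                                embedding_dim=768, similarity="dot_product")

    pipeline = Pipeline()
    pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(model=model))
    pipeline.add_component("writer", DocumentWriter(document_store=store))
    pipeline.connect("embedder.documents", "writer.documents")
    pipeline.run({"embedder": {"documents": [Document(content=t) for t in texts]}})


# Hypothetical usage; text_key and model depend on your data and qdrant.yaml:
# with open("mypath/rag/RAGFoundry-main/data/train.jsonl") as f:
#     index_texts(load_texts(f, text_key="text1"), model="<model from qdrant.yaml>")
```

After a run like this, `points_count` at http://127.0.0.1:6333/collections/train should be greater than zero, and the retrieval step has something to search.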