IntelLabs / RAG-FiT

Framework for enhancing LLMs for RAG tasks using fine-tuning.
https://intellabs.github.io/RAG-FiT/
Apache License 2.0
505 stars, 36 forks

No Data Found in Qdrant: Troubleshooting Empty Index #6

yoyoyo2025 closed this issue 2 months ago

yoyoyo2025 commented 3 months ago

I wrote a new YAML config for retrieval:

# Retrieval Augmentation Configuration

name: retrieval_augmentation_experiment
cache: false
output_path: .

steps:

However, when I open http://127.0.0.1:6333/collections/train, it shows:

{
  "result": {
    "status": "green",
    "optimizer_status": "ok",
    "indexed_vectors_count": 0,
    "points_count": 0,
    "segments_count": 8,
    "config": {
      "params": {
        "vectors": {
          "size": 768,
          "distance": "Dot",
          "on_disk": false
        },
        "shard_number": 1,
        "replication_factor": 1,
        "write_consistency_factor": 1,
        "on_disk_payload": true
      },
      "hnsw_config": {
        "m": 16,
        "ef_construct": 100,
        "full_scan_threshold": 10000,
        "max_indexing_threads": 0,
        "on_disk": false
      },
      "optimizer_config": {
        "deleted_threshold": 0.2,
        "vacuum_min_vector_number": 1000,
        "default_segment_number": 0,
        "max_segment_size": null,
        "memmap_threshold": null,
        "indexing_threshold": 20000,
        "flush_interval_sec": 5,
        "max_optimization_threads": null
      },
      "wal_config": {
        "wal_capacity_mb": 32,
        "wal_segments_ahead": 0
      },
      "quantization_config": null
    },
    "payload_schema": {}
  },
  "status": "ok",
  "time": 0.000126005
}

The collection exists, but points_count and indexed_vectors_count are both 0, so it looks like no data is stored in my Qdrant instance, and I'm unsure why. Could you suggest how to troubleshoot this? Thank you very much.
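The empty collection can also be checked programmatically. Below is a minimal sketch, assuming only the Python standard library and the collection-info JSON shape shown above; fetch_collection_info and diagnose_collection are hypothetical helpers, not part of RAG-FiT or the Qdrant client, and expected_dim is an assumption standing in for whatever dimension your embedding model produces. A points_count of 0 means nothing was ever upserted, which no retrieval setting can fix. indexed_vectors_count on its own is a weaker signal, since Qdrant may leave small segments without an HNSW index below indexing_threshold.

```python
import json
import urllib.request
from typing import List, Optional


def fetch_collection_info(base_url: str, collection: str) -> dict:
    """GET /collections/{collection} from a Qdrant server and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/collections/{collection}") as resp:
        return json.load(resp)


def diagnose_collection(info: dict, expected_dim: Optional[int] = None) -> List[str]:
    """Return human-readable findings for a Qdrant collection-info response.

    Hypothetical helper: it only inspects the fields visible in this issue.
    """
    result = info["result"]
    findings: List[str] = []

    points = result.get("points_count") or 0
    if points == 0:
        # Nothing has been written to the collection, so every search returns nothing.
        findings.append("points_count is 0: nothing has been upserted into this collection")

    # Single unnamed vector config, as in this issue; named-vector collections
    # nest one config per name and have no top-level "size", so they are skipped.
    vectors = result["config"]["params"]["vectors"]
    size = vectors.get("size")
    if expected_dim is not None and size is not None and size != expected_dim:
        findings.append(f"collection vector size {size} != embedder dimension {expected_dim}")

    indexed = result.get("indexed_vectors_count") or 0
    if points > 0 and indexed == 0:
        # Possibly normal for small collections: the HNSW index is only built
        # once a segment grows past the optimizer's indexing_threshold.
        findings.append("points exist but none are HNSW-indexed yet (may be normal for small collections)")

    return findings
```

For the instance in this issue, diagnose_collection(fetch_collection_info("http://127.0.0.1:6333", "train"), expected_dim=768) would report only the points_count finding.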

danielfleischer commented 3 months ago

Hi, you need to put some data into Qdrant first: it's a database, so retrieval can only return documents that have already been indexed into it. See my comment regarding indexing a corpus.
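To illustrate what "putting data in" means at the database level, here is a hedged sketch that writes documents through Qdrant's REST endpoint PUT /collections/{name}/points using only the Python standard library. It is not RAG-FiT's own indexing pipeline, which the linked comment covers. toy_embed, build_upsert_body, upsert_points and the "content" payload field are hypothetical names. toy_embed is a deterministic placeholder, not a real model: it must be replaced by the same 768-dimensional encoder the retrieval step queries with, otherwise search results are meaningless.

```python
import hashlib
import json
import math
import urllib.request
from typing import Callable, Dict, List, Sequence


def toy_embed(text: str, dim: int = 768) -> List[float]:
    """HYPOTHETICAL placeholder embedding: a hash-seeded unit vector.

    Deterministic but semantically meaningless; swap in your real encoder.
    """
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{text}:{counter}".encode("utf-8")).digest()
        values.extend(b / 255.0 - 0.5 for b in digest)  # 32 floats per digest
        counter += 1
    values = values[:dim]
    # Unit-normalize so a "Dot" distance behaves like cosine similarity.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def build_upsert_body(
    docs: Sequence[Dict[str, str]], embed: Callable[[str], List[float]]
) -> dict:
    """Build the JSON body for PUT /collections/{name}/points.

    Each point gets an integer id, the embedded text as its vector, and the
    original document dict as its payload (hypothetical "content" field).
    """
    return {
        "points": [
            {"id": i, "vector": embed(doc["content"]), "payload": dict(doc)}
            for i, doc in enumerate(docs)
        ]
    }


def upsert_points(base_url: str, collection: str, body: dict) -> dict:
    """Send the upsert request; wait=true blocks until the points are applied."""
    req = urllib.request.Request(
        f"{base_url}/collections/{collection}/points?wait=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against the server in this issue, upsert_points("http://127.0.0.1:6333", "train", build_upsert_body(docs, toy_embed)) should be followed by a non-zero points_count at http://127.0.0.1:6333/collections/train. The vector length has to equal the collection's configured size (768 here), or Qdrant will reject the request.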

If you want a simpler example that doesn't require a corpus, see the Pubmed Tutorial.