-
Following the "**A full end-to-end single file deduplication example**" section of the README, when I tried to run "**bash scripts/deduplicate_single_file.sh /home/user/deduplicate-text-datasets/test_reduce/t…
-
Currently, the `porter` stemmer is used by default for DuckDB indexing here https://github.com/huggingface/datasets-server/pull/1296/files#diff-d9a2c828d7feca3b7f9e332e040ef861e842a16d18276b356461d2aa34…
-
I am using a local machine with Ubuntu 22.04 and Python 3.10. Indexing leads to the following every time:
```
08:55:25,379 root ERROR error extracting graph
Traceback (most recent call last):
…
```
-
Hi, when fine-tuning the florence-ft model, the model forgets its old knowledge. (We do not use Florence directly; we train it and then adopt its vision encoder in a large LLM instead.)…
-
### Version
VSCode 1.91.1
Cody Version: 1.26.6 and v1.27.1721488653 (pre-release)
### Describe the bug
Embeddings creation and indexing only progresses to a single-digit percentage, then IP blocke…
-
## Long code blocks cause an error in _gen_chars
When using large blocks of code, I experienced an issue where I could not get a string to be accepted. After a bit of experimenting, it seems that aft…
-
❌ create_summarized_entities
None
⠼ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━ 100% 0:0… 0:0…
├── create_base_text_units
├── create_base_extracted_entities
└──…
-
### Description
TL;DR: `index_prefixes` on shingled fields costs a lot and benefits only a little.
`index_prefixes` is a useful query-time optimisation but comes with an added cost in the extra dis…
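
As a rough illustration of the claim above, the toy Python sketch below counts the distinct prefix terms that would be generated for plain words versus two-word shingles. It assumes the documented Elasticsearch defaults for `index_prefixes` (`min_chars` of 2, `max_chars` of 5); the helper names `shingles` and `prefix_terms` are hypothetical, and the sketch says nothing about Lucene's actual on-disk cost.

```python
from typing import Iterable, List, Set

MIN_CHARS = 2  # assumed: Elasticsearch default for index_prefixes.min_chars
MAX_CHARS = 5  # assumed: Elasticsearch default for index_prefixes.max_chars


def shingles(tokens: List[str], size: int = 2) -> List[str]:
    """Return word shingles (runs of `size` adjacent words) joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def prefix_terms(terms: Iterable[str]) -> Set[str]:
    """Collect the distinct leading substrings of length MIN_CHARS..MAX_CHARS."""
    prefixes: Set[str] = set()
    for term in terms:
        for n in range(MIN_CHARS, min(MAX_CHARS, len(term)) + 1):
            prefixes.add(term[:n])
    return prefixes


tokens = "the quick brown fox jumps over the lazy dog".split()

word_prefixes = prefix_terms(set(tokens))
shingle_prefixes = prefix_terms(set(shingles(tokens)))

# Prefixes that only exist because the field is shingled.
shingle_only = shingle_prefixes - word_prefixes

print(len(word_prefixes), len(shingle_prefixes), sorted(shingle_only))
```

In this toy input, most shingle prefixes are just prefixes of the shingle's first word, because five characters rarely reach past it. That is consistent with the "benefits only a little" observation, though a real index would need to be measured directly.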
-
**Is your feature request related to a problem? Please describe.**
When trying to store objects into document stores (for example, to make some kind of agent memory) you have to add components to ser…
-
Hello. I now have most of the index files I need, such as doclens.10.json and docnos.pkl.gz, but the last step, writing the ivfpq.100.faiss file, failed. So I want to use the obtained files to …