AnswerDotAI / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Apache License 2.0
3.03k stars, 206 forks

Turn Off return of the verbose unstructured text from --> result = RAG.search(query=query,k=4) #241

Open quantumalchemy opened 2 months ago

quantumalchemy commented 2 months ago

How can I turn off the verbose unstructured text that is printed when calling result = RAG.search(query=query, k=4), and get back only the list of dicts --> [{"content": ...

[Aug 18, 16:57:05] Loading segmented_maxsim_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Loading searcher for index ragcorpus for the first time... This may take a few seconds
[Aug 18, 16:57:06] #> Loading codec...
[Aug 18, 16:57:06] #> Loading IVF...
[Aug 18, 16:57:06] Loading segmented_lookup_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
[Aug 18, 16:57:06] #> Loading doclens...
[Aug 18, 16:57:06] #> Loading codes and residuals...
[Aug 18, 16:57:06] Loading filter_pids_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
[Aug 18, 16:57:06] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)...
Searcher loaded!

> QueryTokenizer.tensorize(batch_text[0], batch_background[0], bsize) ==

> Input: . who is Forneus?, True, None

> Output IDs: torch.Size([32]), tensor([ 101, 1, 2040, 2003, 2005, 2638, 2271, 1029, 102, 103, 103, 103,
    103, 103, 103, 103, 103, 103, 103, 103, 103, 103, 103, 103,
    103, 103, 103, 103, 103, 103, 103, 103])

> Output Mask: torch.Size([32]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0])

I just want this --> [{"content": ...

abdelkareemkobo commented 2 months ago

The code that prints these logs is in the index.py file. I tried setting

verbose = 1  # also tried -1, 0, False, 3

After reading the verbose handling in ColBERT, which says that verbose > 1 shows more logs and verbose = 1 filters them, as far as I can tell this is the most filtered output you can get with the current codebase.
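If the verbose setting can't reduce the output any further, one generic workaround is to discard whatever the call prints and keep only its return value. The sketch below is plain standard-library Python, not a RAGatouille or ColBERT feature. `call_quietly` and `noisy_search` are hypothetical names used only for illustration, and it assumes the log lines are written through Python-level `print`/`sys.stdout`. Anything written directly to the file descriptor by native code would not be captured this way.

```python
import contextlib
import io


def call_quietly(fn, *args, **kwargs):
    """Call fn and return its result, discarding Python-level stdout/stderr output."""
    sink = io.StringIO()  # in-memory buffer that swallows the printed text
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        return fn(*args, **kwargs)


# Hypothetical stand-in for a chatty search call (NOT the real RAG.search).
def noisy_search(query, k=4):
    print("#> Loading codec...")  # simulated log noise
    return [{"content": f"result for {query}", "rank": 1}][:k]


results = call_quietly(noisy_search, query="who is Forneus?", k=4)
print(results)  # only the returned list is printed
```

Under the same assumption, the call from the question would become `result = call_quietly(RAG.search, query=query, k=4)`.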