User Experience:
While running detailed reports on local data, no references are returned in the final report.
The Problem:
References are based on visited_urls and, therefore, are not updated with local data documents.
The Candidate Solution:
A candidate solution would be to track references from within ContextCompressor, which is the "source of truth" for content inclusion in the research document.
To enable this, a new variable self.unique_documents_visited that defaults to set() could be added to compression.py.
The single line of code below can be added to get_context() to track the unique documents visited.
self.unique_documents_visited.update(
    doc.metadata.get('source')
    for i, doc in enumerate(relevant_docs) if i < max_results
)
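To make the idea concrete, here is a minimal, self-contained sketch of how the tracking could sit inside get_context(). The Document dataclass, the ContextCompressorSketch class, and the keyword-matching get_relevant_docs() helper are hypothetical stand-ins, not the project's actual code; only the update(...) expression comes from the proposal above.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a retrieved document carrying a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class ContextCompressorSketch:
    """Simplified stand-in for ContextCompressor (assumption, not the real class)."""

    def __init__(self, documents):
        self.documents = documents
        # Proposed attribute: sources of documents that actually reached the context.
        self.unique_documents_visited = set()

    def get_relevant_docs(self, query):
        # Placeholder retrieval: naive case-insensitive keyword match.
        return [d for d in self.documents if query.lower() in d.page_content.lower()]

    def get_context(self, query, max_results=5):
        relevant_docs = self.get_relevant_docs(query)
        # The proposed line: record only the sources that survive the max_results cut.
        self.unique_documents_visited.update(
            doc.metadata.get("source")
            for i, doc in enumerate(relevant_docs) if i < max_results
        )
        return "\n".join(doc.page_content for doc in relevant_docs[:max_results])


docs = [
    Document("transformers paper", {"source": "a.pdf"}),
    Document("transformers survey", {"source": "b.pdf"}),
    Document("transformers notes", {"source": "c.pdf"}),
    Document("unrelated text", {"source": "d.pdf"}),
]
compressor = ContextCompressorSketch(docs)
context = compressor.get_context("transformers", max_results=2)
print(sorted(compressor.unique_documents_visited))
```

One thing to watch: doc.metadata.get('source') returns None when a document has no source key, so the set may need a None filter before it is rendered as references.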
Considerations:
1. Use this candidate solution for local documents only.
2. Use this candidate solution for all visited_urls (Local, Web, and Custom).
   Note: Although all scraped URLs are currently included as references in the final document, a site may be scraped but ultimately not used once max_results is applied in get_context(). This can lead to false-positive references.
3. Do not use this candidate solution, as the reference documents are already known.
4. Use some other solution, TBD.
5. Some other solution is already in the works.
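The false-positive concern in the note above can be illustrated with a small comparison between what was scraped and what was actually used. This is only a sketch: split_references() and its argument names are hypothetical, and it assumes the used sources are collected as a set, as in the candidate solution.

```python
def split_references(visited_urls, used_sources):
    """Separate sources that reached the report from those only scraped.

    visited_urls: every URL or path that was scraped (hypothetical input).
    used_sources: sources recorded at context-selection time (hypothetical input).
    """
    used = {s for s in used_sources if s}  # drop None / empty sources
    visited = set(visited_urls)
    return {
        "cited": sorted(used),
        "scraped_but_unused": sorted(visited - used),
    }


result = split_references(
    visited_urls=["https://a.example", "https://b.example", "https://c.example"],
    used_sources={"https://a.example", "https://c.example", None},
)
print(result)
```

Anything listed under scraped_but_unused would appear as a reference today even though none of its content was selected for the report.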
Example:
Below is an example output.
My DOC_PATH has 25 papers; however, in this particular report, only eight papers were selected.
I can make a PR once the desired approach is determined.
I am looking forward to any comments folks may have.
Thanks, -Dan