SURUIYUAN opened this issue 3 weeks ago.
The error "EmptyDocsError: Not gathering evidence due to having no papers." appears because the `gather_evidence` method in the `GatherEvidence` class checks whether any documents are present in `state.docs.docs`. If none are found, it raises `EmptyDocsError`. This means no papers were detected or added to the state before `gather_evidence` was called [1].
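The check can be pictured with a small stand-in. This is a minimal sketch, not paper-qa's code: `EmptyDocsError` here is a local class that shares the library's name, and `DocsStub`, `StateStub` and `gather_evidence_guard` are hypothetical names. It only models the behavior described above, where an empty `state.docs.docs` dictionary stops evidence gathering.

```python
from dataclasses import dataclass, field


class EmptyDocsError(RuntimeError):
    """Local stand-in for paper-qa's exception of the same name."""


@dataclass
class DocsStub:
    # Maps a document key to its name, loosely like Docs.docs
    docs: dict[str, str] = field(default_factory=dict)


@dataclass
class StateStub:
    docs: DocsStub = field(default_factory=DocsStub)


def gather_evidence_guard(state: StateStub) -> int:
    """Refuse to gather evidence when no papers were added (hypothetical helper)."""
    if not state.docs.docs:
        raise EmptyDocsError("Not gathering evidence due to having no papers.")
    # Evidence gathering would start here; we just report how many docs exist
    return len(state.docs.docs)
```

With a fresh `StateStub()` the call raises; after one entry is put into `state.docs.docs` it returns `1`.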
To resolve this, make sure papers are actually added to `state.docs.docs`, for example with the `aadd_texts` method of the `Docs` class. This happens either through the `paper_search` method or by adding documents to the `Docs` object directly [1][2].
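Before digging into the classes, it can help to confirm that the folder your paper-qa settings point at actually contains non-empty files. A minimal standard-library sketch: the function name `list_candidate_papers` and the suffix set are assumptions for illustration, not paper-qa's own list of supported formats.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of file types worth checking; adjust to what your setup parses.
CANDIDATE_SUFFIXES = {".pdf", ".txt", ".md", ".html"}


def list_candidate_papers(directory: str | Path) -> list[Path]:
    """Return non-empty files under `directory` (recursively) that look like papers."""
    root = Path(directory).expanduser().resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"Paper directory does not exist: {root}")
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in CANDIDATE_SUFFIXES
        and p.stat().st_size > 0  # zero-byte files cannot yield any text
    )
```

If this returns an empty list for your paper directory, there is nothing to index, and `EmptyDocsError` is the expected outcome.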
How do I modify this in a Python program? Please explain in detail.
To make sure papers are added to `state.docs.docs` and avoid the "EmptyDocsError", you need to verify (and possibly modify) the `paper_search` method in the `PaperSearch` class. Here are the steps in detail:
Ensure `paper_search` updates `state.docs.docs`:
The `paper_search` method should update `state.docs.docs` by adding documents with the `aadd_texts` method. Here is the relevant part of the code:
```python
async def paper_search(
    self,
    query: str,
    min_year: int | None,
    max_year: int | None,
    state: EnvironmentState,
) -> str:
    # ... (other code: builds `search_key` and runs `index.query` to produce `results`)

    # combine all the resulting doc objects into one and update the state
    all_docs = []
    for r in results:
        this_doc = next(iter(r.docs.values()))
        all_docs.append(this_doc)
        await state.docs.aadd_texts(
            texts=r.texts,
            doc=this_doc,
            settings=self.settings,
            embedding_model=self.embedding_model,
        )

    status = state.status
    logger.info(status)
    # mark how far we've searched so that continuation will start at the right place
    self.previous_searches[search_key] += self.settings.agent.search_count
    if self.settings.agent.return_paper_metadata:
        retrieved_papers = "\n".join([f"{x.title} ({x.year})" for x in all_docs])
        return f"Retrieved Papers:\n{retrieved_papers}\n\n{status}"
    return status
```
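The effect of that loop can be reproduced without any API keys. A minimal sketch with hypothetical stand-ins (`Doc`, `SearchResult`, `Docs`, `combine_results`) that only mimic the shape used above: each search result carries one document plus its text chunks, and `aadd_texts` stores them. The `False`-on-duplicate return value is this sketch's own choice; check the paper-qa source for what its `bool` actually signals.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Doc:
    dockey: str
    title: str
    year: int


@dataclass
class SearchResult:
    # Hypothetical: one Doc per result, like r.docs in the excerpt
    docs: dict[str, Doc]
    texts: list[str]


@dataclass
class Docs:
    docs: dict[str, Doc] = field(default_factory=dict)
    texts: list[str] = field(default_factory=list)

    async def aadd_texts(self, texts: list[str], doc: Doc) -> bool:
        """Store the chunks; return False if the document was already present."""
        if doc.dockey in self.docs:
            return False
        self.docs[doc.dockey] = doc
        self.texts.extend(texts)
        return True


async def combine_results(results: list[SearchResult], docs: Docs) -> list[Doc]:
    """Mirror of the loop above: take each result's single Doc and add its texts."""
    all_docs = []
    for r in results:
        this_doc = next(iter(r.docs.values()))
        all_docs.append(this_doc)
        await docs.aadd_texts(texts=r.texts, doc=this_doc)
    return all_docs
```

Calling `asyncio.run(combine_results([], store))` leaves `store.docs` empty, which is exactly the state that trips the evidence check; one non-empty result is enough to populate it.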
Verify `results` from `index.query`:
Make sure the `results` returned by the `index.query` call are not empty. If `results` is empty, no documents are added to `state.docs.docs`, which leads to the "EmptyDocsError".
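One way to make this failure explicit is to check the query output before the loop runs, so the problem surfaces at the search step instead of later at evidence gathering. A minimal sketch: `require_results` is a hypothetical helper, and its message is illustrative, not a paper-qa log line.

```python
import logging
from collections.abc import Sequence
from typing import TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def require_results(results: Sequence[T], query: str) -> Sequence[T]:
    """Raise early, naming the query, if the index returned nothing."""
    if not results:
        msg = (
            f"Index query for {query!r} returned no results; "
            "check that the paper directory was indexed."
        )
        logger.warning(msg)
        raise LookupError(msg)
    # Non-empty results pass through unchanged
    return results
```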
Check the `aadd_texts` method:
The `aadd_texts` method in the `Docs` class should add texts to the `docs` dictionary. Here is its signature:
```python
async def aadd_texts(
    self,
    texts: list[Text],
    doc: Doc,
    settings: MaybeSettings = None,
    embedding_model: EmbeddingModel | None = None,
) -> bool:
    ...  # body omitted
```
Make sure this method works as expected and that the `texts` and `doc` arguments are passed correctly.
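A cheap pre-flight check on the arguments catches a common way this step adds nothing useful: a `texts` list that is empty or contains only blank chunks (which may happen, for example, with a scanned PDF that has no text layer). A minimal sketch over plain strings; the real `Text` objects carry more fields, and `check_texts_before_add` is a hypothetical helper, not part of paper-qa.

```python
def check_texts_before_add(texts: list[str], docname: str) -> list[str]:
    """Drop blank chunks and fail loudly if nothing readable is left (hypothetical helper)."""
    cleaned = [t for t in texts if t.strip()]
    if not cleaned:
        raise ValueError(
            f"No readable text extracted for {docname!r}; "
            "the file may be empty or contain only images."
        )
    return cleaned
```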
Ensure the `Docs` class is implemented correctly:
The `Docs` class manages the `docs` dictionary and related attributes. Here is a snippet of the `Docs` class for reference:
```python
import os
import re
import tempfile
from datetime import datetime
from pathlib import Path
from typing import BinaryIO
from uuid import UUID, uuid4

from pydantic import BaseModel, ConfigDict, Field, ValidationInfo, field_validator

# Doc, DocDetails, DocKey, Text, VectorStore, NumpyVectorStore, PAPERQA_DIR,
# MaybeSettings, LLMModel, EmbeddingModel, get_settings, read_doc, md5sum,
# maybe_is_pdf and maybe_is_html come from paper-qa itself (imports omitted).


class Docs(BaseModel):
    """A collection of documents to be used for answering questions."""

    model_config = ConfigDict(extra="forbid")

    id: UUID = Field(default_factory=uuid4)
    docs: dict[DocKey, Doc | DocDetails] = Field(default_factory=dict)
    texts: list[Text] = Field(default_factory=list)
    docnames: set[str] = Field(default_factory=set)
    texts_index: VectorStore = Field(default_factory=NumpyVectorStore)
    name: str = Field(default="default", description="Name of this docs collection")
    index_path: Path | None = Field(
        default=PAPERQA_DIR, description="Path to save index", validate_default=True
    )
    deleted_dockeys: set[DocKey] = Field(default_factory=set)

    @field_validator("index_path")
    @classmethod
    def handle_default(cls, value: Path | None, info: ValidationInfo) -> Path | None:
        if value == PAPERQA_DIR:
            return PAPERQA_DIR / info.data["name"]
        return value

    def clear_docs(self) -> None:
        self.texts = []
        self.docs = {}
        self.docnames = set()

    def _get_unique_name(self, docname: str) -> str:
        """Create a unique name given proposed name."""
        suffix = ""
        while (docname + suffix) in self.docnames:
            suffix = "a" if suffix == "" else chr(ord(suffix) + 1)
        docname += suffix
        return docname

    async def aadd_file(
        self,
        file: BinaryIO,
        citation: str | None = None,
        docname: str | None = None,
        dockey: DocKey | None = None,
        title: str | None = None,
        doi: str | None = None,
        authors: list[str] | None = None,
        settings: MaybeSettings = None,
        llm_model: LLMModel | None = None,
        embedding_model: EmbeddingModel | None = None,
        **kwargs,
    ) -> str | None:
        """Add a document to the collection."""
        # just put in temp file and use existing method
        suffix = ".txt"
        if maybe_is_pdf(file):
            suffix = ".pdf"
        elif maybe_is_html(file):
            suffix = ".html"

        with tempfile.NamedTemporaryFile(suffix=suffix) as f:
            f.write(file.read())
            f.seek(0)
            return await self.aadd(
                Path(f.name),
                citation=citation,
                docname=docname,
                dockey=dockey,
                title=title,
                doi=doi,
                authors=authors,
                settings=settings,
                llm_model=llm_model,
                embedding_model=embedding_model,
                **kwargs,
            )

    async def aadd(
        self,
        path: Path,
        citation: str | None = None,
        docname: str | None = None,
        dockey: DocKey | None = None,
        title: str | None = None,
        doi: str | None = None,
        authors: list[str] | None = None,
        settings: MaybeSettings = None,
        llm_model: LLMModel | None = None,
        embedding_model: EmbeddingModel | None = None,
        **kwargs,
    ) -> str | None:
        """Add a document to the collection."""
        all_settings = get_settings(settings)
        parse_config = all_settings.parsing
        if dockey is None:
            dockey = md5sum(path)
        if llm_model is None:
            llm_model = all_settings.get_llm()
        if citation is None:
            texts = read_doc(
                path,
                Doc(docname="", citation="", dockey=dockey),  # Fake doc
                chunk_chars=parse_config.chunk_size,
                overlap=parse_config.overlap,
            )
            if not texts:
                raise ValueError(f"Could not read document {path}. Is it empty?")
            result = await llm_model.run_prompt(
                prompt=parse_config.citation_prompt,
                data={"text": texts[0].text},
                skip_system=True,
            )
            citation = result.text
            if (
                len(citation) < 3
                or "Unknown" in citation
                or "insufficient" in citation
            ):
                citation = f"Unknown, {os.path.basename(path)}, {datetime.now().year}"

        if docname is None:
            match = re.search(r"([A-Z][a-z]+)", citation)
            if match is not None:
                author = match.group(1)
            else:
                raise ValueError(
                    f"Could not parse docname from citation {citation}. "
                    "Consider just passing key explicitly - e.g. docs.py "
                    "(path, citation, key='mykey')"
                )
            year = ""
            match = re.search(r"(\d{4})", citation)
            if match is not None:
                year = match.group(1)
            docname = f"{author}{year}"
        docname = self._get_unique_name(docname)

        doc = Doc(docname=docname, citation=citation, dockey=dockey)
        # read the chunks with the real Doc, so the texts point at this doc and
        # `texts` is also defined when a citation was passed in (branch skipped)
        texts = read_doc(
            path,
            doc,
            chunk_chars=parse_config.chunk_size,
            overlap=parse_config.overlap,
        )
        if not texts:
            raise ValueError(f"Could not read document {path}. Is it empty?")
        self.docs[dockey] = doc
        self.docnames.add(docname)
        self.texts.extend(texts)
        self._build_texts_index()
        return docname
```
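The docname logic in `aadd` is easy to test on its own, which helps when a citation fails to parse. This standalone sketch reproduces the two regular expressions and the suffix loop from the excerpt; `derive_docname` is a hypothetical name, and the `existing` set stands in for `Docs.docnames`.

```python
import re


def derive_docname(citation: str, existing: set[str]) -> str:
    """Build '<Author><Year>' from a citation and make it unique, as in the excerpt."""
    match = re.search(r"([A-Z][a-z]+)", citation)
    if match is None:
        raise ValueError(f"Could not parse docname from citation {citation}.")
    author = match.group(1)
    year_match = re.search(r"(\d{4})", citation)
    year = year_match.group(1) if year_match else ""
    docname = f"{author}{year}"
    # Same suffix scheme as _get_unique_name: "", then "a", "b", ...
    suffix = ""
    while (docname + suffix) in existing:
        suffix = "a" if suffix == "" else chr(ord(suffix) + 1)
    return docname + suffix
```

For example, `derive_docname("Smith, J. Deep learning. 2020", {"Smith2020"})` gives `"Smith2020a"`. A file whose citation could not be generated falls back to `Unknown, <filename>, <current year>`, so it ends up named like `Unknown` followed by the year; seeing such names means the file was read but its metadata was not.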
By confirming that `paper_search` updates `state.docs.docs`, and checking the `Docs` class and its methods, you can avoid the "EmptyDocsError" and make sure your papers are detected and added to the state [1][2].
Why does it show "EmptyDocsError: Not gathering evidence due to having no papers."? Can it not detect my paper?