-
At present, the indexing process extracts the full text from Doc and PDF files. This can be slow and expensive, and it can cause problems during reindexing. We should cache the extracted text from …
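A minimal sketch of such a cache follows. It assumes the extracted text is keyed by a SHA-256 hash of each file's bytes, so an unchanged file skips extraction on reindex. `ExtractedTextCache` and the `extract_text` callable are hypothetical stand-ins, not part of the existing indexer.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class ExtractedTextCache:
    """Stores extracted full text on disk, keyed by a SHA-256 hash of the source file's bytes."""

    def __init__(self, cache_dir: Path, extract_text: Callable[[Path], str]) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical Doc/PDF text extractor; it only runs on a cache miss.
        self.extract_text = extract_text
        self.misses = 0

    def _key(self, path: Path) -> str:
        # Hash the contents rather than the path, so an unchanged file hits the cache on reindex.
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def get_text(self, path: Path) -> str:
        cached = self.cache_dir / f"{self._key(path)}.txt"
        if cached.exists():
            return cached.read_text(encoding="utf-8")
        self.misses += 1
        text = self.extract_text(path)
        cached.write_text(text, encoding="utf-8")
        return text


# Usage: the second lookup of an unchanged file is served from the cache.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "report.pdf"
    doc.write_bytes(b"placeholder bytes")
    cache = ExtractedTextCache(Path(tmp) / "cache", lambda p: p.read_bytes().decode())
    first = cache.get_text(doc)
    second = cache.get_text(doc)
    misses = cache.misses
```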
-
### Description of the bug
Here is the original PDF:
[1832786.pdf](https://github.com/user-attachments/files/16929296/1832786.pdf)
Image generated by `get_pixmap()`:
![1832786 pdf_0](https://github.com/us…
-
Received feedback from an AIML Specialist SE:
> Given we have a limitation on how many documents can be processed in a single query when using the PREDICT! Function, can we update the quickstart to…
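
A minimal sketch of working around a per-query document limit by batching follows. The limit of 3 and the `run_query` callable are hypothetical placeholders. The actual limit for the PREDICT function is not stated here, and this is not the quickstart's code.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(
    documents: Sequence[T],
    run_query: Callable[[Sequence[T]], List[R]],
    max_docs_per_query: int,
) -> List[R]:
    """Run one query per batch and concatenate the results in input order."""
    results: List[R] = []
    for batch in batched(documents, max_docs_per_query):
        results.extend(run_query(batch))
    return results


# Usage with a stand-in query that records batch sizes and returns one result per document.
batch_sizes: List[int] = []


def fake_query(batch: Sequence[str]) -> List[str]:
    batch_sizes.append(len(batch))
    return [doc.upper() for doc in batch]


docs = ["a", "b", "c", "d", "e", "f", "g"]
predictions = predict_in_batches(docs, fake_query, max_docs_per_query=3)
```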
-
Hello @kermitt2,
I've noticed that in the extracted TEI, the copyright statement found under the `availability` tag is actually the publisher. Is there any reason for this?
https://github.com/ker…
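To compare the two fields side by side, the following sketch uses Python's `xml.etree.ElementTree` to read `<publisher>` and `<availability>` from a TEI header. The sample document is hypothetical. The sketch only assumes the standard TEI namespace and the `teiHeader/fileDesc/publicationStmt` path.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, trimmed TEI header for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <publicationStmt>
        <publisher>Example Publisher</publisher>
        <availability status="unknown">
          <p>Copyright Example Publisher</p>
        </availability>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
</TEI>"""


def publication_fields(tei_xml: str) -> Dict[str, Optional[str]]:
    """Return the publisher and the availability text from a TEI header."""
    root = ET.fromstring(tei_xml)
    stmt = root.find("tei:teiHeader/tei:fileDesc/tei:publicationStmt", TEI_NS)
    if stmt is None:
        return {"publisher": None, "availability": None}
    publisher = stmt.find("tei:publisher", TEI_NS)
    availability = stmt.find("tei:availability", TEI_NS)
    publisher_text = publisher.text.strip() if publisher is not None and publisher.text else None
    # Join all text inside <availability> (e.g. its <p> children), collapsing whitespace.
    availability_text = (
        " ".join("".join(availability.itertext()).split()) if availability is not None else None
    )
    return {"publisher": publisher_text, "availability": availability_text}


fields = publication_fields(SAMPLE_TEI)
```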
-
### What problem does your feature request solve?
Currently it uses grid coordinates, but the documentation in `rose edit` doesn't make it clear whether real-world coordinates or grid coordi…
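The following sketch illustrates the distinction the documentation could spell out. It assumes a regular latitude-longitude grid defined by a hypothetical origin and spacing. Rotated or irregular model grids do not follow this simple mapping.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RegularGrid:
    """Hypothetical regular lat-lon grid whose (lon0, lat0) is grid point (0, 0)."""

    lon0: float
    lat0: float
    dlon: float
    dlat: float

    def to_real_world(self, i: int, j: int) -> Tuple[float, float]:
        """Map grid indices (i along longitude, j along latitude) to (lon, lat) in degrees."""
        return self.lon0 + i * self.dlon, self.lat0 + j * self.dlat

    def to_grid(self, lon: float, lat: float) -> Tuple[int, int]:
        """Map (lon, lat) in degrees to the indices of the nearest grid point."""
        return round((lon - self.lon0) / self.dlon), round((lat - self.lat0) / self.dlat)


grid = RegularGrid(lon0=-10.0, lat0=50.0, dlon=0.5, dlat=0.25)
point = grid.to_real_world(4, 8)
indices = grid.to_grid(*point)
```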
-
Hi, when I extracted the brain using `antspynet.utilities.brain_extraction` according to the ANTsPyNet documentation (https://antsx.github.io/ANTsPyNet/docs/build/html/utilities.html#applications), an error occurred…
-
### What problem are you trying to solve?
Currently, data is being extracted from the DOM using JavaScript, which can be inefficient and slow, especially for complex or large documents. This method m…
-
I get the following output when I run this code:
```
from llama_parse import LlamaParse

documents = LlamaParse(
    result_type="json",
    split_by_page=False,
    parsing_instruction=extraction_instructions,
    …
```
-
znmeb updated
5 years ago
-
Hi folks! We love Verba. Does the project support, or plan to support, pluggable retrievers? We are building an open-source, reliable extraction and embedding engine (https://getindexify.ai). We plan on s…
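As an illustration of what "pluggable" could mean here, this sketch defines a retriever interface with `typing.Protocol`. Any backend that implements `retrieve()` can be registered and swapped in. The `Retriever`, `Chunk`, `KeywordRetriever`, and `register` names are hypothetical and are not Verba's or Indexify's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Chunk:
    text: str
    score: float


class Retriever(Protocol):
    """Hypothetical interface that a pluggable retriever would implement."""

    def retrieve(self, query: str, limit: int) -> List[Chunk]:
        ...


class KeywordRetriever:
    """Toy backend: scores each document by how many of its words appear in the query."""

    def __init__(self, documents: List[str]) -> None:
        self.documents = documents

    def retrieve(self, query: str, limit: int) -> List[Chunk]:
        words = set(query.lower().split())
        scored = [
            Chunk(doc, float(sum(w in words for w in doc.lower().split())))
            for doc in self.documents
        ]
        scored.sort(key=lambda c: c.score, reverse=True)
        return [c for c in scored if c.score > 0][:limit]


# Registry mapping a configuration name to a retriever implementation.
RETRIEVERS: Dict[str, Retriever] = {}


def register(name: str, retriever: Retriever) -> None:
    RETRIEVERS[name] = retriever


register("keyword", KeywordRetriever(["verba supports retrievers", "unrelated text", "pluggable retrievers"]))
results = RETRIEVERS["keyword"].retrieve("pluggable retrievers", limit=2)
```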