-
I'm having issues with the docx parser skipping content that is split across separate runs in the .docx file, even though it appears on the same line or in the same paragraph.
In the following example, I'm looking for
Pa…
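In WordprocessingML, Word can split one visible line into several `<w:r>` runs (for example, when formatting changes mid-word or the text was edited in different sessions), so a parser that matches text run by run may never see the whole string. A minimal sketch, assuming direct access to `word/document.xml` and using only the Python standard library (not the parser from the issue), joins the `<w:t>` text of every run in a paragraph before searching. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Union

# WordprocessingML namespace used by word/document.xml inside a .docx package.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # ElementTree's "{namespace}" tag prefix


def read_document_xml(docx_path: str) -> bytes:
    """Read the main document part from a .docx (which is a ZIP package)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml")


def paragraph_texts(document_xml: Union[str, bytes]) -> List[str]:
    """Return each paragraph's text, joining the <w:t> nodes of all its runs."""
    root = ET.fromstring(document_xml)
    texts = []
    for para in root.iter(f"{W}p"):
        # A word split across runs (e.g. "Hel" + "lo") is reassembled here,
        # so a search no longer depends on how Word chose to split the runs.
        texts.append("".join(t.text or "" for t in para.iter(f"{W}t")))
    return texts


def find_in_paragraphs(document_xml: Union[str, bytes], needle: str) -> List[int]:
    """Indices of paragraphs whose joined text contains `needle`."""
    return [i for i, text in enumerate(paragraph_texts(document_xml)) if needle in text]
```

Usage would look like `find_in_paragraphs(read_document_xml("example.docx"), "some phrase")`, where the file name is a placeholder. Matching at the paragraph level rather than the run level is the key design choice; run boundaries can then be recovered afterwards if formatting information is still needed.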
-
Allow .docx files to be parsed with a good "first guess" based on Word document headings.
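One way to produce such a "first guess" is to treat every paragraph whose style is a heading as a section boundary. The sketch below is an assumption-laden illustration, not an existing feature: it reads `word/document.xml` with the Python standard library and assumes the built-in English style IDs `Heading1`, `Heading2`, and so on, which localized or custom templates may name differently. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Assumption: headings use Word's built-in style IDs "Heading1", "Heading2", ...
# Localized or custom templates may use different IDs.
HEADING_RE = re.compile(r"^Heading(\d+)$")


def heading_level(para: ET.Element) -> Optional[int]:
    """Return the heading level of a <w:p>, or None if it is not a heading."""
    style = para.find(f"{W}pPr/{W}pStyle")
    if style is None:
        return None
    match = HEADING_RE.match(style.get(f"{W}val", ""))
    return int(match.group(1)) if match else None


def split_by_headings(document_xml: Union[str, bytes]) -> List[Dict]:
    """First-guess sections: each heading starts a section holding the paragraphs after it."""
    root = ET.fromstring(document_xml)
    # Text that appears before the first heading goes into a headingless preamble.
    sections: List[Dict] = [{"heading": None, "level": 0, "body": []}]
    for para in root.iter(f"{W}p"):
        text = "".join(t.text or "" for t in para.iter(f"{W}t"))
        level = heading_level(para)
        if level is not None:
            sections.append({"heading": text, "level": level, "body": []})
        elif text:
            sections[-1]["body"].append(text)
    # Drop the preamble if nothing came before the first heading.
    return [s for s in sections if s["heading"] is not None or s["body"]]
```

The result is a flat list that records each heading's level, so a caller can nest sections afterwards (for example, `Heading2` under the preceding `Heading1`) or fall back to a single section when a document has no styled headings.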
-
To support toxicity use cases, can we incorporate the GHS classification into ROBOKOP?
Initial Thoughts:
1) PubChem is massive
2) We do not have a current parser for PubChem
3) PubChem is downlo…
-
Hey,
I sometimes get this issue when converting from HTML to docs.
![Screenshot 2022-03-30 at 10 30 55 PM](https://user-images.githubusercontent.com/60562606/161220933-c7e2f0ef-8077-420d-aea…
-
@totravel,
Is there a library function for this conversion? It might involve the WinAPI, but I'm not sure. The situation is this: the user can supply either .doc or .docx files, but the parser only supports .docx.
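Instead of the WinAPI, one cross-platform option (swapped in here; nothing in the question says the project uses it) is to normalize input before parsing: convert legacy .doc files to .docx with LibreOffice running headless, then hand the result to the .docx-only parser. This minimal sketch assumes LibreOffice's `soffice` executable is installed; the helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def needs_conversion(path: Path) -> bool:
    """True for legacy binary .doc files, which a .docx-only parser cannot read."""
    return path.suffix.lower() == ".doc"


def build_convert_command(soffice: str, src: Path, outdir: Path) -> List[str]:
    """Command line for a headless LibreOffice .doc -> .docx conversion."""
    return [soffice, "--headless", "--convert-to", "docx", "--outdir", str(outdir), str(src)]


def ensure_docx(path: Path, outdir: Path, soffice: Optional[str] = None) -> Path:
    """Return a .docx path for `path`, converting .doc files with LibreOffice."""
    if not needs_conversion(path):
        return path  # already .docx (or another type the caller handles)
    soffice = soffice or shutil.which("soffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(soffice, path, outdir), check=True)
    # LibreOffice names the output after the input file's stem.
    converted = outdir / (path.stem + ".docx")
    if not converted.exists():
        raise RuntimeError(f"conversion did not produce {converted}")
    return converted
```

On Windows-only deployments, automating an installed copy of Word is another route, but it ties the parser to Word being present; a headless converter keeps the parser itself unchanged.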
-
memary currently parses the agents' responses, which are stored in a .txt file, before inserting them into our knowledge graphs.
As we look to support agentic systems running real-world tasks, our…
-
Thank you so much for an awesome library. While writing a wrapper for readpst for Apache Tika, we noticed a small number of cases where there were fewer attachments when selecting the .msg output opti…
-
Can multi-layer file nesting be handled? For example, if a Word file or a PPT file contains a PNG image, can the content of the PNG be extracted? Are compressed files that contain other compressed files (nested archives) supported?
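Because .docx and .pptx files are ZIP packages (images usually sit under `word/media/` or `ppt/media/`), the same recursive walk can reach images inside Office files and archives nested inside archives. This minimal sketch, using only the Python standard library, locates embedded PNG bytes by their file signature; reading text *from* the PNG would still need a separate OCR step, which is not shown. The function names and the depth limit are assumptions for illustration.

```python
import io
import zipfile
from typing import Iterator, List, Tuple

# Office Open XML files (.docx, .pptx, .xlsx) are ZIP packages, so the same
# walker can descend into them and into ordinary .zip archives.
CONTAINER_EXTS = (".zip", ".docx", ".pptx", ".xlsx")
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature


def walk_embedded(data: bytes, path: str = "", max_depth: int = 5) -> Iterator[Tuple[str, bytes]]:
    """Yield (nested_path, bytes) for each non-container member, recursing into nested ZIPs."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, no content
            member = zf.read(name)
            full = f"{path}!{name}" if path else name
            is_container = name.lower().endswith(CONTAINER_EXTS)
            # The depth limit guards against pathological or malicious nesting.
            if is_container and max_depth > 0 and zipfile.is_zipfile(io.BytesIO(member)):
                yield from walk_embedded(member, full, max_depth - 1)
            else:
                yield full, member


def embedded_pngs(data: bytes) -> List[Tuple[str, bytes]]:
    """All PNG images found at any nesting depth, identified by signature."""
    return [(p, b) for p, b in walk_embedded(data) if b.startswith(PNG_MAGIC)]
```

Checking the signature rather than the extension catches images whose names do not end in `.png`, and the `!`-separated path records where each image was found (for example, `report.docx!word/media/image1.png`).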
-
Hello,
I have been using `combine_pdf` for quite some time, and it has worked great.
Recently, however, I found a valid docx file that seems to break the CombinePDF parser, sending it into a SystemStackError. I w…
-
I was trying out the tutorial. However, when partitioning the PDF provided in the tutorial, I noticed that the font style of the text was not stored in the element's metadata.
Is the font-s…