Hello, great poster at NeurIPS, and it was good to meet you all!
I have some custom docx files (PDFs that I converted to docx with Adobe) that I am trying to extract text from. I am able to get the Docker file up and running, and I've modified run_single_node.sh to run just the annotation on my_docxs.tar.gz in the data folder. The script seems to execute, but I don't see anything in failed or extracted text. What am I doing wrong? I've pasted the whole log below. To verify that my converted files aren't causing the issue, I've also tried a tar containing just a simple docx with random text in it.
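In case it helps rule out the archive itself, a check along these lines could confirm that the tarball contains readable docx files. This is only a sketch using the Python standard library, not code from the repo: `check_docx_tar` is a hypothetical helper name, and it assumes that a usable docx is a zip archive containing `word/document.xml`, which is the usual layout but not something the pipeline is confirmed to rely on.

```python
import io
import sys
import tarfile
import zipfile


def check_docx_tar(tar_path):
    """Return (member_name, looks_valid) for every .docx file in the tarball.

    Hypothetical helper for debugging; not part of the extraction pipeline.
    """
    results = []
    # "r:*" lets tarfile detect the compression (gz, bz2, xz, or none).
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            if not member.name.lower().endswith(".docx"):
                continue
            data = tar.extractfile(member).read()
            try:
                # Assumption: a docx is a zip with a word/document.xml entry.
                with zipfile.ZipFile(io.BytesIO(data)) as zf:
                    looks_valid = "word/document.xml" in zf.namelist()
            except zipfile.BadZipFile:
                looks_valid = False
            results.append((member.name, looks_valid))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, looks_valid in check_docx_tar(sys.argv[1]):
        print(name, "ok" if looks_valid else "NOT a readable docx")
```

Saved under any script name and run with the path to my_docxs.tar.gz as its only argument, it prints one line per docx found. An empty output would suggest the archive has no `.docx` members at the paths the script can see, for example because of an unexpected file extension.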
Lastly, a working demo that uses just a personal set of docx files would be very helpful for debugging.
Thanks, Matt Olson