Closed (mbanon closed this 3 years ago)
I ran another instance of Bitextor that created almost 9M files...
And the suspicious command, caught in htop:
(They appear here and there, each one running for a few seconds)
Hi @mbanon, please update the source code and reinstall PDFExtract.jar to resolve the issue.
Thanks
Ok, I applied the fix and it's running now.
Will close when Bitextor finishes (with no trash files :) ) Thanks!
Hi, I've been running Bitextor, branch snake_performance (mentioning @lpla in case he is needed here)
After it finished, I noticed that 2.162.294 (!!!) files, named nonsense-{numbers}.png, had appeared in my crawling directory.
After some investigation, I found this line, which looks suspicious: https://github.com/bitextor/pdf-extract/blob/d4fe244408c55c1b881e62ccee75780e74930dda/src/pdfextract/PDFToHtml.java#L194
This needs to be fixed asap...
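For anyone who has to clean up a directory like this in the meantime, here is a minimal sketch. It assumes the stray files sit directly in the crawling directory and that their names are exactly "nonsense-", then digits, then ".png" (my reading of the names reported above, not something confirmed from the pdf-extract code). The function name and the dry_run flag are made up for illustration. It walks the directory with os.scandir rather than a shell glob, since a glob over millions of names can exceed the argument-list limit.

```python
import os
import re

# Assumed naming scheme: "nonsense-<digits>.png", based on the names reported in this issue.
STRAY_PNG = re.compile(r"^nonsense-\d+\.png$")


def remove_stray_pngs(directory, dry_run=True):
    """Count, and optionally delete, stray PNG files directly inside `directory`.

    Hypothetical helper: returns the number of matching files.
    With dry_run=True nothing is deleted.
    """
    matched = 0
    # os.scandir yields entries lazily, so the full listing is never held in memory.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False) and STRAY_PNG.match(entry.name):
                matched += 1
                if not dry_run:
                    os.unlink(entry.path)
    return matched
```

Running it once with dry_run=True and comparing the count against what you expect before running it with dry_run=False seems the safer order. Subdirectories are not visited.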