bitextor / pdf-extract

PDF parser and converter to HTML
GNU General Public License v3.0

Deadlock if SentenceJoin writes to stderr #36

Closed: kpu closed this issue 4 years ago

kpu commented 4 years ago

If the sentence join process writes more to stderr than the pipe buffer can hold, its write will block. Nobody is reading that pipe, so the child never makes progress and the parent deadlocks waiting on it.
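One way to avoid an undrained stderr pipe is not to create one at all. The following is a minimal sketch, not the project's code: `InheritStderrSketch` and `withInheritedStderr` are hypothetical names, and the command is whatever is passed on the command line. It uses `ProcessBuilder.redirectError(ProcessBuilder.Redirect.INHERIT)` so the child writes directly to the parent's stderr.

```java
import java.io.IOException;
import java.util.List;

public class InheritStderrSketch {
    // Builds a ProcessBuilder whose stderr goes straight to the parent's
    // stderr, so there is no stderr pipe that the parent has to drain.
    public static ProcessBuilder withInheritedStderr(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.err.println("usage: InheritStderrSketch <command> [args...]");
            return;
        }
        // Hypothetical child command, taken from the command line.
        Process p = withInheritedStderr(List.of(args)).start();
        p.getOutputStream().close();               // no input in this sketch
        p.getInputStream().transferTo(System.out); // only stdout is read here
        System.exit(p.waitFor());
    }
}
```

This only helps if the parent does not need to capture or inspect the child's stderr; if it does, something still has to read the pipe concurrently.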

https://github.com/bitextor/pdf-extract/blob/4ad28a23817851355ba65b6b4699a8f01b2cb760/src/pdfextract/SentenceJoin.java#L86

kpu commented 4 years ago

Not fixed.

Here's the first result on Google for "java subprocess stderr": https://stackoverflow.com/questions/39931485/java-process-read-stdout-and-stderr-of-a-subprocess-in-a-single-thread

The question asks "Is it possible to handle correctly stdout and stderr of a java.lang.Process-typed subprocess in Java without using additional threads or temporary files?" and the answer is "No".

In your case, the process is well within its rights to read a line of input and then generate a large volume of stderr that fills the buffer, at which point your program will deadlock at https://github.com/bitextor/pdf-extract/blob/42b045c10e1b1555a42bd3f28800a88182bbf7cd/src/pdfextract/SentenceJoin.java#L163
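If stderr does need to be read from Java, the usual approach is the extra thread the linked answer refers to. Below is a minimal sketch under that assumption. `StderrDrainSketch` and `drain` are hypothetical names, not part of pdf-extract, and the child command again comes from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StderrDrainSketch {
    // Starts a daemon thread that copies everything from `in` to `sink`
    // until end of stream, so the writer on the other end of the pipe
    // never blocks on a full buffer.
    public static Thread drain(InputStream in, OutputStream sink) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[8192];
            try {
                int n;
                while ((n = in.read(buf)) != -1) {
                    sink.write(buf, 0, n);
                }
                sink.flush();
            } catch (IOException e) {
                // Stream closed or child gone; nothing left to drain.
            }
        }, "stderr-drain");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: StderrDrainSketch <command> [args...]");
            return;
        }
        // Hypothetical child command, taken from the command line.
        Process p = new ProcessBuilder(args).start();
        Thread drainer = drain(p.getErrorStream(), System.err);
        // The main thread is now free to handle stdin and stdout.
        p.getOutputStream().close();
        p.getInputStream().transferTo(System.out);
        p.waitFor();
        drainer.join();
    }
}
```

With the drainer running, the child can emit any amount of stderr between reading a line and writing its reply without stalling the parent's read on stdout.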