kenlm crashes because there is no check that the '*.forward.binlm' file exists, producing this error:
java.lang.Exception: /home/lpla/kenlm/util/file.cc:76 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, 00))'.
No such file or directory while opening /home/usr/models/toy-model.forward.binlm
    at pdfextract.SentenceJoin.start(SentenceJoin.java:110)
    at pdfextract.PDFExtract.sentenceJoin(PDFExtract.java:1706)
    at pdfextract.PDFExtract.sentenceJoin(PDFExtract.java:1130)
    at pdfextract.PDFExtract.Extract(PDFExtract.java:285)
    at Main.main(Main.java:81)
Furthermore, if the kenlm path setting is also invalid (the "kenlm_path" setting from the first example), a different error is thrown:
java.lang.Exception: Traceback (most recent call last):
  File "/home/lpla/sentence-join/sentence-join.py", line 231, in <module>
    kenlm_forward = KenLM([kenlm_query,"-b","-n",args.model + ".forward.binlm"])
  File "/home/lpla/sentence-join/sentence-join.py", line 29, in __init__
    self.proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/home/usr/kenlm/bin/query': '/home/usr/kenlm/bin/query'
    at pdfextract.SentenceJoin.start(SentenceJoin.java:110)
    at pdfextract.PDFExtract.sentenceJoin(PDFExtract.java:1706)
    at pdfextract.PDFExtract.sentenceJoin(PDFExtract.java:1130)
    at pdfextract.PDFExtract.Extract(PDFExtract.java:285)
    at Main.main(Main.java:81)
So another check should be made before calling the kenlm tools.
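A minimal sketch of what such a check could look like, assuming it runs in sentence-join.py just before the KenLM subprocess is started. The function name `check_kenlm_inputs` and its parameters are hypothetical; only the query binary path and the ".forward.binlm" suffix are taken from the traces above.

```python
import os


def check_kenlm_inputs(kenlm_query, model_prefix, suffixes=(".forward.binlm",)):
    """Return a list of human-readable problems; the list is empty if all paths look usable.

    kenlm_query:  path to the kenlm `query` binary (hypothetical parameter name)
    model_prefix: model path without suffix, e.g. /home/usr/models/toy-model
    suffixes:     model file suffixes to look for (assumption: only the forward model)
    """
    problems = []

    # The binary must exist and be executable, otherwise subprocess.Popen fails.
    if not os.path.isfile(kenlm_query):
        problems.append("kenlm query binary not found: " + kenlm_query)
    elif not os.access(kenlm_query, os.X_OK):
        problems.append("kenlm query binary is not executable: " + kenlm_query)

    # Each model file must exist, otherwise kenlm itself throws ErrnoException.
    for suffix in suffixes:
        model_file = model_prefix + suffix
        if not os.path.isfile(model_file):
            problems.append("model file not found: " + model_file)

    return problems
```

If the returned list is not empty, the caller could report those messages and skip sentence joining instead of letting the subprocess crash.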
However, if the "sentence_join" setting is invalid, none of these errors is shown; instead, a warning is written to the output file:
<warnings>
  <warning>
    <method>sentenceJoin</method>
    <detail><![CDATA[No model for language: en]]></detail>
  </warning>
</warnings>
This warning should also be shown in the other cases, with a more accurate detail explaining why sentenceJoin is not running.
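To illustrate the kind of output meant here, the sketch below builds a warning in the same shape as the one above but with a specific reason in the detail. The function `warning_xml` is hypothetical and written in Python only for illustration; the actual warning is produced by PDFExtract itself, and the detail text is just an example.

```python
def warning_xml(method, detail):
    """Build a <warnings> block in the shape shown above, with a specific detail message."""
    # A CDATA section cannot contain "]]>", so split it across two sections if present.
    safe_detail = detail.replace("]]>", "]]]]><![CDATA[>")
    return "\n".join([
        "<warnings>",
        "<warning>",
        "<method>" + method + "</method>",
        "<detail><![CDATA[" + safe_detail + "]]></detail>",
        "</warning>",
        "</warnings>",
    ])


print(warning_xml(
    "sentenceJoin",
    "model file not found: /home/usr/models/toy-model.forward.binlm",
))
```

A detail like this would tell the user which path is wrong, rather than the generic "No model for language: en".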
To reproduce: provide valid paths for "sentence_join" and "kenlm_path" (via PDFExtract.json or arguments), but an invalid one for "sentencejoin_model" (I don't have any 'usr' user in /home/).