I wonder if it's due to the system's OOM killer: were you running out of memory? (Though that would imply a memory leak, if all individual files do work.)
I tried searching the logs to track down the cause, but could not find a way to identify what happened. Grepping for kill did not return anything in /var/log/dmesg, /var/log/kern.log, or /var/log/syslog.
I am on Ubuntu 20; could you advise where to look? Thanks!
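For reference, a minimal way to check for OOM-killer activity on Ubuntu might look like the sketch below; it assumes systemd's journal is available, and the grep patterns are just common fragments of the kernel's kill messages, not guaranteed matches:

```sh
# Search the kernel ring buffer and the journal for OOM-killer messages.
# (May require sudo; patterns are typical fragments of kernel kill messages.)
dmesg | grep -iE 'killed process|out of memory'
journalctl -k | grep -i 'oom'
```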
As far as I can see, there is no significant memory leak in FoLiA-txt, but maybe there is some strange oddity in the file at hand. I don't know. It seems that the first file is processed OK, but the second isn't.
I assume there is NO problem when that file is processed on its own?
The files process fine if I call the converter on them one by one. I experienced the same thing with other files too when calling the converter on directories of large files; there can be nearly 1 million tokens per file.
Typically, the process is killed after the first file has been converted.
Well, I just ran tests on some fairly small files, and there seems to be some random effect that makes the run fail, but not always. It is currently using 23.6 GB of memory, and I will kill it myself, but agreed, there is something rotten. This needs some investigation.
OK, it is some multithreading problem, I guess. A deadlock occurs: FoLiA-txt seems to 'stall' when running on multiple threads. You could try the -t1 or --threads=1 option (which slows things down, of course).
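For reference, the single-threaded workaround mentioned above would look something like this (the input file names are hypothetical examples):

```sh
# Run the converter on a single worker thread to avoid the deadlock
# (slower, but stable); file names are hypothetical.
FoLiA-txt --threads=1 bigfile1.txt bigfile2.txt
```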
Best is to upgrade to the newest GIT version, which reports how many threads you are actually running on. Good luck.
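A sketch of that upgrade from source, assuming the usual autotools layout of the LanguageMachines repositories (bootstrap.sh, configure, make) and that dependencies such as ticcutils and libfolia are already installed:

```sh
# Hedged sketch: build foliautils from the current git master.
# Assumes autotools plus installed LanguageMachines dependencies.
git clone https://github.com/LanguageMachines/foliautils.git
cd foliautils
bash bootstrap.sh
./configure
make
sudo make install
```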
@pirolen, the git master now has a fix, which hopefully resolves the deadlock.
Closing, considering this fixed.
Hi, on large files, the FoLiA-txt tool in the containerized foliautils gets killed. I get:
It is not a big problem, since one can call the tool separately per file, but I thought to let you know.
Maybe it is better to call the tool per file in a shell script in the container; I have not tried that.
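A minimal sketch of such a per-file wrapper, assuming the inputs are plain-text files in a single directory (the path and extension are assumptions, and this was not tested in the container):

```sh
#!/bin/sh
# Call FoLiA-txt once per input file instead of on a whole directory.
# /data/corpus and *.txt are hypothetical; adjust to the real layout.
for f in /data/corpus/*.txt; do
    FoLiA-txt --threads=1 "$f" || echo "FAILED: $f" >&2
done
```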