Open zoey-rw opened 6 days ago
What is the return code of the program in those cases? If it is nonzero, the easiest fix would be to check for that in your script. It should also tell you what killed the process.
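A rough sketch of what such a check could look like in a Bash loop. The wrapper name `run_checked` is made up for illustration, and the actual filter command is left as a placeholder. It relies only on the shell convention that a process terminated by signal N is reported with exit status 128 + N, so a `Killed` process (SIGKILL, signal 9) shows up as 137.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command and report how it ended.
# An exit status above 128 means the process died from signal (status - 128);
# 137 = 128 + 9 is SIGKILL, which is what an out-of-memory killer sends.
run_checked() {
    local status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        return 0
    elif [ "$status" -gt 128 ]; then
        echo "FAILED (signal $((status - 128))): $*" >&2
    else
        echo "FAILED (exit code $status): $*" >&2
    fi
    return "$status"
}

# Usage inside the loop (placeholder command, file name is an assumption):
#   run_checked <your filter command> || echo "$sample" >> failed_samples.txt
```

Recording the failed samples this way means they can be re-run afterwards without having to inspect the output files.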
architeuthis uses very little RAM (<100M for filter), but some schedulers are not good at distinguishing cached files from memory usage, and that can create issues with buffered IO. The logs would also tell you whether the file is complete. They always end in a line like:
2024/09/26 15:30:56 Processed 9614861 reads - Done. 9129184/9614861 reads passed the filter.
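If the log of each run is saved to a file, that final line can be checked automatically. A minimal sketch, assuming one log file per sample (how the log gets captured is up to your script, and the function name `log_is_complete` is made up here); it only looks for the `- Done.` marker from the example line above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the last non-empty line of the
# given log file contains the "- Done." marker.
log_is_complete() {
    grep -v '^[[:space:]]*$' "$1" | tail -n 1 | grep -q -- '- Done\.'
}

# Usage (the log path is an assumption):
#   log_is_complete "logs/${sample}.log" || echo "$sample looks incomplete"
```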
P.S. thanks for all your efforts to write/maintain scientific software.
You're welcome! Glad the tool is useful.
I noticed that some of the "filter" runs for very large (20+ GB) files get killed, but still produce output files that (seemingly?) work fine when passed to Bracken. I don't think it's a memory limitation of the environment, because my Bash loop continues and the filter command executes successfully on another large file. Other than the stdout message "Killed", the only clue was a warning when the 2+ GB filtered output file is read into R:
The filter command:
The command line output:
To find the incomplete files, I was able to borrow this bash function to print any files without a trailing newline:
In my case, this returned 9 files out of about 1400, so it is likely an edge case. A couple of the output files from the architeuthis "score" command were also returned. Not sure what the ideal behavior would be here (maybe adding a "complete" flag after writing? or maybe Bracken should be catching this when reading in files?). It seems like I can just re-run the problem samples, but wanted to flag it!
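For anyone hitting the same thing, here is a minimal sketch of a trailing-newline check. It is not necessarily the function mentioned above, just one way to do it, and the name `missing_final_newline` is made up. Note that it only catches files cut off mid-line; a file truncated exactly at a line boundary would pass.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each non-empty file whose last byte is not a
# newline. Empty files are skipped, so check for those separately.
missing_final_newline() {
    local f
    for f in "$@"; do
        [ -s "$f" ] || continue
        # Command substitution strips a trailing newline, so the result is
        # empty exactly when the last byte is "\n".
        if [ -n "$(tail -c 1 "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage (the glob is an assumption about how outputs are named):
#   missing_final_newline filtered/*.txt
```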
P.S. thanks for all your efforts to write/maintain scientific software. I am your number 1 fan.