Closed chapmanb closed 7 years ago
Hey @chapmanb - I think your intuition is right that there's something strange going on here. A few thoughts:
When you run `fgbio -Xms500m -Xmx22g`, does that definitely get expanded to `java -Xms500m -Xmx22g -jar fgbio.jar`? I'm guessing it must, otherwise you'd get some other failure message, but I had to ask. I'd be really curious to see how the command line gets re-written and executed. The only way I can think that it would run out of memory in this weird way would be if the command line somehow got re-written so that the input BAM's contents were read as a command-line parameter, but that seems far-fetched.
Any chance you could try running the jar directly:

```
java -Xmx4g -jar fgbio.jar GroupReadsByUmi -i in.bam -o grouped.bam -s paired
```

and see if you get the same problem?

Tim; Thanks much for the thoughts and directions, this really helped focus on what might be going on. The fgbio wrapper does correctly pass through options, and in the process of simplifying I realized the issue is due to the length of the temporary directory specification. It is totally bizarre, but if you have a tmpdir setting like:
```
-Djava.io.tmpdir=/testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttestt/tmptest12
```
It'll run fine (well, until it discovers the temporary directory doesn't exist). However, if you switch to:
```
-Djava.io.tmpdir=/testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttesttestt/tmptest123
```
then it explodes with memory usage errors. I can replicate this on different machines with different BAM files, and it doesn't appear to be due to input size, since it failed with a 4MB BAM input.
Are you able to reproduce this issue if you try running `GroupReadsByUmi` with the second, longer tmpdir that fails? If not, I can put together a full example. Sorry this is so strange, but hope it gives some clues about what might be going on.
I can reproduce from here. I think I see what the problem is. Should have a fix shortly.
Ok, I have a fix, but @nh13 is offline and I'd like him to review before I merge. The short version: it's nothing to do with the BAM, and is a bug in some unrelated code. As part of 0.2.0 we transitioned to writing all our usage and argument documentation in Markdown, and now have a system that parses it and emits both the website documentation and the plain text printed when usage goes to the console. The problem was a bug in the conversion to plain text that couldn't handle word-wrapping correctly when a single word was longer than the maximum line length (oops!). When that happened it just kept adding more and more blank lines until it ran out of memory.
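For intuition, here's a hedged sketch of that failure mode (in shell, not the actual Scala code in sopt): a greedy wrapper flushes the current line whenever the next word won't fit, so a word wider than the line width never fits even on an empty line, and an unguarded loop flushes blank lines forever. The `wrap` function below is a hypothetical illustration that adds the missing guard by hard-breaking oversized words.

```shell
# Hypothetical sketch only, not the actual sopt wrapping code.
wrap() {
  local width=$1 text=$2 line="" out=""
  for word in $text; do
    # Flush the current line if appending this word would exceed the width.
    if [ -n "$line" ] && [ $(( ${#line} + 1 + ${#word} )) -gt "$width" ]; then
      out+="$line"$'\n'
      line=""
    fi
    # Guard for the pathological case: without this, a word longer than
    # $width never fits and a naive loop would emit blank lines forever.
    while [ "${#word}" -gt "$width" ]; do
      out+="${word:0:width}"$'\n'
      word="${word:width}"
    done
    if [ -z "$line" ]; then line="$word"; else line+=" $word"; fi
  done
  out+="$line"
  printf '%s\n' "$out"
}

# Prints: a / veryveryve / rylongword / end (one per line)
wrap 10 "a veryveryverylongword end"
```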
The other part to this is that, also in 0.2.0, we've introduced global options to fgbio. I don't know if you noticed but you can now write things like:
```
fgbio --tmp-dir=/tmp --compression=0 --async-io GroupReadsByUmi -i my.bam -o out.bam
```
And the default value for `--tmp-dir` is the system property `java.io.tmpdir`. So if you set that to something really long and then caused the usage to be printed, it triggered this bug.
I'm not sure why, but I think there must have been something in your command line(s) that was not quite right, causing fgbio to try to print the usage.
The fix is in a separate project: https://github.com/fulcrumgenomics/sopt/pull/7 . Once that PR is merged you should be able to simply rebuild fgbio and it will pull in the updated snapshot.
@chapmanb @tfenne I merged it into sopt, so depending on how long it takes to get to our snapshots repo, it should be as simple as rebuilding fgbio (run `sbt update && sbt assembly`).
Tim and Nils; Thanks so much for looking at this and the quick fix. Do you know if the sopt snapshot has been updated (or is there a way I can monitor it)? I tried re-building everything this morning but am still running into the same memory usage issues with long tmpdir specifications. I'm not sure if I'm missing something, if there is still an issue, or if I'm just not being patient enough. Thanks again.
It usually doesn't take very long to propagate. The snapshots for sopt are here and would indicate that it updated within an hour or two of Nils' comment.
Automatic dependency management and snapshots seem to run into trouble from time to time, and it's a PITA to figure out. Probably the easiest solution is to check out sopt, run `sbt publishLocal`, and then in fgbio run `sbt clean assembly`. Nils is also in the middle of making the 0.2.0 releases, which should be available soon if you continue to run into problems.
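For reference, the check-out-and-publish workaround can be sketched as the following command sequence. This is a hedged recipe, not tested here: the side-by-side directory layout and the clone URL path are assumptions, and `publishLocal` puts the fixed sopt snapshot in the local ivy cache where the fgbio build can resolve it ahead of the remote snapshots repo.

```shell
# Assumed layout: sopt and fgbio checked out as sibling directories.
git clone https://github.com/fulcrumgenomics/sopt.git
cd sopt
sbt publishLocal      # publish the fixed sopt snapshot to the local ivy cache

cd ../fgbio
sbt clean assembly    # rebuild fgbio against the locally published sopt
```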
Tim -- thanks for the details. Either I can't get the snapshot to update or it doesn't fix the issue; I wasn't able to get a working build either way. I could work around it by using the new global options instead of the Java-specific ones, so I switched over to doing that to get past the immediate problem. I'm happy to check again once the 0.2.0 releases are available, and will update the conda packages to use them. Thanks again for all the help.
@chapmanb we have released a new version of fgbio. Would you mind trying the JAR attached to the release?
Nils and Tim -- thanks so much. I updated and tested this and the 0.2.0 release does resolve the issue. Thanks again for the help fixing this.
Tim and Nils; Thanks for all the recent work supporting duplex reads. I've been working on adding support for this into bcbio and updated to the latest development version to get the duplex fixes in place since 1.4. Everything runs cleanly on smaller test samples, but on a reasonably sized test sample (5GB inputs) I'm hitting memory problems even when allocating lots of Java memory. I see this with both duplex and standard UMI consensus creation. For completeness, here is the command I'm running, which pipes `GroupReadsByUmi` into consensus calling and fastq creation:

The memory errors are not very helpful for isolating the issue, as they all appear to come from command-line parsing, which doesn't seem right. I've included a few of them below in case they are more informative to you than to me. In digging through the fgbio changes I noticed the big update to improve BAM reads/writes. Is there anything I'm doing incorrectly in interacting with that?
Thanks for any suggestions.