Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

No way to send output to stdout? #2

Closed johnc1231 closed 7 years ago

johnc1231 commented 7 years ago

I can't seem to find any way in Nirvana to write to standard output instead of a particular file. When I specify /dev/stdout as the output file, it tries to write to "/dev/stdout.json.gz". That seems like unintended behavior.

MichaelStromberg commented 7 years ago

At the moment we haven't enabled standard output, mostly for historical reasons. Nirvana originally produced three output files: a VCF, a gVCF, and a JSON file. Increasingly, however, the VCF and gVCF outputs are becoming obsolete in favor of the structured representation in the JSON files.

All current console output (program banner, current progress, etc.) is sent to stdout, so we would need to do something about that. We could move all of that to stderr, but in general we're not keen on giving up stderr as a channel for actual error reporting, since some pipelines monitor stderr for that purpose.

I'll add this to our backlog and discuss it with the team during tomorrow's stand-up.

johnc1231 commented 7 years ago

The reason I ask is I am trying to integrate Nirvana support into Hail, where we have to run Nirvana on a distributed dataset split across multiple partitions. Currently we have an implementation for VEP where we parallelize across partitions and run VEP using stdin and stdout. Our users won't be able to see the progress bars that Nirvana prints by default when they're running this on a cluster anyway, and we'd like to reuse some of the existing infrastructure for VEP to get Nirvana up and running. VEP has the same sort of progress bar output as you, but when you write to stdout it turns it off.

Also, using stdout is important for performance for us. The overhead of constantly writing and reading files is something we definitely want to avoid, so we'd appreciate it if there were some way to do this using stdout.

MichaelStromberg commented 7 years ago

I completely understand!

We did some initial digging into this today and it looks like something that we can fix rather quickly.

MichaelStromberg commented 7 years ago

Hi John,

We now have preliminary support for both redirected stdin and redirected stdout!

For stdin, we actually support redirected uncompressed, gzipped, and bgzipped VCF files. This was mostly done for completeness, since I'm guessing that you normally don't want a bunch of compression/decompression occurring between your piped commands.
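For readers wondering how a tool can accept all three formats on a non-seekable stream, here is a minimal Python sketch. It is not Nirvana's implementation (Nirvana is a .NET program); the function name `open_vcf_stream` is hypothetical. It relies on two facts: gzip data starts with the magic bytes 0x1f 0x8b, and a bgzipped file is a series of concatenated gzip members, which Python's `gzip` module reads transparently.

```python
import gzip
import io
from typing import BinaryIO, TextIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def open_vcf_stream(raw: BinaryIO) -> TextIO:
    """Return a text reader over raw, decompressing if it looks like gzip.

    Hypothetical helper for illustration only. The stream is peeked at
    rather than rewound, so this also works on a pipe such as
    sys.stdin.buffer.
    """
    # sys.stdin.buffer already supports peek(); wrap anything that does not.
    buffered = raw if hasattr(raw, "peek") else io.BufferedReader(raw)
    head = buffered.peek(2)[:2]  # look ahead without consuming bytes
    if head == GZIP_MAGIC:
        # GzipFile reads concatenated members, which covers bgzip output.
        binary = gzip.GzipFile(fileobj=buffered, mode="rb")
    else:
        binary = buffered
    return io.TextIOWrapper(binary, encoding="utf-8")
```

A caller would use it as `for line in open_vcf_stream(sys.stdin.buffer): ...`, regardless of whether the upstream command compressed its output.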

For both the input and output command-line arguments, the hyphen (-) denotes a request for stdin and stdout redirection respectively:

cat Data/Mother/chr2.vcf | \
dotnet /d/Projects/Nirvana/bin/Release/netcoreapp1.1/Nirvana.dll \
-r References/5/Homo_sapiens.GRCh37.Nirvana.dat -c Cache/24/GRCh37/Ensembl84 \
--sd SupplementaryDatabase/36/GRCh37 -i - -o - | wc -l

dotnet /d/Projects/Nirvana/bin/Release/netcoreapp1.1/Nirvana.dll \
-r References/5/Homo_sapiens.GRCh37.Nirvana.dat -c Cache/24/GRCh37/Ensembl84 \
--sd SupplementaryDatabase/36/GRCh37 -i - -o - < Data/Mother/chr2.vcf > chr2.json
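For callers that drive Nirvana from code rather than a shell (for example, one process per data partition), the same invocation can be sketched in Python. This is a rough illustration, not part of Nirvana or Hail: the `annotate` function is hypothetical, the paths are copied from the commands above and are specific to that machine, and treating stdout as text assumes the `-o -` output is uncompressed, as the `> chr2.json` redirect suggests.

```python
import subprocess
from typing import Sequence

# Command line copied from the examples above; the paths are machine-specific.
NIRVANA_CMD = [
    "dotnet", "/d/Projects/Nirvana/bin/Release/netcoreapp1.1/Nirvana.dll",
    "-r", "References/5/Homo_sapiens.GRCh37.Nirvana.dat",
    "-c", "Cache/24/GRCh37/Ensembl84",
    "--sd", "SupplementaryDatabase/36/GRCh37",
    "-i", "-",  # read the VCF from stdin
    "-o", "-",  # write the annotations to stdout
]


def annotate(vcf_text: str, cmd: Sequence[str] = NIRVANA_CMD) -> str:
    """Pipe vcf_text through cmd and return whatever it writes to stdout.

    Hypothetical wrapper: stderr is captured separately, and a non-zero
    exit status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        list(cmd),
        input=vcf_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Because the command is a parameter, the wrapper can be exercised with any filter program that reads stdin and writes stdout before pointing it at a real Nirvana installation.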

stdout redirection causes the following to happen:

You can test out the new stdin/stdout functionality in the features/stdout branch.

johnc1231 commented 7 years ago

Thanks a lot for the speedy response. This should be very helpful. I'll give it a try later today.

johnc1231 commented 7 years ago

It's working great for me now. Thanks a lot.

MichaelStromberg commented 7 years ago

Awesome! Feel free to add any other issues or feature requests here.