Closed johnc1231 closed 7 years ago
At the moment we haven't enabled standard output, mostly for historical reasons. Nirvana originally produced three output files: a VCF, a gVCF, and a JSON file. Increasingly, however, the VCF and gVCF outputs are becoming obsolete in favor of the structured representation in the JSON file.
All current console output (program banner, current progress, etc.) is sent to stdout, so we would need to do something about that. We could move all of it to stderr, but in general we're not keen on giving up stderr as a channel for actual error reporting, since some pipelines monitor stderr for that purpose.
I'll add this to our backlog and discuss it with the team during tomorrow's stand-up.
The reason I ask is I am trying to integrate Nirvana support into Hail, where we have to run Nirvana on a distributed dataset split across multiple partitions. Currently we have an implementation for VEP where we parallelize across partitions and run VEP using stdin and stdout. Our users won't be able to see the progress bars that Nirvana prints by default when they're running this on a cluster anyway, and we'd like to reuse some of the existing infrastructure for VEP to get Nirvana up and running. VEP has the same sort of progress bar output as you, but when you write to stdout it turns it off.
Also, using stdout is important for performance for us. We definitely want to avoid the overhead of constantly writing and reading files, so we'd appreciate it if there were some way to do this through stdout.
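To make the request concrete, here is a minimal Python sketch of the per-partition pattern described above: feed a partition's lines to an external annotator on stdin and read the results back from stdout, with no intermediate files. The function name `annotate_partition` is hypothetical, and this is not Hail's actual implementation; `command` is whatever argv launches the tool.

```python
import subprocess
from typing import Iterable, List


def annotate_partition(command: List[str], vcf_lines: Iterable[str]) -> List[str]:
    """Pipe one partition's lines through an external tool and return its output lines.

    Hypothetical helper: nothing here is specific to VEP or Nirvana.
    """
    # Make sure every record ends with a newline before sending it down the pipe.
    payload = "".join(
        line if line.endswith("\n") else line + "\n" for line in vcf_lines
    )
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,  # annotations come back on stdout
        stderr=subprocess.PIPE,  # kept separate so real errors stay visible
        text=True,
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

This only works if the tool keeps stdout clean, which is why a banner or progress bar printed there would corrupt the annotations.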
I completely understand!
We did some initial digging into this today and it looks like something that we can fix rather quickly.
Hi John,
We now have preliminary support for both redirected stdin and redirected stdout!
For stdin, we actually support redirected uncompressed, gzipped, and bgzipped VCF files. This was mostly done for completeness, since I'm guessing that you normally don't want a bunch of compression/decompression occurring between your piped commands.
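For readers curious how a program can accept all three kinds of input on one stream, here is a rough Python sketch that peeks at the first two bytes and looks for the gzip magic number. It is only an illustration, not Nirvana's code; `open_vcf_stream` is a hypothetical name. It relies on bgzipped data being a series of gzip members, which Python's `gzip` module reads as one stream.

```python
import gzip
import io
from typing import BinaryIO, TextIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (BGZF included)


def open_vcf_stream(raw: BinaryIO) -> TextIO:
    """Return a text reader over `raw`, decompressing if it looks gzipped."""
    buffered = io.BufferedReader(raw)
    # peek() does not consume bytes, so the decoder still sees the header.
    # Assumption: the first read yields at least two bytes when data exists.
    head = buffered.peek(2)[:2]
    if head == GZIP_MAGIC:
        return io.TextIOWrapper(
            gzip.GzipFile(fileobj=buffered, mode="rb"), encoding="utf-8"
        )
    return io.TextIOWrapper(buffered, encoding="utf-8")
```

In a real tool the argument would typically be the binary stdin handle (`sys.stdin.buffer` in Python).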
For both the input and output command-line arguments, the hyphen (-) denotes a request for stdin and stdout redirection respectively:
cat Data/Mother/chr2.vcf | \
dotnet /d/Projects/Nirvana/bin/Release/netcoreapp1.1/Nirvana.dll \
-r References/5/Homo_sapiens.GRCh37.Nirvana.dat -c Cache/24/GRCh37/Ensembl84 \
--sd SupplementaryDatabase/36/GRCh37 -i - -o - | wc -l
dotnet /d/Projects/Nirvana/bin/Release/netcoreapp1.1/Nirvana.dll \
-r References/5/Homo_sapiens.GRCh37.Nirvana.dat -c Cache/24/GRCh37/Ensembl84 \
--sd SupplementaryDatabase/36/GRCh37 -i - -o - < Data/Mother/chr2.vcf > chr2.json
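The hyphen convention used in the commands above is common among command-line tools. As a hedged illustration of the idea (not Nirvana's implementation, which is .NET), a Python sketch might look like the following; `open_text` and `copy_lines` are hypothetical names, and `copy_lines` stands in for the real annotation work.

```python
import contextlib
import sys
from typing import Iterator, TextIO


@contextlib.contextmanager
def open_text(path: str, mode: str = "r") -> Iterator[TextIO]:
    """Open `path` for text I/O, treating "-" as stdin (read) or stdout (write)."""
    if path == "-":
        # The standard streams belong to the process, so yield them unclosed.
        yield sys.stdout if "w" in mode else sys.stdin
    else:
        with open(path, mode, encoding="utf-8") as handle:
            yield handle


def copy_lines(in_path: str, out_path: str) -> int:
    """Stand-in for the real work: copy lines and return how many were written."""
    count = 0
    with open_text(in_path, "r") as src, open_text(out_path, "w") as dst:
        for line in src:
            dst.write(line)
            count += 1
    return count
```

With this shape, `copy_lines("-", "-")` behaves as a filter in a pipeline, while ordinary paths still work unchanged.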
stdout redirection causes the following to happen:
You can test out the new stdin/stdout functionality in the features/stdout branch.
Thanks a lot for the speedy response. This should be very helpful. I'll give it a try later today.
It's working great for me now. Thanks a lot.
Awesome! Feel free to add any other issues or feature requests here.
I can't seem to find any way in Nirvana to write to standard output instead of a particular file. When I specify /dev/stdout as the output file, it tries to write to "/dev/stdout.json.gz". That seems like unintended behavior.