Illumina / GTCtoVCF

Script to convert GTC/BPM files to VCF
Apache License 2.0

Limitations of --output-vcf-path and --log-file #15

Open freeseek opened 6 years ago

freeseek commented 6 years ago

There are two limitations/inconsistencies that I believe would be important to resolve:

1) --output-vcf-path does not allow writing to stdout. I got the following error:

GTC converter - ERROR - Unable to write to output file /dev/stdout

It would be nice to be able to output to stdout, as this could be piped into tools like bcftools to generate compressed binary files rather than really large uncompressed text files. Ideally, the default behaviour would be to output to stdout if --output-vcf-path is not used.

2) --log-file generates a log file (more or less containing what is output on stderr), but even when the option is active the program still writes all the log output to stderr, duplicating it. It is great that by default the log goes to stderr, but if the --log-file option is used then stderr should not be used as well.
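The two requested behaviours can be illustrated with a minimal Python sketch. This is not the converter's actual code: the helper names open_vcf_output and configure_logging are hypothetical, treating "-" as a synonym for stdout is an assumption, and the log format only approximates the error line quoted above.

```python
import contextlib
import logging
import sys


@contextlib.contextmanager
def open_vcf_output(path=None):
    """Yield a text handle: stdout when path is None or "-", else a new file.

    Hypothetical helper; the "-" convention is an assumption, not an
    existing option of the converter.
    """
    if path in (None, "-"):
        # stdout is owned by the interpreter, so it is not closed here.
        yield sys.stdout
    else:
        with open(path, "w") as handle:
            yield handle


def configure_logging(log_file=None):
    """Send log records to exactly one place: the log file if given, else stderr."""
    logger = logging.getLogger("GTC converter")
    logger.setLevel(logging.INFO)
    # Drop handlers from earlier calls so messages are never duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    if log_file:
        handler = logging.FileHandler(log_file)
    else:
        handler = logging.StreamHandler(sys.stderr)
    # Assumed format, chosen to resemble the error line quoted in this issue.
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    # Keep records away from any root-logger handler that also prints to stderr.
    logger.propagate = False
    return logger
```

With something along these lines, a run without --output-vcf-path would write the VCF to stdout, where it could be piped into, for example, bcftools view -Ob -o sample.bcf, while log messages would go either to stderr or to the file named by --log-file, but not both.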

Lastly, I think it would be beneficial if the check that the manifest file matches the one used to generate the GTC files happened at the very beginning, so that the user could promptly fix a mismatch instead of waiting a very long time to find out.
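A fail-fast check of this kind might look like the sketch below. It assumes that "matching" means the manifest name recorded in each GTC file equals the file name of the supplied manifest (compared case-insensitively), and that those recorded names have already been read into a dictionary. The function and exception names are hypothetical, and how the name is read out of a GTC file is left out.

```python
import os


class ManifestMismatchError(Exception):
    """Raised when a GTC file was generated with a different manifest."""


def check_manifests_up_front(manifest_path, gtc_manifest_names):
    """Validate every GTC file against the manifest before any conversion starts.

    gtc_manifest_names maps each GTC path to the manifest name stored in
    that file (assumed to have been read cheaply from the file beforehand).
    """
    expected = os.path.basename(manifest_path).lower()
    mismatched = sorted(
        gtc_path
        for gtc_path, recorded in gtc_manifest_names.items()
        if os.path.basename(recorded).lower() != expected
    )
    if mismatched:
        raise ManifestMismatchError(
            "GTC files not generated with %s: %s"
            % (os.path.basename(manifest_path), ", ".join(mismatched))
        )
```

Calling such a check once, before the expensive per-locus work begins, would report every offending GTC file immediately rather than partway through a long run.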

KelleyRyanM commented 6 years ago

Would it be preferable to support compressed output by writing to standard output, or to have the program itself pipe its output through bgzip? One advantage of the latter strategy is that a single invocation of the program could process multiple input files and avoid repeatedly reading and processing some of the information in the manifest.

freeseek commented 6 years ago

Yes, I see your point. Not sure what to suggest. You probably don't want to implement bgzip within the code, and that actually isn't even my use case, as I always use compressed binary VCF (a.k.a. .bcf) instead. I thought, though, that allowing /dev/stdout for single VCF conversion would be an easy fix.

KelleyRyanM commented 6 years ago

It wouldn't be necessary to implement bgzip; the program would just internally pipe to a bgzip process. Though, to your point, output to standard output is trivial and implicitly supports multiple compression mechanisms. I can put this into a pull request.
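For illustration, internally piping to an external compressor could be sketched as below. This is an assumption about how it might be done, not the code in the repository: piped_writer is a hypothetical name, and the default command assumes a bgzip executable is available on the PATH.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def piped_writer(output_path, command=("bgzip", "-c")):
    """Yield a text handle whose contents are filtered through `command`.

    The child process reads the text from its stdin and its stdout is
    written to output_path.
    """
    with open(output_path, "wb") as sink:
        proc = subprocess.Popen(
            list(command), stdin=subprocess.PIPE, stdout=sink, text=True
        )
        try:
            yield proc.stdin
        finally:
            # Closing stdin signals end-of-input so the child can finish.
            proc.stdin.close()
            returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("%s exited with status %d" % (command[0], returncode))
```

Because each output file gets its own child process, one invocation could load the manifest once and then open a separate piped_writer per GTC file, which is the advantage of internal piping mentioned earlier in the thread.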

KelleyRyanM commented 6 years ago

There is a candidate branch at https://github.com/Illumina/GTCtoVCF/tree/standard_output if you'd like to make any additional comments.