Illumina / manta

Structural variant and indel caller for mapped sequencing data
GNU General Public License v3.0

Check and fail if input reference is gzip compressed #182

Open KamilSJaron opened 5 years ago

KamilSJaron commented 5 years ago

Hello,

Thanks for making Manta!

I just ran Manta in a cluster environment, and after quite a lot of computation (2.5 h using 32 cores) it started to complain that it cannot open the gzi index. I had forgotten to copy the index locally, so there is a simple solution to that. However, it would be really nice if Manta could fail right away on basic problems like missing input files: just a small check at the very beginning of every run that all required files are in place.

PS: I don't need help, this is just feedback.

x-chen commented 5 years ago

Thanks for the feedback! Could you please clarify a bit which index Manta complained about?

KamilSJaron commented 5 years ago

The <ref>.fasta.gz.gzi index file generated by samtools faidx.

The first error message that appears is:

[2019-05-16T08:35:53.329142Z] [<my_computer>] [39949_1] [TaskManager] [ERROR] Failed to complete command task: 'makeLocusGraph_chromId_000_scaf000001_0000' launched from master workflow, error code: 1, command: 'EstimateSVLoci --output-file <...> --align-stats <...> --region scaf000001:1-821199 --min-candidate-sv-size 8 --min-edge-observations 3 --ref <ref.fasta.gz> --align-file <mapping.bam> --chrom-depth <...>'
...
...
scaf000002_0000] [E::fai_load3] Failed to load .gzi index: <...>/ref.fasta.gz.gzi

I replaced paths by <...> to make it more readable.

ctsa commented 5 years ago

There is no support for compressed references. The reference file needs to be a fasta file with an htslib fasta index file ('.fai' extension).
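If compressed references are indeed unsupported, one way to reject them up front is to look at the first two bytes of the file, since gzip files (including bgzip/BGZF output) start with the magic bytes 0x1f 0x8b. The following is only a sketch of that idea, not Manta's actual code; the function names `is_gzip_compressed` and `check_reference_is_plain_fasta` are hypothetical.

```python
# Hypothetical sketch, not part of Manta: reject a gzip-compressed reference early.
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and bgzip) file


def is_gzip_compressed(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def check_reference_is_plain_fasta(ref_path):
    """Raise ValueError before any computation if the reference is compressed."""
    if is_gzip_compressed(ref_path):
        raise ValueError(
            "Reference '%s' appears to be gzip compressed; "
            "an uncompressed fasta file is expected." % ref_path
        )
```

Checking the content rather than the file name means a compressed file is caught even if it was renamed without a `.gz` extension.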

KamilSJaron commented 5 years ago

I am quite sure that I actually managed to get meaningful results from Manta using a compressed reference, so I believe that there is support for compressed references. (I am really not trying to be a moron here, but I really did manage to run Manta with compressed references ^^)

The problem I was reporting was not about support for compressed references, but about reporting an error early on when the .gzi index is missing (fast fail).
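A fast-fail check of this kind could look roughly like the sketch below. It is not Manta's code: the function names are hypothetical, and it assumes the file layout that samtools faidx produces, namely a `.fai` index next to the reference plus an additional `.gzi` index when the reference is bgzip compressed (detected here simply by a `.gz` suffix).

```python
import os


# Hypothetical sketch, not part of Manta: verify reference inputs before starting.
def missing_reference_files(ref_path):
    """Return the required reference-related files that do not exist."""
    required = [ref_path, ref_path + ".fai"]
    # Assumption: a '.gz' suffix means a bgzip-compressed reference,
    # which also needs the '.gzi' index written by samtools faidx.
    if ref_path.endswith(".gz"):
        required.append(ref_path + ".gzi")
    return [path for path in required if not os.path.isfile(path)]


def check_reference_inputs(ref_path):
    """Raise FileNotFoundError at startup if any required file is missing."""
    missing = missing_reference_files(ref_path)
    if missing:
        raise FileNotFoundError(
            "Missing required reference files: " + ", ".join(missing)
        )
```

Calling `check_reference_inputs` once at the beginning of a run would surface a missing index within seconds instead of after hours of computation.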

ctsa commented 5 years ago

Okay, interesting... I didn't think that would work. Yes, it should have failed early if the '.fai' index file was not found:

https://github.com/Illumina/manta/blob/master/src/python/lib/mantaOptions.py#L162-L164

..I'm not sure how you would have gotten around that check?

KamilSJaron commented 5 years ago

Well, the .fai index was there. Only the .gzi index was missing. ¯_(ツ)_/¯

I was computing on a cluster, so I have to specify the set of input files that gets copied locally for every computation. I was not aware that compressed references have a .gzi index too, so I copied only the .fai. After some time... (continue reading the original issue description)