caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

Missing output files when running mgatk with test data #76

Closed smzt closed 1 year ago

smzt commented 1 year ago

Hello,

Running mgatk from the venv3 environment, from the tests directory and following the instructions in the README.md, I executed:

$ mgatk call -i humanbam -o out -n glio
Tue Jul 04 13:08:24 CEST 2023: mgatk v0.6.8
Tue Jul 04 13:08:24 CEST 2023: NOTE: the samples below either have 0 mtDNA reads at the specified chromosome or are mapped to an incorrectly specified reference mitochondrial genome
Tue Jul 04 13:08:24 CEST 2023: Will remove samples from processing:
REMOVED: MGH97-P8-H02.mito
REMOVED: MGH60-P6-B01.mito
REMOVED: MGH97-P8-H03.mito
REMOVED: MGH60-P6-A11.mito
ERROR: Could not import any samples from the user specification.
ERROR: check flags, logs, and input configuration (including reference mitochondrial genome); QUITTING

I slightly modified the command to avoid the above errors and the following gzip errors:

Select jobs to execute...
gzip: out/final/glio.A.txt: No such file or directory
gzip: out/final/glio.C.txt: No such file or directory
gzip: out/final/glio.G.txt: No such file or directory
gzip: out/final/glio.T.txt: No such file or directory
gzip: out/final/glio.coverage.txt: No such file or directory

So the command looks like this:

$ nohup mgatk call -i humanbam -g hg19 -o out -n glio -z -so &> glio.log &

The -so option avoids the gzip errors, but the pipeline still does not produce the output files described in the mgatk Wiki when run on the test data.

Here is the log file glio.log

I also ran other lines from the README.md file to test mgatk, but the tool also failed:

$ nohup mgatk bcall -i barcode/test_barcode.bam -n bc1 -o bc1d -bt CB -b barcode/test_barcodes.txt -z -so &> bc1.log &

Here is the log file for this command bc1.log

This is the content of the final directory under bc1d:

chrM_refAllele.txt
bc1.T.txt.gz
bc1.G.txt.gz
bc1.C.txt.gz
bc1.coverage.txt.gz
bc1.A.txt.gz
bc1.depthTable.txt
bc1.rds
bc1.signac.rds

The files *.variant_stats.tsv.gz, *.cell_heteroplasmic_df.tsv.gz and *.vmr_strand_plot.png are always missing from the final folder. I assumed these files would be written to the final output directory, or to another directory under the output folder given in the command, but I might be wrong.

I also opened an issue in maegatk, https://github.com/caleblareau/maegatk/issues/11, because I thought the problem was with that tool, but I see that the reported problem/error is exactly the same as with mgatk.

Any help would be much appreciated.

Best regards,

Sheila

caleblareau commented 1 year ago

Hi @smzt, my apologies for the delay. Fortunately, the issue is easy to fix. I should have noted in the README that you need to specify the reference genome for this:

mgatk call -i humanbam -o out -n glio -g hg19

The mgatk default is rCRS, which is 16569 base pairs; the glio data was aligned to hg18 (16571 base pairs). If you run into issues where there are no reads for genotyping, it's typically a reference genome issue (either the length, or using MT vs chrM as the mitochondrial chromosome name).
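One way to check for this kind of mismatch before running mgatk is to look at the mitochondrial contig recorded in the BAM header. The sketch below is only an illustration, not part of mgatk: `mito_contigs` is a hypothetical helper name, it assumes the contig is called either chrM or MT, and it expects SAM header text on standard input (for example from `samtools view -H`).

```shell
# Hypothetical helper (not part of mgatk): read a SAM header on stdin and
# print the name and length of any contig named chrM or MT, tab-separated.
mito_contigs() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      # Pull the SN (sequence name) and LN (length) tags from the @SQ line.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4)
      }
      if (name == "chrM" || name == "MT") print name "\t" len
    }'
}

# Example usage (requires samtools; the BAM path is a placeholder):
#   samtools view -H humanbam/sample.bam | mito_contigs
```

A reported length of 16569 would be consistent with the rCRS default, while 16571 corresponds to the case in this thread, where adding -g hg19 was the fix. No output at all would suggest that the mitochondrial contig uses a different name.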