caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

Missing files in output final folder: *.variant_stats.tsv.gz *.cell_heteroplasmic_df.tsv.gz *.vmr_strand_plot.png #53

Closed by nmalwinka 2 years ago

nmalwinka commented 2 years ago

Describe the bug

I have tried mgatk on your humanbam test dataset and on my own (mouse) dataset, and in both runs the following files are missing from the final output folder:

*.variant_stats.tsv.gz
*.cell_heteroplasmic_df.tsv.gz
*.vmr_strand_plot.png

I get all the other files mentioned in your wiki.
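A quick way to confirm which of these outputs are absent is a small glob check. This is only a sketch: the three patterns come from this report, while the function name `missing_outputs` and the folder passed to it are illustrative assumptions, not part of mgatk.

```python
from pathlib import Path

# File patterns taken from this issue report.
EXPECTED_PATTERNS = (
    "*.variant_stats.tsv.gz",
    "*.cell_heteroplasmic_df.tsv.gz",
    "*.vmr_strand_plot.png",
)


def missing_outputs(final_dir, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match no file directly inside final_dir.

    Hypothetical helper for illustration; it only inspects file names.
    """
    final_dir = Path(final_dir)
    return [p for p in patterns if not any(final_dir.glob(p))]
```

For example, `missing_outputs("humanbam/final")` would list the patterns with no match, assuming the final folder sits at that path under the output directory.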

For humanbam I ran the command:

mgatk call -i humanbam/ -o humanbam/ --jobs 1 -c 1 -g hg19_chrM

A summary of .log files

Thu Jan 06 14:09:03 GMT 2022: Starting analysis with mgatk
Thu Jan 06 14:09:03 GMT 2022: mgatk will process 4 samples
Thu Jan 06 14:09:03 GMT 2022: Processing samples with 1 threads
Thu Jan 06 14:09:22 GMT 2022: mgatk successfully processed the supplied .bam files
Thu Jan 06 14:09:28 GMT 2022: Successfully created final output files
Thu Jan 06 14:09:28 GMT 2022: Intermediate files successfully removed.

Parameters:
input_directory: 'humanbam/'
output_directory: 'humanbam/'
script_dir: xxx
fasta_file: 'humanbam//fasta/chrM.fasta'
mito_chr: 'chrM'
mito_length: '16571'
name: 'mgatk'
base_qual: '0'
remove_duplicates: 'True'
handle_overlap: 'False'
low_coverage_threshold: '10'
barcode_tag: 'X'
umi_barcode: ''
alignment_quality: '0'
emit_base_qualities: 'False'
proper_paired: 'False'
NHmax: '1'
NMmax: '4'
max_javamem: '8000m'

Describe the sequencing assay being analyzed

My dataset is scRNA-seq, so I expected that it might not work, but I expected to see all final output files for the test dataset at least.

Clarify if the execution was successful on the test data provided in the repository

As mentioned above, I used your humanbam dataset, but several output files were missing.

I would be grateful for your help. Many thanks.

caleblareau commented 2 years ago

Right, those files are created only in tenx mode, because they depend on specific assumptions about how the variant calls are made. The files that do show up indicate that the run was successful; now you should load them into R / Signac / Python.

caleblareau commented 2 years ago

With only 4 samples, the strand correlation and VMR aren't stable. You may be better off using something like freebayes for so few samples.
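To make the two statistics concrete, here is a minimal sketch of a variance-mean ratio (VMR) and a strand correlation computed across samples for a single variant. It uses the generic textbook definitions (sample variance over mean, and Pearson correlation of forward versus reverse alternate-allele counts) and is an assumption about the general idea, not mgatk's actual implementation; the function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def variance_mean_ratio(heteroplasmy):
    """Sample variance of per-sample allele frequencies divided by their mean.

    Needs at least two samples; returns NaN when the mean is zero.
    """
    m = mean(heteroplasmy)
    if m == 0:
        return float("nan")
    return variance(heteroplasmy) / m


def strand_correlation(fwd_counts, rev_counts):
    """Pearson correlation of forward vs. reverse alt-allele counts across samples.

    Returns NaN when either strand has no variation across samples.
    """
    mf, mr = mean(fwd_counts), mean(rev_counts)
    cov = sum((f - mf) * (r - mr) for f, r in zip(fwd_counts, rev_counts))
    sf = sqrt(sum((f - mf) ** 2 for f in fwd_counts))
    sr = sqrt(sum((r - mr) ** 2 for r in rev_counts))
    if sf == 0 or sr == 0:
        return float("nan")
    return cov / (sf * sr)
```

With four samples, each statistic rests on four data points, so changing a single sample's counts can move the result substantially, which is consistent with the instability described above.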