caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

mgatk tenx endless runtime #98

Open AdrianParrilla opened 2 months ago

AdrianParrilla commented 2 months ago

Hi, I was trying to process a BAM file from scATAC sequencing, but after waiting for hours the process seems to be stuck at the beginning (see attached image). The input BAM file is 33 GB and has around 500,000,000 reads in total (246,780 reads per cell), with 11% of the reads covering the mtDNA.

[Image: mgatk issue]

The command I used was:

mgatk tenx -i /mnt/smb/TDA/scATAC_files/results/test_scARC_possorted_downsampled_bam.bam -n test_scARC -o test_scARC_mgatk -c 60 -bt CB -b /mnt/smb/TDA/scATAC_files/results/barcodes.tsv -g /mnt/smb/TDA/scATAC_files/masked_genome.fa --keep-temp-files

After 2 hours of running, the only output I got was empty folders:

[Image: mgatk issue 2]

Why is the process stuck at the first step? Is the BAM file too big?

Thanks in advance for your help,

caleblareau commented 2 months ago

The issue is that the FASTA you supply to mgatk should contain only the mitochondrial genome (or use one of the built-in ones).
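If samtools is installed, `samtools faidx masked_genome.fa chrM > chrM.fa` is the usual way to pull a single record out of a whole-genome FASTA. The stdlib sketch below does the same thing without samtools. It assumes the mitochondrial contig is named chrM; other references may call it MT or chrMT. The function name extract_contig is hypothetical and is not part of mgatk.

```python
from typing import Iterable, Iterator


def extract_contig(lines: Iterable[str], contig: str) -> Iterator[str]:
    """Yield the header and sequence lines of the FASTA record named `contig`.

    A record's name is the header text after '>' up to the first whitespace,
    so '>chrM' matches contig 'chrM' but '>chrMT' does not.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here: keep it only on an exact name match.
            parts = line[1:].split(maxsplit=1)
            keep = bool(parts) and parts[0] == contig
        if keep:
            yield line


# Example (hypothetical file names; assumes the mito contig is called chrM):
# with open("masked_genome.fa") as src, open("chrM.fa", "w") as dst:
#     dst.writelines(extract_contig(src, "chrM"))
```

The header line is copied unchanged, so the resulting file keeps whatever contig name the original reference used.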

Issue #98 was assigned to @caleblareau on Aug 16, 2024, at 9:20 AM.

AdrianParrilla commented 2 months ago

Hi, thanks for your reply. I tried that, but I got an error saying: "User specified mitochondrial genome does NOT match .bam file; correctly specify reference genome or .fasta file". I also tried extracting the mitochondrial genome from our masked_genome.fa and passing it as a custom genome with --mito-genome, but I got the same error. Does the input BAM file need to contain only the mitochondrial chromosome?

caleblareau commented 1 month ago

The BAM file can contain other contigs besides the mitochondrial chromosome.

The issue is probably that the text following the > in the FASTA header must exactly match the chromosome name used in the BAM file (e.g., >chrM).

What is the contig name in the BAM file, and what does cat <your fasta file> | grep ">" yield?
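A small script can answer both questions at once. The sketch below assumes the BAM header has been dumped to text with `samtools view -H <your bam>`. In SAM format, each @SQ line carries SN: for the contig name and LN: for its length. The function names are hypothetical, and this is not mgatk's own validation code. The length comparison is an extra sanity check added here; whether mgatk's internal check also compares lengths is an assumption.

```python
from typing import Dict, Iterable, List


def fasta_contigs(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (text after '>' up to whitespace) to its sequence length."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines add to the length of the current record.
            lengths[name] += len(line)
    return lengths


def bam_contigs(header_lines: Iterable[str]) -> Dict[str, int]:
    """Map each @SQ SN: name to its LN: length, given SAM-format header text
    (e.g. the output of `samtools view -H file.bam`)."""
    contigs: Dict[str, int] = {}
    for raw in header_lines:
        if not raw.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        tags = dict(
            field.split(":", 1)
            for field in raw.rstrip("\n").split("\t")[1:]
            if ":" in field
        )
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def contig_problems(fasta: Dict[str, int], bam: Dict[str, int]) -> List[str]:
    """Describe every FASTA contig that is missing from the BAM header
    or whose length differs from the BAM's LN: value."""
    problems: List[str] = []
    for name, length in fasta.items():
        if name not in bam:
            problems.append(f"{name!r} is not an @SQ name in the BAM header")
        elif bam[name] != length:
            problems.append(f"{name!r} length {length} != BAM LN:{bam[name]}")
    return problems
```

For example, checking a FASTA whose header reads >MT against a BAM aligned to a reference that names the contig chrM reports one problem. In that situation, renaming the FASTA header so it matches the BAM's contig name exactly is the fix described above.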