gf777 / mitoVGP

The Vertebrate Genomes Project Mitogenome Assembly Pipeline
Other
18 stars 5 forks source link

Problem with trimming with mummer #6

Closed DustinSokolowski closed 2 years ago

DustinSokolowski commented 2 years ago

Hey!

I hope that you're doing well and thank you for this excellent pipeline!

I'm coming across with an error comparable to issue #1 .

I think that mummer is failing in trimming but I don't see any issue in the log specifically. I've attached the full log, the log specific to the trimming step, and the most recent fasta file (the new.fasta) and delta file.

I also saw that Amanda (Astahlke) found that it worked somewhat randomly, so I converted -z to 5000 and ran it 6 times to 6 outputs to see if it could get pushed through. Unfortunately, it failed on the same step every time.

mito_tirm.zip

Here's the dotplot of the fasta file aligned to itself.

image

gf777 commented 2 years ago

Hi @DustinSokolowski , thanks. In such edge cases I suggest manually fixing the sequence

All the best,

Giulio

DustinSokolowski commented 2 years ago

Hey!

Thanks so much for the quick reply! I was wondering if I could get a bit more insight on what to do manually here, given that mummer doesn't detect any overlaps. Would you suggest using circulator like the other option in the manuscript or the circulization_check in the mitofinder package (also suggested in #1 ). ? Otherwise would the sequence be good as is and I can just polish manually moving forward or are there other things I should trim?

Thanks so much! Dustin

gf777 commented 2 years ago

I think you should check the output of Canu. I suspect the mitocontig tig00000001 wasnt a concatamer to start with. You could then try play with -p (e.g. -p 70) to see if this produces better overlaps and therefore overhangs in the mitosequence that is carried over to the next steps

DustinSokolowski commented 2 years ago

Excellent, thanks so much! I'll give it a shot and update you!

Best, Dustin

DustinSokolowski commented 2 years ago

Hey!

I changed the -p so that -p 70 and it looks like nothing has changed. I also went through the canu log files and I think you're right that it's not a concatemer given canu's verdict on the contig is

-- contigs: 1 sequences, total length 32764 bp (including 0 repeats of total length 0 bp)

So given that there isn't anything repetitive and mummer can't detect anything, what would I actually have to manually trim away? Can I skip the trimming step altogether and just use the second round of polishing based on the "hGlab.tig00000001_polish2_10x1.fasta" file in assembly_MT_rockefeller/intermediates/freebayes_round1 ? I'm sorry if I'm missing anything obvious but at this stage I feel like I'd end up trimming randomly given that I can't find what should be trimmed.

Best! Dustin mito_trim_p70.zip .

gf777 commented 2 years ago

Hi @DustinSokolowski I think I can see it now. So the first round of trimming is failing to leave the short overhangs for some reason, but it's trimming the sequence correctly otherwise. So just add a little (like 20 bp) overhang (copy paste the sequence at the end, maybe with 1-2 bp difference as mummer has problems is the sequences are identical), run the map10x2 trimmer2 and linearize scripts (or restart the pipeline) and you should be good

DustinSokolowski commented 2 years ago

Hey!

So just to be 100% on the same page, are you talking about adding the overhang to the hGlab.tig00000001_polish2_10x1_new.fasta sequence or the intermediate.fasta sequence? I'm fairly sure you mean the _new.fasta file as it's half the size and should thus be what's correctly trimmed.

basically addoverhang.sh hGlab.tig00000001_polish2_10x1_new.fasta > hGlab.tig00000001_polish2_10x1_trim1.fasta

Should do the trick and then just re-run and we should be good?

Best, Dustin

gf777 commented 2 years ago

Yes I think that would work

On Tue, 8 Feb 2022 at 12:20, Dustin Sokolowski @.***> wrote:

Caution: External email

Hey!

So just to be 100% on the same page, are you talking about adding the overhang to the hGlab.tig00000001_polish2_10x1_new.fasta sequence or the intermediate.fasta sequence? I'm fairly sure you mean the _new.fasta file as it's half the size and should thus be what's correctly trimmed.

basically addoverhang.sh hGlab.tig00000001_polish2_10x1_new.fasta > hGlab.tig00000001_polish2_10x1_trim1.fasta

Should do the trick and then just re-run and we should be good?

Best, Dustin

— Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android. You are receiving this because you commented.Message ID: @.***>

-- Giulio Formenti, Ph.D. Bioinformatics Lead, The Vertebrate Genome Laboratory Postdoctoral Associate, Jarvis Lab, Laboratory of Neurogenetics of Language The Rockefeller University/HHMI 1230 York Ave | Box 54 New York | New York | 10065 o: 212.327.8206 f. 212.327.8106

Confidentiality Notice: This message, together with its annexes, contains information to be deemed strictly confidential and is destined only to the addressee(s) identified above. If anyone received this message by mistake or reads it without entitlement is forewarned that keeping, copying, disseminating or distributing this message to persons other than the addressee (s) is strictly forbidden and is asked to transmit it immediately to the sender and to erase the original message received.

DustinSokolowski commented 2 years ago

IT WORKED! Thank you so much for your help across like 3 different posts. I've got a nice looking MtGenome and I'm very happy with it. Looking forward to putting it to use.

Also I'm sure that there are loads of solutions but a I used a basic R one for anyone in the future who comes across this thread (see attached) finalTrim.txt

Best! Dustin

gf777 commented 2 years ago

Very glad to hear it @DustinSokolowski and thanks for sharing your solution!