RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

run time for assembly step very long #13

Closed by MareikeJaniak 3 years ago

MareikeJaniak commented 3 years ago

Hi!

Thanks for putting this resource together! I'm hoping that you can help me with some issues I'm running into.

I'm trying to use MitoFinder to pull mitochondrial genomes from whole-genome shotgun data of non-human primates. MitoFinder is running, but the assembly step is taking a very long time with megahit (> 18 hours with 8 threads and 64 GB of memory). With metaspades it failed with the error message: "The reads contain too many k-mers to fit into available memory. You need approx. 166.78GB of free RAM to assemble your dataset."

This is much more time and memory than your paper suggests is necessary. I'm using paired-end data with ~200 million reads total (~100 million per direction). Do I need to downsample my input data, or is something else going on? If downsampling is the answer, how many reads would you recommend as input?

RemiAllio commented 3 years ago

Hi there!

Thank you for your comment.

Indeed, 18 hours is quite long, and MitoFinder should be able to do the same job in much less time. I strongly encourage you to downsample your input data. We usually assemble mitochondrial genomes from about 10M shotgun reads, so I would advise starting with 5-10M reads and adding a few more if the mitogenome comes out fragmented.

Also, if you already have an assembly, you can search it for the mitogenome with MitoFinder's -a option. Depending on the assembler used, the mitogenome may be present in the assembly, or the assembler may have removed it.

Finally, I am working on a new version of MitoFinder (v1.4) in which the tRNA annotation step is much improved. I should release it today, so if you are not in a hurry, I recommend trying the new version.

Cheers, Rémi

MareikeJaniak commented 3 years ago

Hi Rémi,

Thanks so much for your quick reply! I was hoping that downsampling would be the answer, so that's great! Our data are just short reads and not assembled at the moment, but good to know about the -a option.

I will try it with ~10 M reads and I'm not in a rush, so I'll wait for the new version.

Best, Mareike

RemiAllio commented 3 years ago

Hi Mareike,

The new version of MitoFinder (1.4) is now available: git clone https://github.com/RemiAllio/MitoFinder.git

I hope this will do a good job in your case! Since you will almost certainly be the first user of this version, please let me know whether it works for you and whether you see anything I should improve.

Thank you, Cheers, Rémi

fdelsuc commented 3 years ago

Hi Mareike,

Nice to see that you are using MitoFinder. I hope you find it useful. As Rémi said, the newly released v1.4 should greatly improve tRNA annotation thanks to the integration of MiTFi, which is now the default program. Do not hesitate to send us feedback.

Best wishes,

Frederic

MareikeJaniak commented 3 years ago

Hi Rémi, hi Frederic,

I've subsampled my reads and will install v1.4 now. I'll keep you posted on how it goes!

Thanks for your help and quick responses!

Best, Mareike

MareikeJaniak commented 3 years ago

Hi Rémi, hi Frederic,

Just an update that subsampling has sped up the assembly process quite a bit.

I subsampled to 3,500,000 reads per file (7 million total) and assembly with megahit was extremely fast. Metaspades took a bit longer, but the whole pipeline still needed only around 4 hours of walltime. I annotated with tRNAscan and it worked great!
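For anyone landing here later: a minimal sketch of how paired-end reads can be subsampled while keeping mates together. It is not the exact procedure used in this thread, which does not say which tool was used. The function names (`read_fastq`, `subsample_pairs`, `subsample_files`) and the default seed are illustrative assumptions, and the sketch assumes standard 4-line FASTQ records with both files listing mates in the same order.

```python
import gzip
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes 4-line records)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(handle_1, handle_2, n_pairs, seed=42):
    """Reservoir-sample n_pairs read pairs in one pass, keeping mates together.

    If the two inputs differ in length, zip() stops at the shorter one.
    """
    rng = random.Random(seed)
    reservoir = []
    paired = zip(read_fastq(handle_1), read_fastq(handle_2))
    for i, pair in enumerate(paired):
        if i < n_pairs:
            reservoir.append(pair)
        else:
            # Keep the new pair with probability n_pairs / (i + 1).
            j = rng.randint(0, i)
            if j < n_pairs:
                reservoir[j] = pair
    return reservoir


def open_maybe_gzip(path, mode="rt"):
    """Open plain or gzip-compressed text files based on the extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def subsample_files(in_1, in_2, out_1, out_2, n_pairs, seed=42):
    """Write n_pairs randomly chosen pairs from in_1/in_2 to out_1/out_2."""
    with open_maybe_gzip(in_1) as h1, open_maybe_gzip(in_2) as h2:
        pairs = subsample_pairs(h1, h2, n_pairs, seed)
    with open_maybe_gzip(out_1, "wt") as o1, open_maybe_gzip(out_2, "wt") as o2:
        for rec_1, rec_2 in pairs:
            o1.writelines(rec_1)
            o2.writelines(rec_2)
```

Calling `subsample_files("R1.fastq.gz", "R2.fastq.gz", "sub_R1.fastq.gz", "sub_R2.fastq.gz", 3_500_000)` (placeholder file names) would give 3.5 million reads per file. The reservoir is held in memory, so several million pairs can take a few GB of RAM. A dedicated tool such as seqtk, run with the same seed on both files, is likely the faster choice in practice.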

Best, Mareike

RemiAllio commented 3 years ago

Hi Mareike,

Thank you for the update! Really glad that the assembly was as fast as you expected.

Now that you have the assembly, you can try the different tRNA annotation tools without rerunning the assembly: run exactly the same command and change only the --tRNA-annotation option. Of course, in your case, we still need to understand what happened with MiTFi. In our hands, MiTFi usually gives better results.

Best, Rémi

MareikeJaniak commented 3 years ago

Hi Rémi,

Yes, it's great that I can rerun the annotation without having to reassemble. I hope to do that once we figure out what's going on with MiTFi.

Best, Mareike