Closed: carlos88morais closed this issue 7 years ago.
Dear Carlos, we are aware of that issue. Can you provide some information about the assembly you are working on?
On 1 Mar 2017 at 23:02, "carlos88morais" notifications@github.com wrote:
Hi,
I am running medusa to scaffold 15 genomes, with sizes from 5 Mb to 12 Mb. It went fine for 14 of them, taking a few hours each, but one is stuck in the "Building the network..." phase: it has been running for 555 hours now, with low memory and CPU usage.
Is there any way to check progress, more detailed than the normal output?
Regards, Carlos
Dear Emanuele,
The data is from Methylobacterium populi, strain TC3-6, sequenced with Illumina MiSeq. The assembly is a combination of the results from several assemblers, merged with Metassembler (https://sourceforge.net/projects/metassembler/). The references are the 10 best hits from an NCBI BLAST search. The other 14 assemblies, also Methylobacterium, different strains, same process, finished in a few hours.
Regards, Carlos
Hi Carlos, I just need to know the number of contigs, N50, etc.
# contigs                        889
# contigs (>= 0 bp)              889
# contigs (>= 1000 bp)           888
# contigs (>= 5000 bp)           583
# contigs (>= 10000 bp)          350
# contigs (>= 25000 bp)          117
# contigs (>= 50000 bp)           35
Largest contig                236565
Total length                11696099
Total length (>= 0 bp)      11696099
Total length (>= 1000 bp)   11695439
Total length (>= 5000 bp)   10787615
Total length (>= 10000 bp)   9107237
Total length (>= 25000 bp)   5347029
Total length (>= 50000 bp)   2669255
N50                            22810
N75                            11212
L50                              138
L75                              319
GC (%)                        70.010
Mismatches
# N's                           1935
# N's per 100 kbp              16.54
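(As a quick sanity check, N50 and L50 can be recomputed from the raw contig lengths with a few lines of plain Python. This is an illustrative sketch, not part of any tool mentioned in the thread:)

```python
def n50_l50(lengths):
    """Return (N50, L50): the contig length at which the sorted contigs
    first cover half the total assembly, and how many contigs that takes."""
    total = sum(lengths)
    running = 0
    for i, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return length, i  # N50, L50

# toy example with five contigs
print(n50_l50([100, 80, 60, 40, 20]))  # -> (80, 2)
```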
Ok, the problem is that the algorithm scales badly with the number of contigs. You might want to either obtain a better assembly or filter out short contigs before using medusa... let me know if that works out.
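(For the filtering step, something along these lines works on any FASTA file. A plain-Python sketch; the cutoff `min_len` is whatever threshold you choose, it is not a medusa parameter:)

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:].strip(), []
        else:
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    return records

def filter_contigs(records, min_len):
    """Keep only contigs at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# toy example: drop the 2 bp contig, keep the rest
fasta = ">c1\nACGTACGT\n>c2\nAC\n>c3\nACGTACGTACGT\n"
kept = filter_contigs(read_fasta(fasta), 5)
print([h for h, _ in kept])  # -> ['c1', 'c3']
```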
Dear Emanuele,
As a test, I deleted the smallest contigs until my assembly had only 700, since two other genomes with more than 750 contigs ran fine. It has been running since Friday, also stuck at building the network. That makes me wonder: is it just the number of contigs that scales badly, or does genome size also have a considerable effect on processing time? Even with only 700 contigs, the assembly still has around 11 Mb.
Best Regards, Carlos Morais
Dear Carlos, I can see it being a bit tedious but, at the moment, there's no easy workaround.
Is it just the number of contigs that scales badly? Or does the genome size also have a considerable effect on processing time?
More than genome size, it is network size. Genomes with a large number of contigs correspond to dense networks, which take more time to analyse. However, there is no strict correlation between the two variables, meaning that genomes with more than 750 contigs might still run better than this one. In conclusion... persevere! I would keep this one running; meanwhile, I would try to reduce the number of contigs even further and make another run. Another option would be to reduce your set of reference genomes: this might lead to less tangled networks, so it could be worth giving it a try. I hope I've been useful! If you need more advice, don't hesitate to ask. Good luck, Emanuele
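(A back-of-envelope way to see why trimming the reference set helps. This quadratic bound is a hypothetical model for intuition, not medusa's actual data structure: with n contigs and r references, the candidate adjacency links are bounded by roughly r * n(n-1)/2, so the network grows quadratically in contigs but only linearly in references:)

```python
def max_edges(n_contigs, n_refs):
    """Worst-case number of candidate links, assuming every contig pair
    could be suggested as adjacent by every reference genome."""
    return n_refs * n_contigs * (n_contigs - 1) // 2

full = max_edges(889, 10)   # the stuck run: 889 contigs, 10 references
half = max_edges(889, 5)    # same contigs, half the references
print(half / full)          # -> 0.5: halving references halves the bound
```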
That worked! Scaffolding the original assembly (with 889 contigs) after reducing the number of references by half took just 10 minutes. Thanks a lot.
Dear Carlos, I'm glad it worked. Just bear in mind that the result might be suboptimal compared with the complete reference set. Best, Emanuele
Yes, I'm trying different sets of references to figure out the best result within a reasonable time. Thanks again.