combogenomics / medusa

A draft genome scaffolder that uses multiple reference genomes in a graph-based approach.
http://combo.dbe.unifi.it/medusa/
GNU General Public License v3.0

Controlling run parameters #12

Closed anandksrao closed 7 years ago

anandksrao commented 7 years ago

Greetings!

I have some questions about MeDuSa runs:

1. Is it necessary, or even possible, to modify the run parameters of the underlying MUMmer software from the medusa command line? Specifically, a discussion thread (https://sourceforge.net/p/mummer/mailman/message/32670327/) describes how considering only unique seeds, increasing the minimum match length, and increasing the minimum cluster length can reduce run times. Or is this only possible by editing the underlying MUMmer parameter file, and if so, which one?

2. I am thinking of filtering out short contigs drastically (dropping everything shorter than the N50, for example) and then using 5 of these filtered drafts, with 50 - 100 scaffolds each, to assemble my target genome. Do you see any drawbacks to using these size-filtered reference drafts?

3. If the target and reference (pairwise comparison) have converged after a few rounds, does the software return the message below? "Building the network...SORRY: No information found. Are you sure to have MUMmer packedge location in your PATH? If yes, the chosen drafts genomes don't provide sufficient information for scaffolding the target genome."

EBosi commented 7 years ago

Hi! Point by point:

1) Unfortunately not. Feel free to edit mummerRunner.sh to change the MUMmer call, or modify your installed MUMmer if you want something more exotic.

2) I would say go for it! I usually discard everything below 2 kbp, but feel free to do whatever you think works best.

3) That message means that the target(s) do not provide information to build a scaffolding graph... unless your MUMmer doesn't work (which should not be your case).
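For point 1, the three settings mentioned in the question correspond to standard nucmer options: `--mum` (anchor only on matches unique in both sequences), `-l` (minimum match length) and `-c` (minimum cluster length). The sketch below only assembles such a command line so it can be compared with whatever call mummerRunner.sh makes. How that script invokes MUMmer is not shown in this thread, so the function name, file names and numeric values here are illustrative assumptions, not MeDuSa's actual settings.

```python
import shlex


def build_nucmer_cmd(reference, query, prefix="out",
                     min_match=20, min_cluster=65, unique_seeds=False):
    """Return a nucmer argument list (hypothetical helper; nothing is executed).

    The defaults for min_match and min_cluster mirror the defaults
    documented for nucmer in MUMmer 3.
    """
    cmd = ["nucmer",
           "-p", prefix,             # prefix for the output .delta file
           "-l", str(min_match),     # minimum length of a single match
           "-c", str(min_cluster)]   # minimum length of a cluster of matches
    if unique_seeds:
        # --mum: use only anchors unique in both reference and query
        cmd.append("--mum")
    cmd += [reference, query]
    return cmd


if __name__ == "__main__":
    # File names and thresholds are placeholders for illustration.
    cmd = build_nucmer_cmd("reference_draft.fasta", "target.fasta",
                           prefix="ref_vs_target",
                           min_match=50, min_cluster=100, unique_seeds=True)
    print(shlex.join(cmd))
```

Stricter seeds and longer minimum lengths trade sensitivity for speed, so it is worth checking that the scaffolding graph still gets enough alignments after any change.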

Hope that's useful! Happy scaffolding, Emanuele
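For point 2, a minimal sketch of the size filtering discussed above, assuming plain multi-FASTA drafts. It is not part of MeDuSa; the function names are hypothetical, and the 2000 bp default simply reflects the 2 kbp cutoff mentioned in the reply. Passing the N50 as the cutoff gives the more drastic filter from the question.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Largest length L such that contigs of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def filter_contigs(records, min_len=2000):
    """Keep only the records whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


if __name__ == "__main__":
    # Toy in-memory draft; a real run would pass an open file instead.
    demo = [">c1", "ACGTACGTAC", ">c2", "ACG", ">c3", "ACGTAC"]
    records = list(read_fasta(demo))
    cutoff = n50([len(s) for _, s in records])
    for header, seq in filter_contigs(records, min_len=cutoff):
        print(f">{header}\n{seq}")
```

Filtering at the N50 removes far more sequence than a fixed 2 kbp cutoff, so comparing the total retained bases under both thresholds before scaffolding is a cheap sanity check.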
