alekseyzimin / masurca

Confusion about configuration file #18

Closed KevinMenden closed 6 years ago

KevinMenden commented 6 years ago

Hi Aleksey,

thanks for the great program!

I've been testing MaSuRCA with a few different genomes because we are evaluating which assembler will perform best for our data (~8-10x Nanopore, ~20-40x Illumina, 2-3 Gb genome size).

I'm a bit confused about which configuration you recommend, because the README on GitHub contains two example configuration files whose recommendations differ. For instance:

File 1:

# set this to 1 for all Illumina-only assemblies
# set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
# otherwise keep at 0
USE_LINKING_MATES = 0

File 2:

USE_LINKING_MATES=1

Most of the paired end reads end up in the same super read and thus are not passed to the assembler. Those that do not end up in the same super read are called "linking mates". The best assembly results are achieved by setting this parameter to 1 for Illumina-only assemblies. If you have more than 2x coverage by long (454, Sanger, etc) reads, set this to 0.


Our data falls between 2x and 20x long-read coverage, so the two files give us opposite advice, which is why I'm confused. Could you edit the README so that the recommendations agree? Thank you!
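
To make the discrepancy concrete, the two quoted recommendations can be written out as small functions. This is only a sketch of how the README text reads, not MaSuRCA code: the function names are hypothetical, "Illumina-only" is modelled as zero long-read coverage, and the 30x mate-pair clone coverage is an assumed value for illustration.

```python
from typing import Optional


def linking_mates_file1(long_read_cov: float, clone_cov: float) -> int:
    """USE_LINKING_MATES as the File 1 comments read (hypothetical helper)."""
    if long_read_cov == 0:
        return 1  # Illumina-only assembly
    if long_read_cov < 20 and clone_cov < 50:
        return 1  # little long-read data and little mate-pair clone coverage
    return 0


def linking_mates_file2(long_read_cov: float) -> Optional[int]:
    """USE_LINKING_MATES as the File 2 text reads (hypothetical helper)."""
    if long_read_cov == 0:
        return 1  # Illumina-only assembly
    if long_read_cov > 2:
        return 0  # "more than 2x coverage by long reads"
    return None  # the text gives no explicit advice for 0 < coverage <= 2


if __name__ == "__main__":
    # Assumed inputs: 8-10x long reads, 30x mate-pair clone coverage.
    for cov in (8, 10):
        print(cov, linking_mates_file1(cov, clone_cov=30), linking_mates_file2(cov))
```

For any long-read coverage between 2x and 20x (with clone coverage under 50x), the first function returns 1 and the second returns 0, which is exactly the range our data sits in.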

alekseyzimin commented 6 years ago

You can try both settings and see which one gives the best assembly. Using linking mates makes the assembly run slightly longer.
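
One common, though incomplete, way to compare the two runs is contiguity, for example the N50 of each output FASTA. The sketch below assumes plain (uncompressed) FASTA files passed on the command line; the function names are hypothetical, and N50 is used here as an illustrative metric, not as the criterion the author has in mind.

```python
import sys


def contig_lengths(fasta_path):
    """Return the length of every sequence in a plain-text FASTA file."""
    lengths = []
    current = 0
    seen_header = False
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    lengths.append(current)  # close the previous record
                seen_header = True
                current = 0
            elif seen_header:
                current += len(line)
    if seen_header:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths):
    """Largest length L such that sequences of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Usage: python compare_n50.py run_with_mates.fasta run_without_mates.fasta
    for path in sys.argv[1:]:
        lens = contig_lengths(path)
        print(path, "sequences:", len(lens), "total:", sum(lens), "N50:", n50(lens))
```

A higher N50 alone does not mean a better assembly, since misjoined contigs also raise it, so it is worth checking total assembled size and, where possible, correctness as well.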

KevinMenden commented 6 years ago

Alright, thanks!