alekseyzimin / masurca

GNU General Public License v3.0

Chromosome scaffolding #303

Closed kullrich closed 1 year ago

kullrich commented 1 year ago

Hi,

I am using chromosome_scaffolder.sh (MaSuRCA 4.0.9) to orient one draft genome assembly against another reference genome.

The process seems to work; however, the resulting FASTA file contains both the chromosome names of the reference and the sequence names from the query genome.

In addition, some of the query genome sequence identifiers appear more than once in the file.

This is of course fine, since I guess these are the sequences that could not be fully placed on the reference.

But according to the manual, this should not happen with the nb option. In addition, these duplicated sequence IDs do not have the same sequence length and represent splits rather than the original scaffolds. It would be very helpful to indicate which parts of the original sequence they represent, so please add this information to the duplicated and split sequences. This would help to debug the process of orienting against a reference.
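
For anyone wanting to check their own output for the same symptom, here is a minimal sketch of how the duplicated IDs and their differing lengths could be listed. It is not part of MaSuRCA; the function names `fasta_lengths` and `duplicated_ids` and the small in-memory example are made up for illustration, and the ID is assumed to be the first word of each header line.

```python
from collections import defaultdict


def fasta_lengths(lines):
    """Map each FASTA ID to the lengths of every record that carries it.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    """
    lengths = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            lengths[current].append(0)  # start a new record for this ID
        elif current is not None:
            lengths[current][-1] += len(line)  # extend the current record
    return dict(lengths)


def duplicated_ids(lengths):
    """Keep only IDs that occur in more than one record."""
    return {seq_id: lens for seq_id, lens in lengths.items() if len(lens) > 1}


# Hypothetical toy input: one reference-named record and one query ID
# that appears twice with different lengths.
example = """>chr1
ACGTACGT
ACGT
>scaffold_7
ACGTAC
>scaffold_7
ACG
""".splitlines()

dups = duplicated_ids(fasta_lengths(example))
for seq_id, lens in dups.items():
    print(seq_id, lens)
```

On a real file, the same function can be given an open file handle instead of the toy list, since it only iterates over lines.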

Thank you in anticipation

Best regards

Kristian

kullrich commented 1 year ago

This is fixed in version 4.1.0