Hi,
I am using `chromosome_scaffolder.sh` (MaSuRCA 4.0.9) to orient one draft genome assembly against another reference genome. The process seems to work; however, the resulting FASTA file contains both the chromosome names of the reference and the sequence names from the query genome.

In addition, some of the query genome sequence identifiers appear more than once in the file. This is of course fine, since I guess these are the sequences that could not be placed on the reference, at least in part. But according to the manual, this should not happen with the `nb` option. Moreover, the records sharing a duplicated ID do not have the same sequence length, so they represent splits rather than the original scaffolds.

It would be very helpful to indicate which parts of the original sequence each of them represents, so please add this information to the duplicated and split sequences. This would help to debug the process of orienting against a reference.
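For reference, here is a minimal sketch of how the duplicated IDs and their differing lengths can be listed from a FASTA file. It is plain Python and does not use anything from MaSuRCA; the function names and the toy records (`chr1`, `scaffold_7`) are made up for illustration and are not taken from my actual output.

```python
from collections import defaultdict


def fasta_lengths(lines):
    """Map each sequence ID to the lengths of all records that carry that ID."""
    lengths = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current].append(0)
        elif current is not None:
            # Sequence line: add its length to the most recent record of this ID.
            lengths[current][-1] += len(line)
    return dict(lengths)


def duplicated_ids(lengths):
    """Keep only the IDs that occur in more than one record."""
    return {sid: lens for sid, lens in lengths.items() if len(lens) > 1}


# Toy input (hypothetical IDs): scaffold_7 occurs twice with different lengths.
example_records = [
    ">chr1",
    "ACGTACGT",
    ">scaffold_7",
    "ACGT",
    "AC",
    ">scaffold_7",
    "ACG",
]

for seq_id, lens in duplicated_ids(fasta_lengths(example_records)).items():
    print(seq_id, lens)  # prints: scaffold_7 [6, 3]
```

On a real file, the same functions can be fed an open file handle instead of the toy list. This only shows that an ID is duplicated and how long each piece is; it cannot tell which region of the original scaffold a piece comes from, which is exactly the information I am asking for.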
Thank you in anticipation.
Best regards
Kristian