NCBI-Hackathons / EndoVir

Discovery of Novel Endogenous Viruses
MIT License
6 stars 4 forks

Megahit splits extending contig into several smaller contigs #7

Open janpb opened 6 years ago

janpb commented 6 years ago

Megahit splits larger contigs when extending. I suspect it's because of the k-mer approach. The current approach and logic to extend contigs is based on overlaps.

Approaches:

DCGenomics commented 6 years ago

My view:

  1. Do not assemble each time, only at the end

  2. Also try metaSPAdes and ABySS
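One way to read point 1, as a rough sketch and not EndoVir's actual code: pool the reads recruited in each extension round, de-duplicate them by read id, and hand the pooled set to the assembler a single time at the end. The names `ReadPool`, `add_round` and `to_fasta` are hypothetical, and the assembler call itself is left out on purpose.

```python
from typing import Dict, Iterable, Tuple

Read = Tuple[str, str]  # (read id, sequence)


class ReadPool:
    """Accumulates reads over extension rounds without assembling (hypothetical helper)."""

    def __init__(self) -> None:
        self._reads: Dict[str, str] = {}

    def add_round(self, reads: Iterable[Read]) -> int:
        """Add one round of recruited reads and return how many were new."""
        new = 0
        for read_id, seq in reads:
            if read_id not in self._reads:
                self._reads[read_id] = seq
                new += 1
        return new

    def to_fasta(self) -> str:
        """Render the pooled reads as FASTA text for one final assembly."""
        return "".join(f">{rid}\n{seq}\n" for rid, seq in self._reads.items())


pool = ReadPool()
pool.add_round([("r1", "ACGT"), ("r2", "GGCA")])
pool.add_round([("r2", "GGCA"), ("r3", "TTAG")])  # r2 is already pooled
fasta_text = pool.to_fasta()  # write this out and assemble once, at the end
```

Keying on the read id keeps a read that is recruited in several rounds from being counted more than once in the final assembly.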

On Nov 3, 2017 16:30, "janpb" notifications@github.com wrote:

Megahit splits larger contigs when extending. I suspect it's because of the k-mer approach. The current approach and logic to extend contigs is based on overlaps.

Approaches:

- Assemble only the flanking sequences and the corresponding aligned reads from magicblast, not the whole contig. The extension would be the concatenation of the extended flanks to the contig, i.e. fake a short read.

- Collect the reads used in the assembly and assemble reads instead of keeping a contig. However, megahit stores reads in a binary format, which would require writing a special reader.

- A rather convoluted approach: map all reads back to the contig and keep those which mapped.
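The first approach could be sketched roughly as below. This assumes the flanks have already been assembled separately and still share an exact overlap with the contig ends; real reads contain errors, so an actual implementation would need tolerant matching. The function names and the `min_overlap` parameter are hypothetical.

```python
def longest_overlap(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def join(left: str, right: str, min_overlap: int = 3) -> str:
    """Concatenate two sequences, collapsing their shared overlap."""
    size = longest_overlap(left, right, min_overlap)
    if size == 0:
        raise ValueError("no overlap of the required length")
    return left + right[size:]


def extend_contig(contig: str, left_flank: str, right_flank: str,
                  min_overlap: int = 3) -> str:
    """Attach separately assembled flanks to both ends of an untouched contig."""
    extended = join(left_flank, contig, min_overlap)
    return join(extended, right_flank, min_overlap)


contig = "GATTACAGGCT"
left_flank = "TTCCGATTA"    # assembled left flank, ends with the contig start
right_flank = "AGGCTAACC"   # assembled right flank, starts with the contig end
extended = extend_contig(contig, left_flank, right_flank)
```

Because the contig itself never goes back through the assembler, it cannot be split; only the short flanks are reassembled.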
