ablab / stringdecomposer

Tool for decomposing centromeric assemblies and long reads into monomers

Question about +/- in final_decomposition.tsv #19

Open 865699871 opened 2 years ago

865699871 commented 2 years ago

In the study by Altemose et al. (Complete genomic and epigenetic maps of human centromeres), CHM13 Cen1 contains a 1.7 Mb inversion inside the active α HOR array (Fig 2a). We ran StringDecomposer on the Cen1 active α HOR array. However, every row in final_decomposition.tsv is marked +. Can StringDecomposer report the strand (+/-) of each monomer in the sequence?

TanyaDvorkina commented 2 years ago

Hi,

Thank you for your interest in StringDecomposer! In our tsv-files, the +/- at the end of each row refers to the "reliability" of the alignment, not the strand (see the Quick start section for more information about the output). This characteristic is needed for monomer-to-read alignment only.

The strand is indicated by an apostrophe (') at the end of the monomer name. Consider two rows in the final tsv-file:

    ref    mon     1      171    99
    ref    mon'    172    343    99

The second row shows that monomer mon is aligned with identity 99 on the reverse strand.
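The apostrophe convention described above can be turned into an explicit strand column with a few lines of Python. This is only a minimal sketch, not part of StringDecomposer: the function name `parse_decomposition` is hypothetical, and it assumes the file is tab-separated, has no header line, and that the first five columns are ordered as in the two example rows (sequence name, monomer name, start, end, identity).

```python
import csv
import io


def parse_decomposition(handle):
    """Yield (sequence, monomer, start, end, identity, strand) for each row.

    Assumption: tab-separated input, no header, first five columns ordered
    as in the example rows above. Any further columns are ignored.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or truncated lines
        seq_name, monomer, start, end, identity = row[:5]
        # A trailing apostrophe on the monomer name marks the reverse strand.
        if monomer.endswith("'"):
            strand, monomer = "-", monomer[:-1]
        else:
            strand = "+"
        yield seq_name, monomer, int(start), int(end), float(identity), strand


# The two example rows from this comment, written as tab-separated text.
example = "ref\tmon\t1\t171\t99\nref\tmon'\t172\t343\t99\n"
records = list(parse_decomposition(io.StringIO(example)))
for rec in records:
    print(*rec, sep="\t")
```

For a real run, the same function can be applied to an open file handle for final_decomposition.tsv instead of the in-memory example.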

We understand that this representation of the strand is a bit misleading, and we are going to add a bed-file representation of the StringDecomposer output in the next release. For now, you can use our internal script convert2bed.py to convert the StringDecomposer final tsv-file to a bed-file.
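To illustrate what a bed-style row with an explicit strand column could look like, here is a rough sketch. It is not the convert2bed.py script and may differ from it: the function name `tsv_row_to_bed` is hypothetical, the column order is assumed from the example rows in this comment, and coordinates are copied through unchanged. BED intervals are 0-based and half-open, so the coordinates may need shifting depending on the convention the tsv-file uses, which this thread does not state.

```python
def tsv_row_to_bed(line):
    """Convert one tab-separated decomposition row to a six-column BED-style row.

    Assumption: the first five columns are sequence name, monomer name,
    start, end, identity. Coordinates are passed through as-is.
    """
    fields = line.rstrip("\n").split("\t")
    seq_name, monomer, start, end, identity = fields[:5]
    # A trailing apostrophe on the monomer name marks the reverse strand.
    if monomer.endswith("'"):
        strand, name = "-", monomer[:-1]
    else:
        strand, name = "+", monomer
    # BED column order: chrom, start, end, name, score, strand.
    return "\t".join([seq_name, start, end, name, identity, strand])


forward = tsv_row_to_bed("ref\tmon\t1\t171\t99\n")
reverse = tsv_row_to_bed("ref\tmon'\t172\t343\t99\n")
print(forward)
print(reverse)
```

Putting the identity in the score column is a choice made for this sketch only; the official bed output may use that column differently.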

If this doesn't help, please don't hesitate to ask further questions!

Best, Tanya

865699871 commented 2 years ago

Thank you for your response!