865699871 opened this issue 2 years ago
Hi,
Thank you for your interest in StringDecomposer! In our TSV files, the +/- at the end of each row indicates the "reliability" of the alignment (see the Quick start section for more details on the output). This field is only relevant for monomer-to-read alignment.
The strand is represented by an apostrophe (') at the end of the monomer name. Consider these two rows in the final TSV file:
ref mon 1 171 99
ref mon' 172 343 99
The second row shows that monomer mon is aligned with identity 99 on the reverse strand.
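As a minimal sketch of how the strand can be recovered from such a row, the Python snippet below separates the monomer name from its trailing apostrophe. It assumes, as in the two example rows, that the first five whitespace-separated fields are sequence name, monomer name, start, end and identity, and it ignores any further columns. The names `MonomerHit` and `parse_row` are hypothetical and not part of StringDecomposer.

```python
from dataclasses import dataclass


@dataclass
class MonomerHit:
    seq_name: str
    monomer: str     # monomer name with any trailing ' removed
    start: int
    end: int
    identity: float
    strand: str      # "+" or "-", derived from the trailing '


def parse_row(line: str) -> MonomerHit:
    """Parse one row of a StringDecomposer final TSV file (sketch).

    Assumption (hypothetical): the first five whitespace-separated fields are
    sequence name, monomer name, start, end, identity; extra columns are ignored.
    """
    seq_name, monomer, start, end, identity = line.split()[:5]
    # A trailing apostrophe on the monomer name marks the reverse strand.
    if monomer.endswith("'"):
        strand = "-"
        monomer = monomer[:-1]
    else:
        strand = "+"
    return MonomerHit(seq_name, monomer, int(start), int(end), float(identity), strand)


# The two example rows from above.
for row in ["ref\tmon\t1\t171\t99", "ref\tmon'\t172\t343\t99"]:
    print(parse_row(row))
```

Splitting on any whitespace rather than only on tabs keeps the sketch usable on rows copied as plain text, on the assumption that monomer names contain no spaces.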
We understand that this representation of the strand is a bit misleading, and we plan to add a BED-file representation of the StringDecomposer output in the next release. For now, you can use our internal script convert2bed.py to convert the StringDecomposer final TSV file to a BED file.
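For illustration only, the conversion can be approximated along the lines below. This is a sketch under stated assumptions, not the contents of convert2bed.py. It assumes the same five leading columns as in the example rows, assumes start/end are 1-based and inclusive (the example rows 1 to 171 and 172 to 343 tile the sequence this way, but the convention is not confirmed here), ignores the reliability column, and maps identity to a BED score by multiplying it by 10. The script name `tsv_to_bed.py` and the function `tsv_row_to_bed` are hypothetical.

```python
import sys


def tsv_row_to_bed(line: str) -> str:
    """Convert one StringDecomposer final-TSV row into a BED6 line (sketch).

    Hypothetical assumptions: fields are whitespace-separated; the first five are
    sequence name, monomer name, start, end, identity; start/end are 1-based and
    inclusive, so BED start = start - 1 and BED end = end.
    """
    seq_name, monomer, start, end, identity = line.split()[:5]
    # A trailing apostrophe on the monomer name marks the reverse strand.
    if monomer.endswith("'"):
        strand = "-"
        monomer = monomer[:-1]
    else:
        strand = "+"
    # BED scores are integers in 0..1000; scale the identity (0..100) by 10.
    score = min(1000, max(0, round(float(identity) * 10)))
    return "\t".join(
        [seq_name, str(int(start) - 1), str(int(end)), monomer, str(score), strand]
    )


def main(argv) -> None:
    # Usage: python tsv_to_bed.py final_decomposition.tsv > decomposition.bed
    with open(argv[1]) as tsv:
        for line in tsv:
            if line.strip():
                print(tsv_row_to_bed(line))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Because the strand ends up in the sixth BED column, a run of "-" intervals inside an otherwise "+" array becomes easy to spot in a genome browser.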
If this doesn't help, please don't hesitate to ask further questions!
Best, Tanya
Thank you for your response!
According to Altemose et al. (Complete genomic and epigenetic maps of human centromeres), CHM13 Cen1 contains a 1.7 Mb inversion inside the active α HOR array (Fig. 2a). We ran StringDecomposer on the Cen1 active α HOR array. However, all entries in final_decomposition.tsv are marked +. Can StringDecomposer mark sequences as + or -?