eblerjana / pangenie

Pangenome-based genome inference
MIT License

Should I hard-mask the assemblies first? #30

Open JanMiao opened 1 year ago

JanMiao commented 1 year ago

Hi! I used your pipeline pangenome-graph-from-assemblies to construct a VCF file, but I noticed that some of the variants have an unusually high number of alleles. See below:

1 18300 . GACACACAACACACAC GACAGACAGACACACACAC,GACAGACAGACAGACAGACACAC,GACAGACAGACAGACAGACAGACACACACACACACACACAC,G,GACACACACACACACACACACAC,GACAGACAGACAGACACACACACACACAC
1 19171 . CAAAAAAAA CAA,CAAA,CAAAAAAAAAAAA,CAAAAAAAAAAAAAAAAAAAA,CAAAAAAAAAAAAAAAAAAAAA,CAAAAAAAAAAAAAAAAAAAAAA,CAAAAA,CAAAA,CAAAAAAAAAA,C,CA,CAAAAAAAAAAA

I suspect that this may be due to repetitive sequences in my input assemblies, which were taken directly from the hifiasm assembly results without being masked.

eblerjana commented 1 year ago

I assume these are variants in repetitive sequences? I guess it's not unusual to see a lot of different alleles there.
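
To check how common such sites are, here is a minimal Python sketch that lists records with many ALT alleles and flags the ones that look like homopolymer length variants (like the poly-A site at 1:19171). It is not part of PanGenie or the pipeline; the function names and the min_alts threshold are hypothetical, and the homopolymer test is only a rough heuristic that looks at the bases after the first (anchor) base of each allele.

```python
def parse_alleles(vcf_line):
    """Return (chrom, pos, ref, alts) for one VCF data line.

    split() with no argument accepts both tabs (real VCF) and the
    spaces seen in the pasted records.
    """
    fields = vcf_line.split()
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    alts = [] if alt == "." else alt.split(",")
    return chrom, pos, ref, alts


def is_homopolymer_indel(ref, alts):
    """Heuristic: True if, after the anchor base, every allele
    consists of the same single base (e.g. C, CA, CAA, CAAAA)."""
    tails = [allele[1:] for allele in [ref] + alts]
    bases = set("".join(tails))
    return len(bases) == 1


def summarize(vcf_lines, min_alts=5):
    """List (chrom, pos, n_alt, homopolymer_flag) for records
    with at least min_alts ALT alleles (threshold is an assumption)."""
    hits = []
    for line in vcf_lines:
        # Skip blank lines and header lines starting with '#'.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, ref, alts = parse_alleles(line)
        if len(alts) >= min_alts:
            hits.append((chrom, pos, len(alts), is_homopolymer_indel(ref, alts)))
    return hits
```

Passing the lines of an uncompressed VCF, for example summarize(open("graph.vcf")) with a placeholder file name, returns one tuple per highly multi-allelic record. If most of them are flagged, or sit in visibly repetitive sequence like the two records above, that would be consistent with the explanation given here.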