JustinChu / ntsm

This tool counts the number of specific k-mers within sequence data. The counts can then be compared to other counts to compute the probability that samples are of the same origin, in order to discover incongruent samples or sample swaps.
MIT License

unable to parses error #4

Closed DuttaAnik closed 2 months ago

DuttaAnik commented 2 months ago

Hello. Following up on my previous questions, I tried the tool with my VCF and genome file. First I got an error saying that my VCF has multiple alternate alleles, so I filtered out all the multiallelic sites and kept only biallelic SNPs. I then ran the following command: scripts/generateSites name=test ref=/media/Ref_genome.fasta vcf=/media/Biallelic.vcf

I got this error:

unable to parses: Ref_genome_Chr7_70140658|10|AT    0       Chr7    70140653        25      19M     *       0       0       TGATG                 TTCCATAGTGTTGT  *       XT:A:U  NM:i:1  X0:i:1  X1:i:0  XM:i:1  XO:i:0  XG:i:0  MD:Z:5C13
unable to parses:  Ref_genomeChr7_70140658|10|CG    0       Chr7    70140653        23      19M     *       0       0       TGATG                 CTCCATAGTGTTGT  *       XT:A:U  NM:i:0  X0:i:1  X1:i:1  XM:i:0  XO:i:0  XG:i:0  MD:Z:19 XA:Z:Chr4,+33077807,19M,1;
unable to parses:  Ref_genomeChr7_70140658|11|AT    0       Chr7    70140654        25      19M     *       0       0       GATGT 

Do you know how to resolve those two issues? Thanks.

JustinChu commented 2 months ago

Multiple alleles are still not supported, unfortunately, but the parsing bug should be fixed now in d717b80779d2384d0037f9d27e9aa895d2e2c565. Let me know if you have any other issues.
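
Since multiallelic sites are not supported, the VCF has to be reduced to biallelic SNPs before running generateSites, as described earlier in the thread. The thread does not say how that filtering was done; the following is only a minimal sketch of one way to do it in plain Python. The function name `keep_biallelic_snps` and the toy records are hypothetical and not part of ntsm.

```python
def keep_biallelic_snps(lines):
    """Yield VCF header lines plus records with one single-base REF and one single-base ALT.

    Assumption: input lines are tab-separated VCF text (CHROM, POS, ID, REF, ALT, ...).
    """
    bases = {"A", "C", "G", "T"}
    for line in lines:
        if line.startswith("#"):
            # Keep meta-information and the column header unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            # Skip malformed or empty lines.
            continue
        ref, alt = fields[3].upper(), fields[4].upper()
        # A multiallelic ALT such as "G,T" is not a single base, so it is dropped,
        # as are indels (REF or ALT longer than one base).
        if ref in bases and alt in bases:
            yield line


# Toy input with made-up positions, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr7\t100\t.\tA\tT\t.\tPASS\t.\n",    # biallelic SNP: kept
    "Chr7\t200\t.\tC\tG,T\t.\tPASS\t.\n",  # multiallelic: dropped
    "Chr7\t300\t.\tAT\tA\t.\tPASS\t.\n",   # deletion: dropped
]

filtered = list(keep_biallelic_snps(example_vcf))
print("".join(filtered), end="")
```

To apply the same filter to a real file, the function can be fed an open file handle and its output written to a new file, which is then passed as the vcf= argument.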

JustinChu commented 2 months ago

Resolved as confirmed in https://github.com/JustinChu/ntsm/issues/3