eblerjana / pangenie

Pangenome-based genome inference
MIT License

Truncated genotyping #48

Closed Sunhh closed 1 year ago

Sunhh commented 1 year ago

Hi Jana,

I ran PanGenie for genotyping on my own dataset, and it finished without errors or warnings. However, on all 11 chromosomes the genotyping results stop too early, at around 11 Mb or 13 Mb, while the chromosome lengths should be close to 30 Mb. What do you recommend to solve this problem?

Thank you!

Honghe

eblerjana commented 1 year ago

Hi Honghe,

The output VCF should contain exactly the same variants as the input VCF, just with genotypes added. So in your case, the number of variants is different? Do you have the log output of PanGenie? And can you share the command you used to run it?

Best, Jana

Sunhh commented 1 year ago

Hi Jana,

Thank you for your reply! My output has the same variant record number as the input, but after some position of each chromosome, all the following positions are called as missing (./.). I found this problem disappeared after I removed the ‘-u’ parameter. I didn’t see anything abnormal in the log.

Best regards, Honghe
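
For readers who want to check for the same symptom, here is a minimal Python sketch that summarizes, per chromosome, how many records have a missing genotype and the last position that still has a called genotype. It is not part of PanGenie. The function name `summarize_missing`, the single-sample assumption and the demo records are all hypothetical, and it relies only on the standard VCF column layout with a `GT` key in the FORMAT column.

```python
from collections import OrderedDict


def summarize_missing(vcf_lines, sample_index=0):
    """Summarize missing genotypes per chromosome.

    Assumes plain-text VCF lines with a FORMAT column containing GT.
    Returns {chrom: {"records", "missing", "last_called_pos"}}.
    """
    summary = OrderedDict()
    for line in vcf_lines:
        # Skip blank lines and header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        genotype = sample_values[format_keys.index("GT")]
        # A genotype counts as missing when every allele is "."
        alleles = genotype.replace("|", "/").split("/")
        is_missing = all(allele == "." for allele in alleles)

        stats = summary.setdefault(
            chrom, {"records": 0, "missing": 0, "last_called_pos": None}
        )
        stats["records"] += 1
        if is_missing:
            stats["missing"] += 1
        elif stats["last_called_pos"] is None or pos > stats["last_called_pos"]:
            stats["last_called_pos"] = pos
    return summary


# Made-up records for illustration only (not taken from the issue)
demo_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT:GQ\t0/1:30",
    "chr1\t200\t.\tG\tC\t.\tPASS\t.\tGT:GQ\t1/1:40",
    "chr1\t300\t.\tC\tG\t.\tPASS\t.\tGT:GQ\t./.:0",
    "chr2\t50\t.\tT\tA\t.\tPASS\t.\tGT:GQ\t./.:0",
]

if __name__ == "__main__":
    for chrom, stats in summarize_missing(demo_lines).items():
        print(chrom, stats)
```

To run it on a real file, pass an open file handle instead of the demo list, for example `summarize_missing(open("genotypes.vcf"))`, where the file name is a placeholder. If the last called position is far below the chromosome length while the record count matches the input, that matches the pattern described above.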

eblerjana commented 1 year ago

If you use the -u parameter, PanGenie reports genotype "./." for all variants not covered by any unique kmers. So I guess in your case, there are no unique kmers for these variants.

Sunhh commented 1 year ago

Hi Jana,

That seems really strange to me. For each of my 11 chromosomes, the genotypes after ~10-13 Mb are all "./.". Given that my chromosome lengths range from 28 to 38 Mbp, I cannot believe that the remaining ~20 Mbp have no unique kmers at all. Besides, when I subset the reads to only those mapped to the un-genotyped regions, genotypes were produced for some of those regions, but again, after some position, all the following genotypes were "./.". However, without the "-u" parameter my results look good. Since genotyping works well without "-u", I consider my problem solved and will close this issue.

Thank you!

Best regards, Honghe
