LauritsSkov / Introgression-detection

MIT License

vcf file of chr2 but have chr1 site #8

Closed jackzhong1995 closed 2 months ago

jackzhong1995 commented 3 months ago

Hi! Thanks for the helpful archaic VCF files. I found that the hg38 file individuals_highcov.2.bcf contains sites from chr1 (see screenshot).

The chr2 file is not the only one with this problem; the chr1 file also contains sites from other contigs (see screenshot). Why does this happen, and will it influence the results?

Best wishes. Jie

LauritsSkov commented 2 months ago

Dear Jie

Thanks for pointing this out to me! This was an error during the liftover of the archaic genomes from hg19 to hg38. I have updated all the bcf files in the Zenodo repository, so the issue should now be solved. The new Zenodo repository is "https://zenodo.org/records/13368126".

I don't think it will matter much for your analysis, as less than 0.01% of sites were lifted over to the wrong chromosome!
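For readers who want to check a file themselves, here is a minimal sketch of how the share of misplaced records could be measured. It assumes the BCF has first been converted to VCF text (for example with `bcftools view -H`); the function names and the toy records are illustrative only and are not part of hmmix or the Zenodo files.

```python
from collections import Counter


def contig_counts(vcf_lines):
    """Count records per CHROM value in an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        counts[line.split("\t", 1)[0]] += 1
    return counts


def misplaced_fraction(counts, expected):
    """Fraction of records whose CHROM is not the expected contig."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts.get(expected, 0)) / total


# Toy records, made up for illustration (not taken from the real files).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr2\t10500\t.\tA\tG\t.\t.\t.",
    "chr2\t10720\t.\tC\tT\t.\t.\t.",
    "chr1\t88231\t.\tG\tA\t.\t.\t.",
    "chr2\t11002\t.\tT\tC\t.\t.\t.",
]

counts = contig_counts(example)
print(dict(counts))
print(misplaced_fraction(counts, "chr2"))
```

On a real per-chromosome file, the same two functions could be fed the lines of the converted text; a fraction near zero would be in line with the figure quoted above.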

jackzhong1995 commented 2 months ago

Hi Skov! Thanks for your rapid reply. I am confused about the results of using hmmix to analyze AFR samples. After decoding (my data is unphased), the first fragment is very large (~16 Mb, even though its mean_prob is above 0.9), and in total I detected 1079.6 Mb of fragments in sample NA20412, which comes from Africa. This is clearly unusual. By contrast, the totals I detected in samples from other continents are within the normal range (~80 Mb in total per sample). What do you think about this? Should there be a length threshold for each fragment, such as 500 kb, 1 Mb or 1.5 Mb (I noticed "SI Figure 2.6.1 Length distribution of all fragments" in your Nature paper)? Best wishes, Jie.
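The per-sample total mentioned above can be reproduced with a short script. This is only a sketch: it assumes the decoded output is a tab-separated table with `length`, `state` and `mean_prob` columns and that archaic segments are labelled `Archaic`; those names, the function `archaic_total_mb` and the toy rows are assumptions and should be checked against the actual file.

```python
import csv
import io


def archaic_total_mb(decoded_text, state="Archaic", min_prob=0.0):
    """Sum the lengths (in Mb) of fragments in `state` with mean_prob >= min_prob."""
    reader = csv.DictReader(io.StringIO(decoded_text), delimiter="\t")
    total_bp = 0
    for row in reader:
        if row["state"] == state and float(row["mean_prob"]) >= min_prob:
            total_bp += int(row["length"])
    return total_bp / 1e6


# Toy decoded table, made up for illustration (column names are assumed).
example = (
    "chrom\tstart\tend\tlength\tstate\tmean_prob\tsnps\n"
    "chr1\t0\t2000000\t2000000\tHuman\t0.98\t40\n"
    "chr1\t2000000\t2500000\t500000\tArchaic\t0.95\t30\n"
    "chr1\t2500000\t9000000\t6500000\tHuman\t0.99\t120\n"
    "chr1\t9000000\t9250000\t250000\tArchaic\t0.62\t9\n"
)

print(archaic_total_mb(example))                # all Archaic fragments
print(archaic_total_mb(example, min_prob=0.9))  # only confident ones
```

A total far above what other samples show, as with the ~1080 Mb reported here, is a sign that something other than fragment length is off, so this summary is a diagnostic rather than a filter.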


LauritsSkov commented 2 months ago

Hi Jie

You are using hmmix to look for archaic introgression in an African sample? So you are removing all SNPs in this individual that are found in other African genomes? What are your trained parameters? It could be that the two states do not actually correspond to a Neanderthal state and a human state; instead, the model may be overfitting and simply splitting the human state into two states. If you want to find archaic introgressed segments in Africans, I think software like IBD-mix would be better suited for this!

Best Laurits


jackzhong1995 commented 2 months ago

Hello Skov,

Thanks for your valuable suggestions. I have two more questions:

  1. How do you deal with long fragments? In your Supplementary Dataset 1, the longest one is ~1.8 Mb. However, in my own dataset (not African), several fragments are notably longer (> 2 Mb). Do you think these long fragments are false positives? (All have mean_prob > 0.9 in my dataset.)
  2. How can I tell whether a fragment is mosaic or non-mosaic? My data is unphased; can I still distinguish them?

Best, Jie
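As a way to look at the first question, the unusually long fragments could be listed for manual inspection. The sketch below is illustrative only: the tuple layout, the function `long_fragments` and the 2 Mb cutoff are hypothetical, and length alone does not show that a fragment is a false positive.

```python
def long_fragments(fragments, max_len_bp):
    """Return fragments longer than max_len_bp, longest first."""
    flagged = [f for f in fragments if f[2] - f[1] > max_len_bp]
    return sorted(flagged, key=lambda f: f[2] - f[1], reverse=True)


# Toy fragments, made up for illustration: (chrom, start, end, mean_prob)
fragments = [
    ("chr1", 1_000_000, 1_400_000, 0.97),
    ("chr3", 5_000_000, 7_500_000, 0.93),
    ("chr7", 2_000_000, 2_090_000, 0.91),
    ("chr9", 10_000_000, 13_200_000, 0.95),
]
CUTOFF_BP = 2_000_000  # hypothetical cutoff, not a recommended value

for chrom, start, end, prob in long_fragments(fragments, CUTOFF_BP):
    print(f"{chrom}:{start}-{end}\t{(end - start) / 1e6:.2f} Mb\tmean_prob={prob}")
```

The flagged regions could then be compared against the archaic genomes or known problematic regions before deciding whether to keep them.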
