evotools / hapbin

Efficient program for calculating Extended Haplotype Homozygosity (EHH) and Integrated Haplotype Score (iHS)
GNU General Public License v3.0

How to get a map file #56

Closed SC-Duan closed 5 years ago

SC-Duan commented 5 years ago

Hi, I have a genetic map of about 300 loci, which I used to phase my VCF files with shapeit. Now I want to convert the '.legend' file to a map file using make_map.py, but the result is wrong: it just includes the same lines as the 300-locus map. Is there a way to get a map file to feed into ehhbin? Thank you!

prenderj commented 5 years ago

Hi

The map file you used for shapeit should work fine, as it is in the same format that hapbin requires, i.e., four columns (chromosome, locus ID, genetic position and physical position). You can see an example file in the data folder.

Cheers James
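As a rough illustration of the four-column layout described above, the following Python sketch parses a whitespace-delimited map file and counts how many consecutive loci share an identical genetic position. The names `MapRow`, `parse_map` and `count_flat_steps` are hypothetical helpers and are not part of hapbin; the whitespace delimiter and the integer physical position are assumptions, and the sample rows are made up.

```python
from typing import Iterable, List, NamedTuple


class MapRow(NamedTuple):
    chrom: str
    locus_id: str
    genetic_pos: float
    physical_pos: int


def parse_map(lines: Iterable[str]) -> List[MapRow]:
    """Parse map lines laid out as: chromosome, locus ID, genetic position, physical position."""
    rows = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()  # assumption: columns are separated by whitespace
        if len(fields) != 4:
            raise ValueError(f"line {n}: expected 4 columns, found {len(fields)}")
        chrom, locus_id, gen, phys = fields
        rows.append(MapRow(chrom, locus_id, float(gen), int(phys)))
    return rows


def count_flat_steps(rows: List[MapRow]) -> int:
    """Count adjacent pairs of loci whose genetic positions are exactly equal."""
    return sum(1 for a, b in zip(rows, rows[1:]) if b.genetic_pos == a.genetic_pos)


if __name__ == "__main__":
    # Made-up example rows, not taken from any real map.
    example = ["1 snpA 0.10 1000", "1 snpB 0.10 2000", "1 snpC 0.25 3000"]
    parsed = parse_map(example)
    print(len(parsed), "loci,", count_flat_steps(parsed), "flat step(s)")
```

A large number of flat steps relative to the number of loci is a hint that the genetic positions carry little information for most SNPs.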

SC-Duan commented 5 years ago

Hi, my genetic map has only 300 loci, but the VCF file has 3 million SNPs. When I convert the legend file (generated from the phased VCF) using make_map.py, the genetic distances in the resulting map file are wrong: many loci share the highest value, like this:

1 1-19446 19446 73.95445194
1 1-19469 19469 85.19335632
1 1-19485 19485 85.47495470
1 1-19757 19757 91.28972988
1 1-19791 19791 92.08974533
1 1-19832 19832 92.41002794
1 1-19847 19847 92.41002804
1 1-19849 19849 93.05559606
1 1-19851 19851 93.37799380
1 1-19858 19858 93.37799390
1 1-19890 19890 93.37799400
1 1-19897 19897 93.37799400
1 1-19922 19922 93.37799400
1 1-19929 19929 93.37799400
1 1-19932 19932 93.37799400
1 1-19940 19940 93.37799400
1 1-19945 19945 93.37799400
1 1-19963 19963 93.37799400
1 1-19965 19965 93.37799400
1 1-19976 19976 93.37799400
1 1-20006 20006 93.37799400
1 1-20011 20011 93.37799400
1 1-20014 20014 93.37799400
1 1-20025 20025 93.37799400
1 1-20038 20038 93.37799400
1 1-20041 20041 93.37799400
1 1-20067 20067 93.37799400
1 1-20120 20120 93.37799400
1 1-20155 20155 93.37799400
1 1-20168 20168 93.37799400
1 1-20170 20170 93.37799400
1 1-20173 20173 93.37799400
1 1-20191 20191 93.37799400
1 1-20203 20203 93.37799400
1 1-20222 20222 93.37799400
1 1-20226 20226 93.37799400
1 1-20228 20228 93.37799400
1 1-20237 20237 93.37799400
1 1-20238 20238 93.37799400
1 1-20257 20257 93.37799400
1 1-20262 20262 93.37799400
1 1-20269 20269 93.37799400
1 1-20270 20270 93.37799400
1 1-20286 20286 93.37799400
1 1-20295 20295 93.37799400
1 1-20327 20327 93.37799400

Can this file be fed to ehhbin? Or can I use a physical map in place of a genetic map, by duplicating the physical position column so that the file is formatted as: <chr#> <snpID> <phys pos> <phys pos>

https://github.com/szpiech/selscan/issues/10

Thank you!

prenderj commented 5 years ago

Hi

Sorry, I may be misunderstanding, but if only 300 of the 3 million SNPs have a known genetic position then the map probably isn't that useful. You can create a pseudo-genetic map from the physical positions instead. The results will differ from those based on a true genetic map, but the two will generally be well correlated.

Cheers James
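One way to build such a pseudo-genetic map is sketched below in Python. It derives the genetic position of each SNP from its physical position at a constant rate. The function `pseudo_genetic_map` is a hypothetical helper, not part of hapbin; the default rate of 1 cM per Mb is purely an illustrative assumption, and the output column order follows the four-column layout described earlier in this thread (chromosome, locus ID, genetic position, physical position).

```python
from typing import Iterable, List, Tuple


def pseudo_genetic_map(
    sites: Iterable[Tuple[str, str, int]],
    cm_per_mb: float = 1.0,  # assumption: uniform rate, not an estimate for any species
) -> List[str]:
    """Return map lines whose genetic position is a linear function of physical position."""
    lines = []
    for chrom, locus_id, phys in sites:
        gen = phys * cm_per_mb / 1_000_000  # base pairs -> Mb -> pseudo-cM
        lines.append(f"{chrom} {locus_id} {gen:.8f} {phys}")
    return lines


if __name__ == "__main__":
    # Locus IDs and positions copied from the first rows posted above.
    sites = [("1", "1-19446", 19446), ("1", "1-19469", 19469), ("1", "1-19485", 19485)]
    for line in pseudo_genetic_map(sites):
        print(line)
```

Because the scaling is uniform, this carries the same information as simply duplicating the physical position column; the rate only changes the units of the genetic position column.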

SC-Duan commented 5 years ago

Thank you!