fmazzarotto / ukb_wgs_mapping_200k

This repo contains a map of the genomic coordinates for each WGS VCF block of the 200k WGS UK Biobank release, and the code used to create it.
GNU General Public License v3.0
5 stars 0 forks

UKB 500K WGS Mapping #1

Open drmurdock opened 3 days ago

drmurdock commented 3 days ago

Hi @fmazzarotto. Any chance you've generated a similar mapping for the recent UKB 500K WGS release?

Thanks, David

fmazzarotto commented 3 days ago

Hi @drmurdock, yes, I had actually generated it and kept telling myself I should upload it too. You gave me the right push! I just created another repo (ukb_wgs_mapping_500k); please check it out and let me know if there are any issues with it.

drmurdock commented 2 days ago

This is great, @fmazzarotto! Thank you for providing this. You mention using this map to extract >10K variants. How are you doing that? I'm using bcftools view within DNAnexus with ~1000 variants, but it runs very slowly (hours) on one of these pVCF files. Lastly, have you noticed that many of these pVCF files don't contain any variants?

Thanks again, David

fmazzarotto commented 3 hours ago

Hi @drmurdock, I first used tabix to extract every record at the positions I was interested in, since tabix is very fast (so at this stage I extracted "positions" rather than specific variants). Then I used bcftools to concatenate all the resulting files and to filter them, keeping only the variants I was actually interested in.
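
The two-stage approach described above could be sketched roughly as follows. This is only an illustration, not the exact commands used: the variant list, file names and helper functions are all hypothetical, and the last step is a plain-Python stand-in for the bcftools filter, matching on CHROM, POS, REF and ALT. The helpers only build the command lines, so nothing here actually calls tabix or bcftools.

```python
from pathlib import Path

# Hypothetical variants of interest: (chrom, 1-based pos, ref, alt).
WANTED = {
    ("chr1", 55039974, "G", "T"),
    ("chr1", 55051215, "G", "A"),
}


def write_regions(variants, path):
    """Write a tab-delimited CHROM/POS file usable with `tabix -R`.

    Positions are de-duplicated and sorted (string sort on the chromosome
    name, which is fine within a single-chromosome pVCF block).
    """
    positions = sorted({(chrom, pos) for chrom, pos, _ref, _alt in variants})
    Path(path).write_text("".join(f"{c}\t{p}\n" for c, p in positions))
    return positions


def tabix_command(pvcf, regions_file):
    """Stage 1: every record at the listed positions, header included (-h).

    tabix writes to stdout, so the caller has to redirect the output.
    """
    return ["tabix", "-h", "-R", str(regions_file), str(pvcf)]


def concat_command(subsets, out):
    """Stage 2a: join the per-block subsets into one bgzipped VCF."""
    return ["bcftools", "concat", "-Oz", "-o", str(out), *map(str, subsets)]


def keep_wanted(vcf_lines, wanted):
    """Stage 2b (Python stand-in for the bcftools filter).

    Header lines pass through; a record is kept when any of its ALT
    alleles matches a wanted (chrom, pos, ref, alt) tuple, so a
    multi-allelic line is kept whole rather than split.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos, _id, ref, alts = line.split("\t", 5)[:5]
        if any((chrom, int(pos), ref, alt) in wanted for alt in alts.split(",")):
            yield line
```

Each command list can be passed to `subprocess.run`, with the tabix output redirected to a file per pVCF block. The position-only lookup is what keeps stage 1 fast; the allele-level matching is then done once on the much smaller concatenated file.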