ga4gh / benchmarking-tools

Repository for the GA4GH Benchmarking Team work developing standardized benchmarking methods for germline small variant calls
Apache License 2.0

GRCh38 / hg38 stratification BED files #27

Open blmoore opened 7 years ago

blmoore commented 7 years ago

Do the BED files under resources/stratification-bed-files exist for GRCh38? If so, it would be great if they could be added to the repo.

jzook commented 7 years ago

Not yet - I've created some of these, so I'll work on creating the rest of them and add them to the resources.

isthisthat commented 7 years ago

Any timelines on this? Would you accept a PR, e.g. for the GCcontent, LowComplexity and mappability tracks (which look straightforward), built on this version of the genome, for instance: ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

What's the preferred naming convention / folder structure? Thanks a lot!
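To make the GC content case concrete, here is a minimal sketch of how such a track could be produced from a FASTA sequence. It is an illustration only and not necessarily how the existing files in resources/stratification-bed-files were generated. The function names (`gc_fraction`, `gc_windows`, `to_bed`), the non-overlapping window scheme, and the default window size and thresholds are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    s = seq.upper()  # soft-masked (lowercase) bases are counted like any other
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_windows(seqs: Dict[str, str], window: int = 100,
               lo: float = 0.0, hi: float = 0.25) -> List[Interval]:
    """Collect non-overlapping windows with lo <= GC fraction < hi.

    Windows made up entirely of ambiguous bases (e.g. N) are skipped, and
    directly adjacent qualifying windows are merged into one interval.
    The window size and thresholds are hypothetical defaults.
    """
    out: List[Interval] = []
    for chrom, seq in seqs.items():
        for start in range(0, len(seq), window):
            end = min(start + window, len(seq))
            chunk = seq[start:end]
            if not any(b in "ACGTacgt" for b in chunk):
                continue  # nothing to measure in this window
            if lo <= gc_fraction(chunk) < hi:
                if out and out[-1][0] == chrom and out[-1][2] == start:
                    # extend the previous interval instead of adding a new one
                    out[-1] = (chrom, out[-1][1], end)
                else:
                    out.append((chrom, start, end))
    return out


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as tab-separated BED lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in intervals)


if __name__ == "__main__":
    # Toy sequence standing in for a chromosome read from the FASTA file.
    toy = {"chr1": "AAAAATTTTT" + "GCGCGCGCGC" + "ATATATATAT" + "NNNNNNNNNN" + "AAAAC"}
    print(to_bed(gc_windows(toy, window=10)), end="")
```

For a real genome the sequences would be read chromosome by chromosome from the FASTA file rather than held in a dictionary literal, and the window scheme would need to match whatever convention is agreed for the GRCh37 files.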

EvanTheB commented 6 years ago

Hello, any update on this? Is it possible to translate the BED files? Or do the resources that created them in the first place exist? Thanks

jasper1918 commented 6 years ago

Curious if anyone else has compiled an hg38 resource similar to those found here? Thought I would ask before I jump in and start creating my own, as I would prefer these be standardized for all to use.

jzook commented 6 years ago

Sorry for the delays in developing this and for missing this discussion. We plan to start working toward making a standard set for GRCh38 available by the end of 2018. If you're interested in helping with this or have already done this, we'd definitely be happy to hear what you've done.


kwdunaway commented 5 years ago

Hi, I'm also interested in getting this data in hg38. What files have you created so far? Even a partially completed list is better than nothing.

jzook commented 5 years ago

I apologize for the delays in completing this due to the federal government shutdown. We do have a small number of the most important large subsets, which we used in our recent manuscript (https://doi.org/10.1101/281006, in press in Nature Biotechnology), on GitHub here: https://github.com/jzook/genome-data-integration/tree/master/NISTv3.3.2/filtbeds/GRCh38

We hope to make additional files available in the near future. We have also found the various UCSC RepeatMasker files to be useful.
