google-research / bigbird

Transformers for Longer Sequences
https://arxiv.org/abs/2007.14062
Apache License 2.0

Is it valid to train on the GRCh38.p13 human reference instead of GRCh37? #8

Open lovelyscientist opened 3 years ago

lovelyscientist commented 3 years ago

Dear authors,

Thank you for this outstanding work!

I have a question regarding the reference genome used for training the genomic model. In your paper you refer to GRCh37, but that build now seems outdated and Build 38 is available (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39). Do you think it would be valid to train the BigBird model on the chromosomes of GRCh38.p13 for chromatin profile prediction, considering that the DeepSEA training dataset is based on GRCh37? Or should the same reference genome (GRCh37) be used for both datasets?
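
To make the concern concrete, here is a toy sketch of what I think could go wrong if labels defined on one build are paired with sequence extracted from another. The sequences, the window coordinates, and the helper `extract` are all made up for illustration; they are not real GRCh37 or GRCh38 data, and the single 3-base insertion is only an assumed stand-in for the kinds of differences that exist between assemblies.

```python
# Toy illustration only: these strings are invented, not real genome sequence.
# Assumption: "build_b" differs from "build_a" by a 3-base insertion
# upstream of the labeled window.
build_a = "ACGTACGTTTGACCA"                    # stands in for the older build
build_b = "ACG" + "GGG" + "TACGTTTGACCA"       # same region plus an insertion


def extract(seq, start, end):
    """Return the half-open window [start, end) of a sequence (hypothetical helper)."""
    return seq[start:end]


# A labeled window whose coordinates were defined on build_a.
start, end = 8, 13

window_a = extract(build_a, start, end)        # sequence the label refers to
window_b = extract(build_b, start, end)        # same coordinates, other build

# Shifting by the insertion length recovers the original window. In practice
# this shift would have to come from a coordinate-conversion (liftover) step.
offset = 3
window_b_shifted = extract(build_b, start + offset, end + offset)

print(window_a, window_b, window_b_shifted)
```

In this toy case the unshifted window from the second build no longer matches the sequence the label was defined on, which is why I am unsure whether the DeepSEA labels can be reused with GRCh38.p13 without first converting their coordinates.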