NEXTBioinformatics / Best-Practices-for-Processing-HTS-Data


Reference Genome #6

Open micknudsen opened 7 years ago

micknudsen commented 7 years ago

In the Datasets section, it is stated that we use b37 in NEXT Bioinformatics, but at MOMA we use hg19. Unfortunately, while the two are very similar, they are not 100% identical, so we should expect some minor differences when comparing pipelines.
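To make one of the differences concrete: the two builds name their contigs differently (b37 uses `1`, `2`, ..., `X`, `Y`, `MT`, while hg19 uses `chr1`, ..., `chrX`, `chrY`, `chrM`). The snippet below is only a minimal sketch of translating primary contig names; the function name `b37_to_hg19_contig` is hypothetical and not part of any pipeline discussed here. Note that renaming alone does not make the mitochondrial contig comparable, since hg19's `chrM` and b37's `MT` are different sequences.

```python
# Hypothetical helper: map b37-style primary contig names to hg19-style names.
# Only the primary chromosomes are handled; unplaced/decoy contigs are rejected.

PRIMARY_B37_CONTIGS = {str(i) for i in range(1, 23)} | {"X", "Y"}


def b37_to_hg19_contig(name: str) -> str:
    """Return the hg19-style name for a b37-style primary contig name."""
    if name == "MT":
        # Name mapping only: the underlying mitochondrial sequences differ
        # between the two builds, so coordinates are NOT interchangeable.
        return "chrM"
    if name in PRIMARY_B37_CONTIGS:
        return "chr" + name
    raise ValueError(f"No simple hg19 equivalent for contig {name!r}")


if __name__ == "__main__":
    for contig in ["1", "22", "X", "Y", "MT"]:
        print(contig, "->", b37_to_hg19_contig(contig))
```

A translation like this is enough for comparing variant calls on the autosomes and sex chromosomes, but calls on the mitochondrial genome would need to be treated separately.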

Should we just write that we use b37/hg19?

In fact, we are using a modified version of hg19, where the pseudo-autosomal regions PAR1 and PAR2 are masked on the Y chromosome. I will add a section about that soon.
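Until that section exists, here is a rough sketch of what masking the PARs amounts to: the bases in the two regions on chrY are replaced with `N`, so that reads from those regions align to chrX only. This is an illustration, not the actual MOMA procedure. The function name `mask_regions` is hypothetical, and the coordinates are the commonly cited GRCh37 PAR positions on Y (1-based, inclusive); they are an assumption here and should be checked against the reference actually in use.

```python
# Assumed GRCh37 pseudo-autosomal regions on chrY (1-based, inclusive).
# Verify these against the reference build actually used before relying on them.
GRCH37_CHRY_PARS = [
    (10_001, 2_649_520),        # PAR1
    (59_034_050, 59_373_566),   # PAR2
]


def mask_regions(sequence: str, regions, mask_char: str = "N") -> str:
    """Return a copy of `sequence` with each 1-based inclusive region masked."""
    chars = list(sequence)
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"Invalid region: {start}-{end}")
        if start - 1 >= len(chars):
            continue  # region lies entirely beyond the end of the sequence
        stop = min(end, len(chars))  # clip regions that overrun the sequence
        chars[start - 1:stop] = mask_char * (stop - (start - 1))
    return "".join(chars)


if __name__ == "__main__":
    # Toy example with made-up coordinates, just to show the behaviour.
    toy = "ACGTACGTAC"
    print(mask_regions(toy, [(2, 4), (9, 10)]))
```

On a real reference, the chrY record would be read from the FASTA, passed through a function like this with `GRCH37_CHRY_PARS`, and written back before re-indexing the genome.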

rfbrondum commented 7 years ago

I think for the moment it should be fine just to say that we are using both. We could easily switch to hg19 in Aalborg, since we still haven't had the chance to analyze real data, but the phase 1 unit still uses GRCh37, so switching would not resolve the discrepancy.

However, if we are going to combine data in a common database at some point, maybe we need to agree on a common reference?