AstraZeneca-NGS / reference_data

Reference data: BED files, genes, transcripts, variations.

CDS-canonical.bed #2

Open ghost opened 6 years ago

ghost commented 6 years ago

Dear Developers --

I wanted to ask how your hg19 version of CDS-canonical.bed was developed. I'd like to describe it in a reproducible way. Many thanks -- James Robert White

vladsavelyev commented 6 years ago

Hi James,

I used this script to generate it. The script is part of a bigger project and can be tricky to follow, but basically all it does is load the Ensembl GTF file (e.g. from the bcbio bundle), fetch the CDS features, and keep only the records whose Ensembl transcript ID is in the canonical list. The canonical transcript list is available in this repo too; the README explains how it is generated.

Hope this helps!

Vlad
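
For readers who want to reproduce the steps described above without reading the full script, here is a minimal Python sketch. It is not the script Vlad links to. It assumes a standard 9-column Ensembl GTF and a set of canonical transcript IDs already loaded into memory. The function names `parse_attributes` and `gtf_cds_to_bed` and the output column layout (chrom, start, end, gene name, score, strand, transcript ID) are hypothetical and may not match the actual CDS-canonical.bed.

```python
import re

# GTF attributes look like: gene_id "ENSG..."; transcript_id "ENST..."; gene_name "X";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of quoted key/value pairs."""
    return dict(ATTR_RE.findall(field))


def gtf_cds_to_bed(gtf_lines, canonical_ids):
    """Keep CDS features of canonical transcripts and return BED-like tuples.

    gtf_lines: iterable of GTF text lines (e.g. an open file handle).
    canonical_ids: set of transcript IDs treated as canonical (assumed input).
    """
    records = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS features are of interest
        attrs = parse_attributes(cols[8])
        transcript_id = attrs.get("transcript_id")
        if transcript_id not in canonical_ids:
            continue  # drop CDS records of non-canonical transcripts
        # GTF coordinates are 1-based inclusive; BED is 0-based half-open.
        start = int(cols[3]) - 1
        end = int(cols[4])
        records.append(
            (cols[0], start, end, attrs.get("gene_name", "."), ".", cols[6], transcript_id)
        )
    # Sort by chromosome name (as a string), then by start and end.
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    return records
```

To use it, pass an open GTF file handle and a set built from the canonical transcript list, then write each tuple as a tab-separated line. Note that Ensembl GTFs name chromosomes `1`, `2`, ... rather than `chr1`, `chr2`, ..., so a renaming step may be needed for an hg19-style BED.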