UNR-CFB / lahontan

Lahontan: a flexible, multiscale RNA-seq pipeline
GNU General Public License v3.0

automatically generate GeneID+description text file for Ensembl genomes #12

Open rltillett opened 8 years ago

rltillett commented 8 years ago

The current pipeline, as described in the wiki example, uses reference genome and cDNA FASTA files plus gene feature files (GTF) from Ensembl. None of these inputs contains human-readable gene descriptions.

The current manual process is:

  1. go to Ensembl, select the organism, and click BioMart
  2. in BioMart, click Attributes and check the boxes for Ensembl Gene ID, Description, and Associated Gene Name
  3. download the file, rename it, and drop it in the ref-genome directory

An automated process might do the same thing via Ensembl's web API, or by mirroring their raw XML and querying it locally.
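
As a rough sketch of the web API route, the three manual steps could be collapsed into one BioMart XML query. Everything specific here is an assumption rather than something the pipeline already does: the `martservice` URL, the internal attribute names (`ensembl_gene_id`, `description`, `external_gene_name`, which should correspond to the three checkboxes above), the example dataset name, and the helper names `build_query`, `parse_tsv`, and `fetch_descriptions` are all hypothetical and would need checking against the current Ensembl release.

```python
import csv
import io
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed BioMart endpoint and attribute names; verify against Ensembl before use.
MART_URL = "https://www.ensembl.org/biomart/martservice"
ATTRIBUTES = ("ensembl_gene_id", "description", "external_gene_name")


def build_query(dataset, attributes=ATTRIBUTES):
    """Build a BioMart XML query requesting the given attributes as TSV."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def parse_tsv(text, attributes=ATTRIBUTES):
    """Turn header-less TSV text into one dict per gene row, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(attributes, row)) for row in reader if row]


def fetch_descriptions(dataset, out_path):
    """Download the table for one dataset, save it, and return the parsed rows."""
    url = MART_URL + "?" + urllib.parse.urlencode({"query": build_query(dataset)})
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return parse_tsv(text)
```

A call such as `fetch_descriptions("hsapiens_gene_ensembl", "ref-genome/gene_descriptions.tsv")` would then stand in for the download-and-rename step; the dataset name and output path are placeholders, and the mirroring approach would need a different design.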