Closed by eboileau 4 months ago
Are biotype definitions consistent across species, or do they also vary?
Hi @eboileau - please do not look at ftp.ensemblgenomes... this is a different type of project. We only support what is on main EnsEMBL i.e. ftp.ensembl.org
EnsEMBL supports vertebrates mostly plus model organisms such as yeast, fly and nematode (C. elegans).
Plants and bacteria are not supported. We need to defer this to later.
Not only does https://ftp.ensembl.org/pub/release-110/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.110.gtf.gz work, but so do https://ftp.ensembl.org/pub/release-110/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.46.110.chr.gtf.gz and https://ftp.ensembl.org/pub/release-110/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.110.gtf.gz
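The three URLs above appear to follow one naming pattern, which can be sketched as a small helper. This is only an illustration inferred from those three links: the function name `gtf_url` and the `chr_only` flag are hypothetical, and other species or releases may not follow the same layout.

```python
ENSEMBL_FTP = "https://ftp.ensembl.org/pub"


def gtf_url(species: str, assembly: str, release: int, chr_only: bool = False) -> str:
    """Build a GTF URL on the main Ensembl FTP site.

    The pattern is inferred from the release-110 links quoted in this
    thread; it is an assumption, not a documented guarantee.
    """
    # "Saccharomyces cerevisiae" -> "saccharomyces_cerevisiae"
    directory = species.lower().replace(" ", "_")
    # File names start with the same name, first letter capitalised.
    prefix = directory.capitalize()
    suffix = ".chr.gtf.gz" if chr_only else ".gtf.gz"
    return (
        f"{ENSEMBL_FTP}/release-{release}/gtf/{directory}/"
        f"{prefix}.{assembly}.{release}{suffix}"
    )
```

For example, `gtf_url("Saccharomyces cerevisiae", "R64-1-1", 110)` reproduces the yeast link quoted above.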
Ok, for now this is what we do:
use release-110/assembly_chain instead of current_assembly_chain
# ncbi_taxa.csv
4932 Saccharomyces cerevisiae S. cerevisiae fa5d5e2b
# annotation.csv
4 110 4932 ensembl cp6qKL4t4Wws
# assembly.csv
4 R64-1-1 sacCer3 4932 K9FeTPiZ4abQ
and patch the database (this should be unproblematic for now, as we have no yeast data so far)
delete from assembly where id = 4;
delete from annotation where id = 4;
delete from ncbi_taxa where id = 4932;
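If the patch is applied from a script rather than by hand, the three statements could be wrapped in a single transaction so that a failure leaves nothing half-deleted. A minimal sketch, assuming a DB-API connection; `sqlite3` is used here only as a stand-in because the actual driver is not stated in this thread, and `remove_yeast` is a hypothetical name. Other drivers use a different placeholder style (e.g. `%s`).

```python
import sqlite3


def remove_yeast(conn: sqlite3.Connection, row_id: int = 4, taxa_id: int = 4932) -> None:
    """Delete the yeast rows, in the same order as the statements above."""
    # "with conn" commits if all statements succeed and rolls back on error.
    with conn:
        conn.execute("DELETE FROM assembly WHERE id = ?", (row_id,))
        conn.execute("DELETE FROM annotation WHERE id = ?", (row_id,))
        # The taxonomy row goes last, after the rows that refer to it.
        conn.execute("DELETE FROM ncbi_taxa WHERE id = ?", (taxa_id,))
```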
We need to come back to this issue at a later time point. We will most likely need to add yeast and bacteria sooner or later.
Another general problem is that of chain files: https://ftp.ensembl.org/pub/release-110/assembly_chain/ is limited to
caenorhabditis_elegans/
danio_rerio/
homo_sapiens/
mus_musculus/
saccharomyces_cerevisiae/
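One way to make this limitation explicit in code is to check a species against the directories listed above before building a chain-file URL. The sketch below hard-codes that release-110 listing; the names `CHAIN_SPECIES_RELEASE_110` and `chain_dir_url` are hypothetical, and the set would need updating for any other release.

```python
from typing import FrozenSet, Optional

# Directories listed under release-110/assembly_chain/ in this thread.
CHAIN_SPECIES_RELEASE_110: FrozenSet[str] = frozenset({
    "caenorhabditis_elegans",
    "danio_rerio",
    "homo_sapiens",
    "mus_musculus",
    "saccharomyces_cerevisiae",
})


def chain_dir_url(
    species: str,
    release: int = 110,
    available: FrozenSet[str] = CHAIN_SPECIES_RELEASE_110,
) -> Optional[str]:
    """Return the chain-file directory URL, or None if none is listed."""
    directory = species.lower().replace(" ", "_")
    if directory not in available:
        return None
    return f"https://ftp.ensembl.org/pub/release-{release}/assembly_chain/{directory}/"
```

With this listing, `chain_dir_url("Drosophila melanogaster")` returns `None`, even though a GTF file is available for that species.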
As for biotypes, we need to check in detail how definitions vary between organisms, e.g. using GET info/biotypes/..., and how these differ from this or from our definitions in BIOTYPES (specifications.py).
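Such a check could be sketched roughly as follows. The helper names `fetch_biotypes` and `compare_biotypes` are hypothetical. The sketch assumes that the Ensembl REST endpoint `info/biotypes/:species` returns a JSON list whose entries each carry a `biotype` key, and that our own definitions can be passed in as a plain set of names; the actual structure of BIOTYPES in specifications.py may differ.

```python
import json
from typing import Dict, List, Set
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"


def fetch_biotypes(species: str) -> Set[str]:
    """Fetch biotype names for one species (requires network access)."""
    request = Request(
        f"{REST_SERVER}/info/biotypes/{species}",
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        entries = json.load(response)
    # Assumption: each entry is a dict with a "biotype" key.
    return {entry["biotype"] for entry in entries}


def compare_biotypes(
    per_species: Dict[str, Set[str]], ours: Set[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Report, per species, biotypes not shared by all species and
    biotypes absent from our own definitions."""
    shared = set.intersection(*per_species.values()) if per_species else set()
    return {
        species: {
            "species_specific": sorted(biotypes - shared),
            "missing_from_ours": sorted(biotypes - ours),
        }
        for species, biotypes in per_species.items()
    }
```

The comparison is kept separate from the download so that it can be run on cached results, e.g. `compare_biotypes({s: fetch_biotypes(s) for s in species_list}, ours)`.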
I assumed that for most species, if not all, there would be a single source, and a more or less standard format...
What is the difference between https://ftp.ensembl.org/pub/release-110/gtf/ and https://ftp.ensemblgenomes.ebi.ac.uk/pub/? For some species there is an overlap, but not for all, and the version numbering is different, e.g.
https://ftp.ensembl.org/pub/release-110/gtf/saccharomyces_cerevisiae/ vs. https://ftp.ensemblgenomes.ebi.ac.uk/pub/fungi/release-59/gtf/saccharomyces_cerevisiae/ ?
cf. https://www.ensembl.org/index.html, http://bacteria.ensembl.org/index.html, http://fungi.ensembl.org/info/data/ftp/index.html and http://plants.ensembl.org/info/data/ftp/index.html.
For now, these were omitted:
because I don't know which source and which format to use...
... and we have assemblies for
but no annotation, due to