Adding more species is likely to be more complicated than expected. In general, we also need to consider the availability of extended annotations (see e.g. #146) and how we integrate RNA types (cf. #97, #119).
Using https://ftp.ensembl.org gives us access to most vertebrates and a few model organisms (yeast, fly, and nematode). If we want to extend to a greater variety of yeasts, other arthropods, plants, bacteria, and/or viruses, we need to reconsider our sources, and how to handle them.
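One way to make the choice of source explicit is to resolve the FTP tree from the Ensembl division instead of assuming ftp.ensembl.org everywhere. The sketch below is hypothetical: the division names, the `DIVISION_ROOTS` table and `release_url` are illustrative, and it assumes the non-vertebrate divisions are served from Ensembl Genomes. Unsupported groups such as viruses fail loudly instead of silently falling back.

```python
# Hypothetical resolver: which FTP tree to pull annotations from, keyed by
# Ensembl division. Vertebrates come from ftp.ensembl.org; the other
# divisions are assumed to live under Ensembl Genomes.
ENSEMBL_FTP = "https://ftp.ensembl.org/pub"
ENSEMBL_GENOMES_FTP = "https://ftp.ensemblgenomes.ebi.ac.uk/pub"

DIVISION_ROOTS = {
    "EnsemblVertebrates": ENSEMBL_FTP,
    "EnsemblPlants": f"{ENSEMBL_GENOMES_FTP}/plants",
    "EnsemblFungi": f"{ENSEMBL_GENOMES_FTP}/fungi",
    "EnsemblMetazoa": f"{ENSEMBL_GENOMES_FTP}/metazoa",
    "EnsemblProtists": f"{ENSEMBL_GENOMES_FTP}/protists",
    "EnsemblBacteria": f"{ENSEMBL_GENOMES_FTP}/bacteria",
}


def release_url(division: str, release: int) -> str:
    """Return the release root for a division, failing loudly if unsupported.

    Ensembl and Ensembl Genomes number their releases differently,
    so `release` must be the division-specific number.
    """
    root = DIVISION_ROOTS.get(division)
    if root is None:
        # e.g. viruses: no source configured yet, so refuse rather than guess
        raise NotImplementedError(f"no annotation source configured for {division!r}")
    return f"{root}/release-{release}"
```

Some species (e.g. yeast) are available on both sites, so picking the division per organism is itself a decision we would have to record somewhere.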
As for biotypes, we need to check in detail how definitions vary between organisms, e.g. using `GET info/biotypes/...`, and how they differ from this or from our definitions in `BIOTYPES`. Besides, we currently have:
```python
# note that now <rna_type> can be anything...
@api.route("/biotypes/<rna_type>", methods=["GET"])
@cross_origin(supports_credentials=True)
def get_biotypes(rna_type):
    # TODO: do biotypes also depend on RNA type/annotation?
    return {"biotypes": MAPPED_BIOTYPES}
```
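To see how biotypes vary between organisms, the responses of `GET info/biotypes/...` can be compared per species against our own list. This is a minimal sketch that leaves out the HTTP call. It assumes each response entry is a dict with a `"biotype"` key and that `BIOTYPES` can be iterated as Ensembl biotype names (e.g. the keys of a mapping). Both are assumptions to check against the actual REST output and our code. `compare_biotypes` is a hypothetical helper.

```python
from collections.abc import Iterable, Mapping


def biotype_names(entries: Iterable[Mapping]) -> set[str]:
    """Extract biotype names, assuming each entry carries a "biotype" key."""
    return {entry["biotype"] for entry in entries}


def compare_biotypes(
    per_organism: Mapping[str, Iterable[Mapping]], ours: Iterable[str]
) -> dict[str, dict[str, set[str]]]:
    """For each organism, report biotypes we do not map and ours it lacks."""
    ours = set(ours)
    report = {}
    for organism, entries in per_organism.items():
        theirs = biotype_names(entries)
        report[organism] = {
            "unmapped": theirs - ours,  # returned by Ensembl, unknown to us
            "absent": ours - theirs,    # in our list, not reported for this organism
        }
    return report
```

A non-empty `"unmapped"` set for a new organism is a direct signal that `MAPPED_BIOTYPES` would need extending (or that biotypes should depend on organism as well as on RNA type).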
Aims/objectives.
The "hard-coded" check for yeast and worm used to wrangle the GTF file name should be handled more generally, cf. https://github.com/dieterich-lab/scimodom/blob/3c9e10062f5edbde9ee0aa2770c80f80e56af304/server/src/scimodom/services/annotation/ensembl.py#L139
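Instead of special-casing organisms, the GTF file could be picked from the directory listing. This sketch assumes the wrangling is about choosing among the variants Ensembl publishes per species (e.g. `.chr.gtf.gz` vs plain `.gtf.gz`) and that some species may lack one of them. It also assumes Ensembl-style names `<Species>.<Assembly>.<release>[.<variant>].gtf.gz`. `pick_gtf` and the preference order are hypothetical and may not match what the current code actually needs.

```python
import re

# Hypothetical preference order: ".chr.gtf.gz" first, then plain ".gtf.gz".
DEFAULT_PREFERENCE = ("chr", "")


def pick_gtf(filenames, species, assembly, release, preference=DEFAULT_PREFERENCE):
    """Choose a GTF file from a directory listing instead of hard-coding names."""
    prefix = f"{species.capitalize()}.{assembly}.{release}"
    pattern = re.compile(re.escape(prefix) + r"(?:\.(?P<variant>[\w-]+))?\.gtf\.gz$")
    available = {}
    for name in filenames:
        match = pattern.match(name)
        if match:
            # plain "<prefix>.gtf.gz" has no variant and is stored under ""
            available[match.group("variant") or ""] = name
    for variant in preference:
        if variant in available:
            return available[variant]
    raise FileNotFoundError(f"no GTF for {prefix} among {sorted(available)}")
```

With a listing that has no `.chr` file, the function falls back to the plain GTF without any organism-specific branch. Variants not in `preference` (e.g. `abinitio`) are ignored.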
In fact, we cannot handle yeast at all: it has no annotated 3'UTR, so we need a more general solution for such cases.
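A more general approach is to check which feature types a GTF actually provides before building annotation regions. The check fails only when core features are missing and reports optional ones (like 3'UTR) as gaps. This is a minimal sketch: the split into `REQUIRED` and `OPTIONAL` is an assumption. The feature names follow Ensembl GTF column 3 (`five_prime_utr`, `three_prime_utr`, `CDS`, ...). `check_features` is a hypothetical helper.

```python
from collections import Counter

# Assumed split of Ensembl GTF feature types (column 3).
REQUIRED = {"gene", "transcript", "exon"}
OPTIONAL = {"CDS", "five_prime_utr", "three_prime_utr"}


def feature_types(gtf_lines):
    """Count feature types (GTF column 3), skipping comment/header lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def check_features(gtf_lines):
    """Raise on missing required features; map each optional one to present/absent."""
    counts = feature_types(gtf_lines)
    missing_required = REQUIRED - set(counts)
    if missing_required:
        raise ValueError(f"GTF lacks required features: {sorted(missing_required)}")
    return {feature: counts[feature] > 0 for feature in sorted(OPTIONAL)}
```

Downstream code could then skip (and document) the UTR regions for an organism whose GTF has none, instead of failing.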
Another general problem is that of chain files. This https://ftp.ensembl.org/pub/release-110/assembly_chain/ is limited to