Closed PSB-mibel closed 3 years ago
Looking at the codebase it's apparent that NCBI E-utils is first called in order to 'validate' the provided organism name. Is there a way to disable this validation step?
I have species (e.g. Zea mays cv PH207) for which no taxonomy-id is assigned yet, so this call will fail.
I also noticed that if one provides a taxonomy-id as the organism name, this E-utils call is used to convert the tax-id to a 'common-name', which is then used to locate the JSON file. This is also counter-intuitive: if the content of the dataDir is being generated dynamically (as an API endpoint), one would assume that the provided 'organism' name is used directly to locate the JSON file, without any conversion.
The most important thing would still be an option to disable the E-utils lookup and everything that depends on it (validation of species).
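A minimal sketch of the direct lookup described above, to make the expectation concrete. The helper name `bandsUrl` and the `example.org` URL are hypothetical; this is not Ideogram's actual code, only an illustration of using the organism string as provided.

```javascript
// Hypothetical helper, not part of Ideogram: builds the URL of the bands file
// from the organism string exactly as provided, with no E-utils lookup.
function bandsUrl(dataDir, organism) {
  // Ensure exactly one slash between the directory and the file name.
  const base = dataDir.endsWith('/') ? dataDir : dataDir + '/';
  // Only URL-encode the name; do not translate it to another name.
  return base + encodeURIComponent(organism) + '.json';
}

console.log(bandsUrl('https://example.org/api/bands', 'ath'));
```

With this approach, an organism without a taxonomy-id (e.g. 'Zea mays cv PH207') resolves the same way as any other name.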
Interesting use case. Do your organisms differ in chromosome length or number from their more common related taxa?
They could indeed (due to quite a difference in number of transposons, between B73 and PH207 for example).
Furthermore, for many of them there isn't yet a full chromosome-level assembly available, just a set of scaffolds/contigs. And because the gene-locations in the annotations are based on those scaffolds, it's no use to try and use the chromosomes of the more common related taxa.
Would this example of using custom organisms and genomes work for you?
Not really, since the main idea would be NOT to rely on pre-generated JSON files being put on a pre-set location. Because the ideogram.js file and all other website code resides on a read-only directory (due to security reasons), dynamically putting the json-files in the ../data/bands/native/ (relative to the ideogram.js) directory is not going to work.
The 'dataDir' API option seemed perfect for my use-case as it allows me to generate the JSON content on the fly as an API endpoint.
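For illustration, a configuration along these lines is what the use case amounts to. The endpoint URL and the container selector are placeholders, and whether a short name such as 'ath' is accepted depends on the Ideogram version, so treat this as a sketch rather than a confirmed recipe.

```javascript
// Sketch of an Ideogram configuration that points dataDir at an API endpoint.
// Assumptions: the URL and the container selector below are placeholders.
const config = {
  organism: 'ath',                          // name the endpoint is expected to serve
  container: '#ideogram-container',         // element that will hold the ideogram
  dataDir: 'https://example.org/api/bands/' // endpoint returning JSON on the fly
};

// Only instantiate when ideogram.js has been loaded (i.e. in a browser page).
if (typeof Ideogram !== 'undefined') {
  new Ideogram(config);
}
```

Because the JSON is served by the endpoint, nothing has to be written to the read-only directory that holds ideogram.js.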
Ah, I see! Thanks for helping me understand the use case. Out of curiosity, where are you using Ideogram?
In PLAZA (https://bioinformatics.psb.ugent.be/plaza/) there's currently a feature called WGMapping, which allows the annotation of features on the chromosomes of species. These can be gene-family members, GO terms, etc. (example: https://bioinformatics.psb.ugent.be/plaza/versions/plaza_v4_5_dicots/genome_mapping/index/gf/HOM04D000005/ath). So very much like your annotations example (https://eweitz.github.io/ideogram/annotations-basic). Now, the current implementation is very old-school (generating PNGs on the fly) and in dire need of an update with more modern web tech. And Ideogram.js seems like the easiest replacement, without having to develop something from scratch.
That feature seems quite useful. I'll develop towards this dataDir refinement, and update here within a week.
This feature is moving along. See examples/custom-organism.html in custom-organism-datadir for a working example.
I'll refine that to also work without file extensions (e.g. foo.bar/api/custom or foo.bar/api/custom.json), and add tests, then I'll update here.
Thanks a lot for the hard work! I know it's not required of you, and therefore I really appreciate it.
This is ready for a look: see #266. Building from source would help iterate faster on any refinements. Rough steps:
git clone https://github.com/eweitz/ideogram
cd ideogram
npm run build
<script src="../../dist/js/ideogram.min.js"></script>
"dependencies": {"ideogram": "file:../path/to/your/locally-built/ideogram"}
in package.json
Let me know if you can try by building from source, otherwise I can do a full release.
Support for this will be in the next release.
Hi, apologies for not replying earlier. For some reason the previous notification email from GitHub about this issue had ended up in my spam folder :-/
By building using the latest code, there is still an 'error' message (Error: Organism "atr" is generally unknown; it was not found in the NCBI Taxonomy database. If you did not intend to specify a novel or custom taxon, then try using the organism's scientific name, e.g. Homo sapiens or Arabidopsis thaliana.) but the code continues to run fine after printing this error.
I assume this is the expected behaviour?
Anyway, thanks for this specific development!
Throwing a custom error was expected, but, given your comment, perhaps unideal. I replaced the error with a warning.
Yes, I think a warning is a cleaner solution to this.
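The difference between the two behaviours can be sketched as follows. The function name `reportUnknownOrganism` and the idea of keying the behaviour on a custom dataDir are assumptions made for illustration; this is not how Ideogram implements it.

```javascript
// Hypothetical sketch, not Ideogram's implementation: warn instead of throwing
// when an unknown organism is used together with a custom dataDir.
function reportUnknownOrganism(organism, hasCustomDataDir) {
  const message =
    'Organism "' + organism + '" was not found in the NCBI Taxonomy database.';
  if (hasCustomDataDir) {
    // A warning is logged and execution continues.
    console.warn(message);
    return false;
  }
  // Without a custom dataDir there is nothing to fall back on, so fail loudly.
  throw new Error(message);
}

reportUnknownOrganism('atr', true); // logs a warning, does not throw
```

A warning keeps the console informative for genuine typos while not looking like a failure for novel or custom taxa.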
Michiel, support for this was released in Ideogram 1.30.0, a little over a week ago.
Could you ping me here or at eric.m.weitz@gmail.com when it's viewable in PLAZA? I'd be interested to see. And let me know if more would help!
If I use a custom dataDir, and use a single word (e.g. 'ath' instead of 'arabidopsis-thaliana'), then the code does indeed try to load the 'ath.json' file from the defined dataDir. However, the code subsequently crashes ("Uncaught (in promise) TypeError: can't assign to property "chrIndex" on "1": not an object" --> chromosome-model.js:174).
If I use the same custom dataDir, but use a compound word (e.g. 'arabidopsis-thaliana') instead (and make sure the 'arabidopsis-thaliana.json' file exists), then everything works fine.
Using other compound words (e.g. 'ath-species') also doesn't work, which leads me to believe there's some hard-coding of species names going on despite forcing the dataDir location.
I'll leave open whether this is a bug or intended behaviour, but it would be great if this could be added to the API entry for 'organism', as it is not very intuitive and documenting it may help others who experience the same issue.