single word as organism with custom dataDir doesn't work

PSB-mibel commented 3 years ago

If I use a custom dataDir, and use a single word (e.g. 'ath' instead of 'arabidopsis-thaliana'), then the code does indeed try to load the 'ath.json' file from the defined dataDir. However, the code subsequently crashes ("Uncaught (in promise) TypeError: can't assign to property "chrIndex" on "1": not an object" --> chromosome-model.js:174).

If I use the same custom dataDir, but use a compound word (e.g. 'arabidopsis-thaliana') instead (and make sure the 'arabidopsis-thaliana.json' file exists), then everything works fine.

Using other compound words (e.g. 'ath-species') also doesn't work, which leads me to believe that some species names are hard-coded, even when the dataDir location is forced.

Whether this is a bug or intended behaviour I'll leave open, but it would be great if this could be added to the API entry for 'organism', as it is not very intuitive and may help others who run into the same issue.
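
For reference, a minimal sketch of the two setups described above. The endpoint URL and the container selector are placeholders, not values from this report; `organism`, `dataDir` and `container` are the Ideogram options involved.

```javascript
// Sketch only: the endpoint URL and container selector are placeholders.
const dataDir = 'https://example.org/api/bands/';

// Reported as failing: a single-word organism name (ath.json is requested,
// then chromosome-model.js throws the TypeError quoted above).
const failingConfig = {organism: 'ath', dataDir, container: '#ideogram'};

// Reported as working: the hyphenated scientific name,
// with arabidopsis-thaliana.json present in dataDir.
const workingConfig = {
  organism: 'arabidopsis-thaliana',
  dataDir,
  container: '#ideogram'
};

// In a page that has loaded ideogram.min.js:
// const ideogram = new Ideogram(failingConfig);
```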

PSB-mibel commented 3 years ago

Looking at the codebase it's apparent that NCBI E-utils is first called in order to 'validate' the provided organism name. Is there a way to disable this validation step?

I have species (e.g. Zea mays cv PH207) for which no taxonomy-id is assigned yet, so this call will fail.

I also noticed that, if one provides a taxonomy-id as the organism name, this E-utils call is used to convert the tax-id to a 'common-name' that is then used to locate the JSON file. This is also counter-intuitive: if the content of the dataDir is being generated dynamically (as an API endpoint), one would assume that the provided 'organism' name is used as-is to locate the JSON file, without any conversion.

The most important thing would still be an option to disable the E-utils lookup, along with anything that depends on this call (validation of species).
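
To illustrate the kind of lookup in question, here is a rough sketch of an NCBI E-utils ESearch request against the Taxonomy database. The function names are made up for this example and this is not Ideogram's own code; it only shows why a name without an assigned taxonomy-id comes back empty.

```javascript
// Builds an ESearch URL for the NCBI Taxonomy database.
// Illustrative helper, not taken from the Ideogram codebase.
function taxonomySearchUrl(organismName) {
  const base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
  const params = new URLSearchParams({
    db: 'taxonomy',
    term: organismName,
    retmode: 'json'
  });
  return `${base}?${params}`;
}

// ESearch JSON responses list matching IDs under esearchresult.idlist;
// an empty list means no taxonomy-id matched the name.
function hasTaxonomyId(esearchJson) {
  const ids = esearchJson?.esearchresult?.idlist ?? [];
  return ids.length > 0;
}

const url = taxonomySearchUrl('Zea mays cv PH207');
console.log(url);
```

A caller would fetch `url` and pass the parsed JSON to `hasTaxonomyId`; for a cultivar with no assigned taxonomy-id, that check would be expected to return false.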

eweitz commented 3 years ago

Interesting use case. Do your organisms differ in chromosome length or number from their more common related taxa?

PSB-mibel commented 3 years ago

They could indeed (due to quite a difference in number of transposons, between B73 and PH207 for example).

Furthermore, for many of them there isn't yet a full chromosome-level assembly available, just a set of scaffolds/contigs. And because the gene-locations in the annotations are based on those scaffolds, it's no use to try and use the chromosomes of the more common related taxa.

eweitz commented 3 years ago

Would this example of using custom organisms and genomes work for you?

PSB-mibel commented 3 years ago

Not really, since the main idea would be NOT to rely on pre-generated JSON files being put in a pre-set location. Because the ideogram.js file and all other website code reside in a read-only directory (for security reasons), dynamically putting the JSON files in the ../data/bands/native/ directory (relative to ideogram.js) is not going to work.

The 'dataDir' API option seemed perfect for my use-case as it allows me to generate the JSON content on the fly as an API endpoint.
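
As a rough sketch of what such an endpoint could look like on the server side: the route, the in-memory store and the payload shape below are all placeholders (the JSON schema Ideogram expects is not shown here), so treat this as an outline rather than a working integration.

```javascript
// Hypothetical in-memory source; a real service would query its database.
// The payload shape is a placeholder, not Ideogram's schema.
const genomes = new Map([
  ['custom', {chromosomes: [{name: 'scaffold_1', length: 1000}]}]
]);

// Maps a request path such as /api/bands/custom.json to a JSON response.
function respond(pathname) {
  const match = pathname.match(/^\/api\/bands\/([\w-]+)\.json$/);
  const data = match ? genomes.get(match[1]) : undefined;
  if (!data) {
    return {status: 404, body: JSON.stringify({error: 'not found'})};
  }
  return {status: 200, body: JSON.stringify(data)};
}

console.log(respond('/api/bands/custom.json').status);
```

With something like this behind https://example.org/api/bands/ (a placeholder URL), that URL would be the value passed as dataDir, and nothing needs to be written to the read-only web directory.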

eweitz commented 3 years ago

Ah, I see! Thanks for helping me understand the use case. Out of curiosity, where are you using Ideogram?

PSB-mibel commented 3 years ago

In PLAZA (https://bioinformatics.psb.ugent.be/plaza/) there's currently a feature called WGMapping, which allows the annotation of features on the chromosomes of species. These can be gene-family members, GO terms, etc. (example: https://bioinformatics.psb.ugent.be/plaza/versions/plaza_v4_5_dicots/genome_mapping/index/gf/HOM04D000005/ath). So very much like your annotations example (https://eweitz.github.io/ideogram/annotations-basic). Now, the current implementation is very old-school (generating PNGs on the fly) and in dire need of an update with more modern web tech. And Ideogram.js seems like the easiest replacement, without having to develop something from scratch.

eweitz commented 3 years ago

That feature seems quite useful. I'll develop towards this dataDir refinement, and update here within a week.

eweitz commented 3 years ago

This feature is moving along. See examples/custom-organism.html in custom-organism-datadir for a working example.

I'll refine that to also work without file extensions (e.g. foo.bar/api/custom or foo.bar/api/custom.json), and add tests, then I'll update here.
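
A rough sketch of the two URL forms mentioned above, using the organism name verbatim. The helper name is hypothetical and this is not the code in the branch; it only spells out which URLs are meant.

```javascript
// Hypothetical helper: lists the URL with and without a .json extension
// for a given dataDir and organism, without any name conversion.
function candidateUrls(dataDir, organism) {
  const base = dataDir.endsWith('/') ? dataDir : dataDir + '/';
  const name = encodeURIComponent(organism);
  return [`${base}${name}.json`, `${base}${name}`];
}

console.log(candidateUrls('https://foo.bar/api', 'custom'));
```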

PSB-mibel commented 3 years ago

Thanks a lot for the hard work! I know it's not required of you, and therefore I really appreciate it.

eweitz commented 3 years ago

This is ready for a look: see #266. Building from source would help iterate faster on any refinements. Rough steps:

git clone https://github.com/eweitz/ideogram
cd ideogram
npm run build
If using HTML like examples: <script src="../../dist/js/ideogram.min.js"></script>
Else, if using JS framework like React: "dependencies": {"ideogram": "file:../path/to/your/locally-built/ideogram"} in package.json

Let me know if you can try by building from source, otherwise I can do a full release.

eweitz commented 3 years ago

Support for this will be in the next release.

PSB-mibel commented 3 years ago

Hi, apologies for not replying earlier. For some reason the previous notification email from GitHub about this issue had ended up in my spam folder :-/

When building from the latest code, there is still an 'error' message (Error: Organism "atr" is generally unknown; it was not found in the NCBI Taxonomy database. If you did not intend to specify a novel or custom taxon, then try using the organism's scientific name, e.g. Homo sapiens or Arabidopsis thaliana.) but the code continues to run fine after printing this error.

I assume this is the expected behaviour?

Anyway, thanks for this specific development!

eweitz commented 3 years ago

Throwing a custom error was expected but, given your comment, perhaps not ideal. I replaced the error with a warning.
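
In outline, the difference looks something like the sketch below. The function and parameter names are illustrative only, not Ideogram internals, and the set of known taxa stands in for the NCBI lookup.

```javascript
// Hypothetical sketch: an unknown organism produces a warning and is
// treated as a custom taxon, instead of an error being thrown.
function resolveOrganism(name, knownTaxa, warn = console.warn) {
  if (knownTaxa.has(name)) {
    return {name, custom: false};
  }
  // A warning lets rendering proceed with the data found in dataDir.
  warn(`Organism "${name}" is generally unknown; continuing as a custom taxon.`);
  return {name, custom: true};
}

const knownTaxa = new Set(['homo-sapiens', 'arabidopsis-thaliana']);
console.log(resolveOrganism('atr', knownTaxa));
```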

PSB-mibel commented 3 years ago

Yes, I think a warning is a cleaner solution to this.

eweitz commented 3 years ago

Michiel, support for this was released in Ideogram 1.30.0, a little over a week ago.

Could you ping me here or at eric.m.weitz@gmail.com when it's viewable in PLAZA? I'd be interested to see. And let me know if more would help!

eweitz/ideogram, issue #265