Possible feature: automatically process MIAPPE/ISA-TAB metadata

danforthcenter / plantcv

Plant phenotyping with image analysis

Mozilla Public License 2.0


Possible feature: automatically process MIAPPE/ISA-TAB metadata #65

Closed jshoyer closed 8 years ago

jshoyer commented 8 years ago

Description

Perhaps the lead developers have already considered the metadata formats that other groups are promoting (below). Personally, I find the amount of XML boilerplate required onerous, so I have not wrapped my head around the formats. Processing ISA-TAB tables with Python should not be too hard, so it may be worth discussing. This functionality might improve compatibility and/or convenience. Failing that, the 2016-06 PDF table for MIAPPE may contain ideas worth considering.
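To make "should not be too hard" a bit more concrete, here is a minimal sketch using only the standard library. The sample table, its column values, and the helper names `read_isatab_table` and `column` are all made up for illustration; a real ISA-TAB study file has many more columns, and none of this is existing plantcv code.

```python
import csv
import io

# Hypothetical, heavily trimmed study table; real ISA-TAB files are much wider.
SAMPLE_STUDY_TABLE = (
    "Source Name\tCharacteristics[Organism]\tTerm Source REF\t"
    "Characteristics[Infraspecific name]\tTerm Source REF\tSample Name\n"
    "plant_001\tArabidopsis thaliana\tNCBITaxon\tC24\t\tplant_001_day10\n"
    "plant_002\tArabidopsis thaliana\tNCBITaxon\tC24\t\tplant_002_day10\n"
)


def read_isatab_table(handle):
    """Return (headers, rows) from a tab-delimited table.

    Headers stay in a list because names such as 'Term Source REF'
    can repeat, so a dict keyed by header would silently drop columns.
    """
    reader = csv.reader(handle, delimiter="\t")
    headers = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return headers, rows


def column(headers, rows, name):
    """Return the values of the first column called `name`."""
    index = headers.index(name)
    return [row[index] for row in rows]


headers, rows = read_isatab_table(io.StringIO(SAMPLE_STUDY_TABLE))
print(column(headers, rows, "Sample Name"))
```

The one design point worth noting is the repeated header names: `csv.DictReader` would keep only the last `Term Source REF` column, so a plain `csv.reader` with positional lookup seems safer here.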

Context

See the 'ISA-Tab for phenotyping' page. Krajewski et al. 2016 described the format. Arend et al. 2016 provide a dataset (A. thaliana C24 phenotyped with a new IPK Lemnatec system) documented with these standards. Their data might provide a nice test case for enhancing rosette plant analysis tools.


nfahlgren commented 8 years ago

Definitely an ongoing discussion, see here. Our collaborators in the Plant Imaging Consortium are using MIAPPE for the HTP system at Arkansas State.

We also just learned about PODD.

nfahlgren commented 8 years ago

I should also say that we should bring Mindy and other people that have deployed systems into the discussion, because a lot of these standards focus on the experimental metadata that would need to be provided by experimenters.

jshoyer commented 8 years ago

I am closing this for now; feel free to reopen whenever. The next thing to do would be to experiment with the isa-api Python package: https://isatools.readthedocs.io (and/or https://github.com/ISA-tools/biopy-isatab ?). If that package proves easy to use and reason about, then we can talk further about importing it or otherwise "baking" the functionality into plantcv.

jshoyer commented 7 years ago

Linking here, for the record, to the recent published description of MIAPPE by Ćwiek-Kupczyńska et al. (2016).

I am still not eager to use XML.

jshoyer commented 7 years ago

I am not sure why I thought that ISA-Tab used XML by default. It is actually tab-delimited text (TSV), my favorite! (Hence 'Tab'.) The second main way to serialize the ISA Abstract Model is ISA-JSON. http://isa-specs.readthedocs.io/en/latest/isamodel.html

Trying out the isatools Python package (now version 0.5) would still be the next step for doing anything programmatic about this.

Edit (the next day): the XML stuff (presumably created with ISAconfigurator) is configuration for the ISAcreator app. The ISAconfigurator and ISAcreator desktop apps (which you apparently have to build yourself from the Java source...) appear to be abandonware, basically deprecated in favor of the Python package.

nfahlgren commented 7 years ago

Ah, nice. I knew they had the TSV format but didn't know they had ISA-JSON. TSV is definitely nice for easy parsing, but not as flexible when you have hierarchical data. The Python json package is really great for reading/writing JSON. Better than any XML parser I have ever tried anyhow :)
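A small sketch of the trade-off, using only the standard `json` module. The record below and its field names are invented for illustration; they are not taken from the ISA-JSON schema or from MIAPPE.

```python
import json

# Hypothetical metadata for one plant with a nested list of images.
record = {
    "plant_id": "plant_001",
    "genotype": "C24",
    "images": [
        {"camera": "VIS", "day": 10, "file": "plant_001_vis_d10.png"},
        {"camera": "NIR", "day": 10, "file": "plant_001_nir_d10.png"},
    ],
}

# JSON keeps the hierarchy intact through a write/read round trip.
text = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(text)

# A flat TSV needs one row per image, repeating the plant-level fields.
flat_rows = [
    [restored["plant_id"], restored["genotype"],
     image["camera"], str(image["day"]), image["file"]]
    for image in restored["images"]
]
tsv = "\n".join("\t".join(row) for row in flat_rows)
print(tsv)
```

The nested `images` list survives the JSON round trip unchanged, while the TSV version has to duplicate `plant_id` and `genotype` on every row.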

jshoyer commented 7 years ago

Related efforts to understand the terminology: https://github.com/PlantPhenoHack2017/HackTopics/issues/3

See also notes on a proposed effort to standardize metadata (pre)collection: https://github.com/PlantPhenoHack2017/HackTopics/issues/4