MIAPPE / ISA-Tab-for-plant-phenotyping

ISA-Tab configuration for plant phenotyping

Urgent update of configuration to fulfil MIAPPE 1.1 recommendations #4

Open arendd opened 5 years ago

arendd commented 5 years ago

We need an update of the ISA-Tab configuration so that it fulfils the latest MIAPPE recommendations (version 1.1).

PapoutsoglouE commented 5 years ago

Take a look at the branch for v1.1 in this repository.
It is not as complete as master (no example files, no txt template) yet, but it does include a guide and a configuration for the ISA Creator. @proccaserra mentioned that he wanted to make some small changes, like adding more templates for specific parameters, but the configuration should be usable already.

arendd commented 5 years ago

Thank you for the update. Sorry, I did not see the new branch. The last time I checked the repository there was no 1.1 configuration... Nevertheless, I have a question: I read the documentation in the README, which was quite clear, but one challenge remains. How can we handle in ISA-Tab the MIAPPE parameter types that can also be a list rather than a single value, e.g. "air temperature by day"? What if there is a list of temperature values for the different time points over the day during my experiment? Are there any suggestions for this? As far as I know, there is no datatype for this in ISA-Tab.

proccaserra commented 5 years ago

@PapoutsoglouE +1, it is still on my TODO to modify the MIAPPE ISA configurations to reflect the point of discussion we had.

@arendd: it seems you describe a situation where an ISA ParameterValue[] could be defined and point to an external data file holding those measurements. If that's what you are considering/requiring, while technically possible, it would need 2 things:

  1. Allowing the ISA field to accept files (this is easily done in the configuration but would require testing in ISAcreator).
  2. Defining a format for such files. I expect this to be more involved owing to the variety of sensors in use, but we could start looking into options with a first test case. Can you provide examples? Potential issue: the content of these files would become invisible to the ISA validator, so an additional component would have to be developed to check them. I would favour looking into Frictionless JSON Data Packages or Tabular Data Packages: https://frictionlessdata.io/specs/data-package/ https://frictionlessdata.io/specs/tabular-data-package/
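
As a rough sketch of what point 2 might look like in practice, the snippet below builds a Tabular Data Package descriptor for a single temperature time-series CSV. The package name, file name, and column names are hypothetical and chosen only for illustration; the descriptor keys (`profile`, `resources`, `schema`, `fields`) are the ones I understand the Frictionless specs linked above to use.

```python
import json


def build_descriptor(csv_path, fields):
    """Return a Tabular Data Package descriptor for one CSV resource.

    `fields` is a list of (column name, Table Schema type) pairs.
    """
    return {
        "profile": "tabular-data-package",
        "name": "air-temperature-series",  # hypothetical package name
        "resources": [
            {
                "name": "air-temperature",  # hypothetical resource name
                "path": csv_path,
                "profile": "tabular-data-resource",
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical file and columns for an "air temperature by day" series.
descriptor = build_descriptor(
    "air_temperature.csv",
    [
        ("timestamp", "datetime"),
        ("sensor_id", "string"),
        ("air_temperature_degC", "number"),
    ],
)

# This text would be saved as datapackage.json next to the CSV file.
descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

The ISA parameter value would then only need to hold the path to the package, and a separate checker could validate the CSV against the declared schema.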

arendd commented 5 years ago

Thank you, Eliana and Philippe, for your effort.

  1. Here is an example of a MIAPPE 1.1-compliant experiment that we created "manually", meaning without your configuration: https://drive.google.com/drive/folders/1XDloqwVQQN2pWg9BRfy9T0_jmbTOJg_o I will try to update this example as soon as possible using your new configuration. By the way, this is an older experiment that we previously published as ISA-Tab using the MIAPPE 1.0 standard and the corresponding configuration.

  2. At the moment we are also trying to create an extended example for a newer experiment that includes parameter lists. I will check the JSON format and provide an example as soon as possible.

PapoutsoglouE commented 5 years ago

@arendd, thank you for bringing this to our attention!

@proccaserra, I don't think the proposed solution would work smoothly with MIAPPE v1.1's view of environmental measurements.

Just like plant trait observations, measurements referring to environmental attributes should be treated as data: presented in a data file that the assays point to. They should also, similarly, be attached to an observation unit. The environmental attributes themselves should be listed as observation variables, and included in the trait definition file.

For example, if you have a temperature sensor per plot, and plot is one of your observation units, attaching additional observations to it (environmental or phenotypic) is simple.

But if you have a case where measurements from a single sensor, at one specific location in the field, are treated as representative of the whole field/greenhouse compartment, it might be more complicated.
According to the current model, you would have to make one observation unit for the field. That is fine, until you get to actually defining the biological material for that unit - remember: the ISA Study file allows one biological material per line. So, if your study had 100 biological materials, you would have to use 100 lines to cover the single observation unit for the field. For obvious reasons, this is suboptimal.

After discussing with @DanFaria, we came up with the idea of allowing a special source node in the Study file, indicating that the biological material in the observation unit is all the biological material in that specific study. The term itself could be something like "all" or "study".
This would allow the definition of such an observation unit in one line.
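
To make the difference in size concrete, here is a minimal sketch that builds the Study-file rows both ways. It is a simplification: only the `Source Name` and `Sample Name` columns are modelled, the accession names are invented, and the `"all"` placeholder is the proposed (not yet agreed) term.

```python
def rows_per_material(materials, unit):
    """Current model: one Study-file line per biological material."""
    return [{"Source Name": m, "Sample Name": unit} for m in materials]


def rows_with_placeholder(unit, placeholder="all"):
    """Proposed model: one line whose source stands for every material."""
    return [{"Source Name": placeholder, "Sample Name": unit}]


# Hypothetical study with 100 biological materials and one field-wide unit.
materials = [f"accession_{i}" for i in range(1, 101)]

current = rows_per_material(materials, "field1")
proposed = rows_with_placeholder("field1")

print(len(current), len(proposed))  # 100 1
```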

I realise this is a bit abstract, so I hope the following picture clears it up:
The yellow part declares 12 plants, one from each accession. The green part shows them merged into a single observation unit (sample column), "field1", to which field-wide measurements would be attached. The orange part is what I am proposing instead (so with this, you'd skip the green part).
[Image: Study file excerpt showing the yellow, green, and orange parts]

Does this make sense to you?

cpommier commented 5 years ago

That's an important point. As pointed out by @PapoutsoglouE, the important thing is that the complexity of the data should remain in the data file. Only the minimal metadata description must be modeled in ISA-Tab. Therefore, in your case, you will have one CSV data file for your environment time series. The format of this CSV file can be rather free as long as all the headers are described in your trait definition file. Important point:

For the example from @PapoutsoglouE, the orange part is fine with me and in fact seems to solve a provenance problem that's bugging me. I am not comfortable with the green part.
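
The header rule mentioned above (every column of the data file is described in the trait definition file) could be checked with something like the sketch below. It makes several assumptions: the trait definition file is a CSV with a `Variable ID` column, the data-file headers reuse those IDs, and a couple of structural columns (`observation_unit`, `timestamp`) are exempt. All column names and values are hypothetical.

```python
import csv
import io


def undescribed_headers(data_text, trait_text, id_column="Variable ID",
                        structural=("observation_unit", "timestamp")):
    """Return data-file headers that the trait definition file does not describe."""
    # First row of the data file holds the headers.
    data_headers = next(csv.reader(io.StringIO(data_text)))
    # Collect every variable ID declared in the trait definition file.
    described = {
        row[id_column] for row in csv.DictReader(io.StringIO(trait_text))
    }
    return [
        h for h in data_headers
        if h not in described and h not in structural
    ]


# Hypothetical trait definition file with a single environmental variable.
trait_text = "Variable ID,Variable name\nair_temp,Air temperature\n"

# Hypothetical environment time series; rel_humidity is not described.
data_text = (
    "observation_unit,timestamp,air_temp,rel_humidity\n"
    "field1,2019-06-01T12:00:00,24.1,55\n"
)

missing = undescribed_headers(data_text, trait_text)
print(missing)  # ['rel_humidity']
```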