Field for training data

bioimage-io / spec-bioimage-io

Specification for the bioimage.io model description file.

https://bioimage-io.github.io/spec-bioimage-io/

MIT License


Field for training data #391

Closed: akreshuk closed this issue 6 months ago

akreshuk commented 2 years ago

It seems like we lost the dedicated field for the model training dataset and it is now absorbed into 'links'. Some considerations:

- We claim traceability and reproducibility as one of the main advantages of using our common format.
- A link to the training data should be human-readable; everyone should understand what they are dealing with.

If it's a link to bioimage.io, it has to be the full link, but it could also point to Zenodo or even to the original paper. I would even keep the "links" field for the direct bioimage.io link and still add a separate field, e.g. "training_data", that takes a list of full links. Any opinions?
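A minimal sketch of the proposed layout, written as a Python dict. Everything here is hypothetical: the URLs and ids are placeholders, and `is_full_link` / `invalid_training_links` are illustrative helpers, not part of the spec. It only shows "links" keeping the bioimage.io reference while a separate "training_data" list holds full, human-readable links.

```python
from urllib.parse import urlparse

# Hypothetical RDF fragment: "links" keeps the direct bioimage.io reference,
# "training_data" holds full links (bioimage.io, Zenodo, or the paper).
rdf = {
    "links": ["https://bioimage.io/#/?id=example-dataset"],      # placeholder
    "training_data": [
        "https://bioimage.io/#/?id=example-dataset",              # placeholder
        "https://zenodo.org/record/0000000",                      # placeholder
        "https://doi.org/10.0000/example-paper",                  # placeholder
    ],
}


def is_full_link(value: str) -> bool:
    """Return True if value is an absolute http(s) URL rather than a bare id."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def invalid_training_links(rdf: dict) -> list:
    """Collect training_data entries that are not full links."""
    return [v for v in rdf.get("training_data", []) if not is_full_link(v)]


print(invalid_training_links(rdf))       # []
print(is_full_link("example-dataset"))   # False
```

Requiring full URLs keeps the field readable without knowing bioimage.io id conventions, which is the human-readability point raised above.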

oeway commented 2 years ago

Sounds good!

Regarding option 1, I wonder how we can best distinguish the two cases in the implementation. In Python, would we write `if "id" in rdf["training_data"]`? That won't work, because a full RDF (option 2) can also have the `id` key. We might have to do something like `if "type" not in rdf["training_data"]`, which doesn't seem nice, right?
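To make the ambiguity concrete, here is a small sketch. The two entry shapes are assumptions for illustration only: option 1 is taken to be a dict holding just an id, and option 2 a full in-place RDF that may also carry an id. The field values are placeholders.

```python
# Assumed shapes (hypothetical): option 1 references a dataset by id only,
# option 2 embeds a full dataset RDF, which may also have an "id".
reference = {"id": "example/dataset"}
inplace = {"id": "example/dataset", "type": "dataset", "name": "Example dataset"}

# Checking for "id" cannot tell the two apart: both dicts contain it.
print("id" in reference, "id" in inplace)  # True True


def is_reference_by_type_absence(entry: dict) -> bool:
    """Works for these shapes, but only because a full RDF is assumed to have "type"."""
    return "type" not in entry


print(is_reference_by_type_absence(reference), is_reference_by_type_absence(inplace))  # True False
```

The second check only works as long as every in-place RDF is guaranteed to carry `type`, which is why it feels fragile.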

Wouldn't it be easier to make option 1 just a string containing the id? Then we could tell the cases apart by data type: a string means an id, anything else a full RDF.
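A sketch of the type-based dispatch suggested here, assuming option 1 becomes a plain id string and option 2 stays a dict. The function name and the `name` key are hypothetical, chosen only for the example.

```python
from typing import Union


def describe_training_data(entry: Union[str, dict]) -> str:
    """Dispatch on type: a str is treated as an id, a dict as a full in-place RDF."""
    if isinstance(entry, str):
        return f"reference to {entry}"
    if isinstance(entry, dict):
        return f"in-place RDF named {entry.get('name', '<unnamed>')}"
    raise TypeError(f"unsupported training_data entry: {type(entry).__name__}")


print(describe_training_data("example/dataset"))
print(describe_training_data({"type": "dataset", "name": "Example dataset"}))
```

With distinct types, no key inspection is needed, and an in-place RDF is free to carry its own `id` without being mistaken for a reference.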

FynnBe commented 2 years ago

In Python I would first check whether it is an in-place definition and, if not, whether it only has an id...

FynnBe commented 2 years ago

I.e. marshmallow + marshmallow_union take care of that. In pure Python one could write `if set(rdf["training_data"]) == {"id"}`.
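A pure-Python sketch of the order described in these two comments: try the in-place definition first, then fall back to an id-only reference. `REQUIRED_RDF_KEYS` is a hypothetical stand-in for whatever fields the spec actually requires of a full RDF; the id-only test is the `set(...) == {"id"}` check quoted above.

```python
# Hypothetical minimal keys for a full in-place dataset RDF;
# the real spec defines its own required fields.
REQUIRED_RDF_KEYS = {"type", "name"}


def classify_training_data(entry: dict) -> str:
    """Try the in-place definition first, then fall back to an id-only reference."""
    if REQUIRED_RDF_KEYS <= set(entry):
        return "inplace"
    if set(entry) == {"id"}:
        return "reference"
    raise ValueError(f"neither a full RDF nor an id-only reference: {sorted(entry)}")


print(classify_training_data({"id": "example/dataset"}))
print(classify_training_data({"id": "example/dataset", "type": "dataset", "name": "Example"}))
```

Because the id-only check demands exactly one key, an in-place RDF that also has an `id` is never misread as a reference, and malformed entries fail loudly instead of silently matching either case.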

FynnBe commented 6 months ago

We have a training_data field by now.