Ontology water diagram - Githubissues

isanti commented 1 year ago

(Sorry for the late feedback.) This is about the diagram ontology/emobon_data_model-water.drawio.png, but the same applies to the sediment one. Please note that I am not sure how important the notes below are. I am mentioning everything I see, and we can then discuss whether each point is significant and whether we need to make changes.

1. Connected to observatory there is emobon:soilType -> sediment_type, but this is the water model.
2. I think a sample should be connected to a sampling. There is no arrow connecting them, but the connection might be implied by the path in each of the large green nodes.
3. I see that sampling is connected to geo_loc_name. This is ok. But if it is connected to that, then it should be connected to all the location terms we use (loc_broad_ocean, loc_regional, loc_loc, longitude, latitude), right? The same goes for the connection to tot_depth_water_col. Doesn't the sampling-observatory connection imply that sampling is connected to all parameters in observatory? And if yes, then we don't need to connect any specific observatory terms directly to sampling.
4. comm_samp is only relevant for soil/sediment.
5. Connected to the large green sample node we also need: source_mat_id, failure_comment (probably connected to failure?).
6. Not sure about the definition of sampling_event. For me a sampling event is the event/activity of collecting samples, taking place in one observatory on a specific date. But a sampling event can include samples with different samp_mat_process values. In that case samp_mat_process should be connected to a sample, not to a sampling event. What do you think @kmexter @laurianvm? Should we change that? Also, there is the sampling_event term that identifies a specific sampling event, for example BPNS_Wa_210701 (obs name, Water, date).
7. For the question in the bottom note: I think depth should be connected to each sample, and date can be connected to the sampling event (the sample is then connected to a sampling event). For me all location terms can stay only in observatory; the sampling event is then connected to the observatory and the sample to the sampling event, so there is an indirect connection from the sample to the location terms. What do you think @kmexter @laurianvm?
8. For me it does not matter if we put the same information in different locations. I see there are a few terms repeated in different locations. Why would we not want that? (I cannot think of an answer..)

laurianvm commented 1 year ago

thanks for the feedback! some initial comments.. point 1 and 4: there are conditional statements in the template that ensure those terms are only added when a value is present in the column - so that should be okay
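A minimal sketch of the "only add the term when the column has a value" behaviour, in plain Python rather than the actual template language (which is not shown in this thread). The function name, the `ex:` predicates, the subject IRI and the row values are made up for illustration; only `emobon:soilType`, `sediment_type`, `comm_samp` and `geo_loc_name` come from the discussion.

```python
def row_to_triples(subject, row, column_to_predicate):
    """Return (subject, predicate, value) triples, skipping empty cells."""
    triples = []
    for column, predicate in column_to_predicate.items():
        value = row.get(column)
        # The conditional: a missing or blank cell produces no triple at all.
        if value is None or str(value).strip() == "":
            continue
        triples.append((subject, predicate, str(value).strip()))
    return triples


# Hypothetical column-to-predicate mapping; "ex:" names are placeholders.
COLUMN_TO_PREDICATE = {
    "geo_loc_name": "ex:geoLocName",
    "sediment_type": "emobon:soilType",
    "comm_samp": "ex:commSamp",
}

# A water row leaves the sediment-only columns empty.
water_row = {"geo_loc_name": "example location", "sediment_type": "", "comm_samp": None}
water_triples = row_to_triples("ex:observatory/BPNS", water_row, COLUMN_TO_PREDICATE)
print(water_triples)
```

With a guard like this, the sediment-only terms can stay in a shared diagram without ever showing up in the water graph.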

point 5: yes, 'source_mat_id' can be added with schema:identifier, and 'failure_comment' was already added recently --> (note to self) add the term

point 2: we can't assume the path in the url means something (because a computer can never know) - so they should be connected, and they are via sosa:hasResult and prov:wasGeneratedBy
but I see that a link is missing between the green colored sample node and the white colored sample node --> (note to self) would need to be added
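To illustrate point 2, here is a rough sketch with triples as plain Python tuples (no RDF library). The IRIs are hypothetical; only the predicates `sosa:hasResult` and `prov:wasGeneratedBy` come from the comment above. The second sample's IRI sits "under" the sampling's path, but without explicit triples nothing links the two.

```python
SAMPLING = "ex:sampling/BPNS_Wa_210701"
LINKED_SAMPLE = "ex:sampling/BPNS_Wa_210701/sample/1"
# Same path prefix as the sampling, but no connecting triple.
UNLINKED_SAMPLE = "ex:sampling/BPNS_Wa_210701/sample/2"

graph = {
    (SAMPLING, "sosa:hasResult", LINKED_SAMPLE),
    (LINKED_SAMPLE, "prov:wasGeneratedBy", SAMPLING),
    (UNLINKED_SAMPLE, "rdf:type", "sosa:Sample"),
}


def samples_of(graph, sampling):
    """Samples explicitly declared as results of the given sampling."""
    return sorted(o for s, p, o in graph if s == sampling and p == "sosa:hasResult")


def sampling_of(graph, sample):
    """Samplings that the given sample is explicitly declared to come from."""
    return sorted(o for s, p, o in graph if s == sample and p == "prov:wasGeneratedBy")


print(samples_of(graph, SAMPLING))
print(sampling_of(graph, UNLINKED_SAMPLE))
```

The lookups only follow triples, so the sample that shares the path but has no explicit link is never returned.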

point 6: we've currently defined a sampling event as a sampling activity that takes place at a specific date, in an observatory and following a samp_mat_process (resulting in a sample) -> this does result in more 'sampling events', but samp_mat_process can be linked to it without problem - we went for this approach because it felt slightly more intuitive to link a protocol to an activity rather than the resulting sample, but we could also define it as you propose without any problems;
What do you mean by the last part of your comment here?
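The difference between the two definitions in point 6 can be sketched as a grouping key. This is illustration only: the sample records and the `process_A` / `process_B` values are invented, and the date is written as it appears in BPNS_Wa_210701.

```python
from collections import defaultdict

# Hypothetical samples taken at one observatory on one date.
samples = [
    {"id": "sample_1", "observatory": "BPNS", "date": "210701", "samp_mat_process": "process_A"},
    {"id": "sample_2", "observatory": "BPNS", "date": "210701", "samp_mat_process": "process_A"},
    {"id": "sample_3", "observatory": "BPNS", "date": "210701", "samp_mat_process": "process_B"},
]


def group_events(samples, key_fields):
    """Group sample ids into sampling events identified by the given fields."""
    events = defaultdict(list)
    for sample in samples:
        key = tuple(sample[field] for field in key_fields)
        events[key].append(sample["id"])
    return dict(events)


# Current definition: observatory + date + samp_mat_process.
current = group_events(samples, ("observatory", "date", "samp_mat_process"))
# Proposed alternative: observatory + date, with samp_mat_process kept on each sample.
proposed = group_events(samples, ("observatory", "date"))

print(len(current), len(proposed))
```

Under the current definition the same day yields one event per samp_mat_process; under the alternative it yields a single event and the protocol has to live on the sample.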

point 3: the connection of sampling-observatory does indeed imply that, though to get that information, when e.g. querying the graph, it would require an extra search step -> this relates also to point 7 and 8

point 7 and 8: there is a trade-off between repeating info and making the graph bigger in size but easier to navigate vs. making it smaller in size but then increasing the search steps when navigating it -- however, don't know if in practice this would have any noticeable effect in this case -- so also inclined to repeat information as you suggested in point 8 --> so (note to self) location terms need to be added
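A small sketch of the trade-off, again with triples as plain tuples. The predicates `ex:observatory` and `ex:latitude`, the IRIs and the latitude value are placeholders, not terms from the actual model.

```python
SAMPLING = "ex:sampling/BPNS_Wa_210701"
OBSERVATORY = "ex:observatory/BPNS"

# Smaller graph: latitude is stated once, on the observatory.
compact_graph = {
    (SAMPLING, "ex:observatory", OBSERVATORY),
    (OBSERVATORY, "ex:latitude", "51.0"),  # made-up value
}

# Bigger graph: the same latitude is repeated on the sampling.
repeated_graph = compact_graph | {(SAMPLING, "ex:latitude", "51.0")}


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def latitude_via_observatory(graph, sampling):
    """Two lookups: sampling -> observatory, then observatory -> latitude."""
    for observatory in objects(graph, sampling, "ex:observatory"):
        for latitude in objects(graph, observatory, "ex:latitude"):
            return latitude
    return None


def latitude_direct(graph, sampling):
    """One lookup: sampling -> latitude. Only works if the value is repeated."""
    values = objects(graph, sampling, "ex:latitude")
    return values[0] if values else None


print(latitude_via_observatory(compact_graph, SAMPLING))
print(latitude_direct(compact_graph, SAMPLING))
print(latitude_direct(repeated_graph, SAMPLING))
```

Repeating the term costs one extra triple per sampling and saves one hop per query; whether that matters at this graph size is the open question raised above.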

isanti commented 1 year ago

For points 1, 4, 5, 2, 3, 7, 8 all ok for me.

For point 6: yes, I see the rationale for the sampling event definition. I think the difference is very small and it won't affect any of our upcoming work. I suggest leaving it like that (protocol linked to the activity, samp_mat_process linked to sampling event). With my last bit I meant that there is a term called sampling_event (for example BPNS_Wa_210701) that includes the place, time and what kind of sampling this is (water/sediment) (obs name_Water_date). I was wondering whether having a term called sampling event that does not include the samp_mat_process might confuse things in any way.

emo-bon / emobon-ontology

Ontology water diagram #1