E-Module Mapping Script

alFrie commented 1 year ago

The e-module part got separated from the rest of the CPTO. We have a working mapping script for the simple ontology and its metadata, so on this branch we will write a mapping script for the e-module. Things to do:

[x] Edit the metadata extraction script: Hardcode the specimen age to 28 days and add it to the yaml-file.
[x] Edit the metadata extraction script: Extract the experiment duration ("Zeit", German for "time") and add it to the yaml-file.
[x] Edit the metadata extraction script: Make the metadata-dictionary keys match the placeholders.
[x] Map the metadata in the yaml-file to the ontology by replacing placeholders.
[ ] Write a test.
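A minimal sketch of the first three checklist items, assuming the extraction script collects metadata in a plain dict and that the raw "Zeit" column holds time stamps in seconds. The key names `SpecimenAge` and `ExperimentDuration` and all helper names are hypothetical; the real keys must equal the placeholder names used in the ttl template (e.g. `TransducerColumn` for `$$TransducerColumn_Value$$`).

```python
from typing import Any, Mapping, Optional, Sequence

SPECIMEN_AGE_DAYS = 28  # hardcoded specimen age, as in the task list


def experiment_duration(zeit: Sequence[float]) -> Optional[float]:
    """Span between first and last sample of the "Zeit" column (unit assumed: seconds)."""
    if not zeit:
        return None
    return max(zeit) - min(zeit)


def rename_keys(metadata: Mapping[str, Any], renames: Mapping[str, str]) -> dict[str, Any]:
    """Rename extraction keys so they match the placeholder names in the template."""
    return {renames.get(key, key): value for key, value in metadata.items()}


def add_emodule_metadata(metadata: Mapping[str, Any], zeit: Sequence[float]) -> dict[str, Any]:
    """Return a copy of the metadata with specimen age and experiment duration added."""
    result = dict(metadata)
    result["SpecimenAge"] = SPECIMEN_AGE_DAYS  # hypothetical key name
    result["ExperimentDuration"] = experiment_duration(zeit)  # hypothetical key name
    return result
```

Keeping the renaming in a separate mapping means a changed placeholder name only requires editing that dict, not the extraction logic.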

alFrie commented 1 year ago

@raviapatel Please add the orange box for the ID, so that I can append an ID after the underscore to link the mix and the e-module.
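Once the template has such ID slots, appending the ID could be a simple text substitution on the generated Turtle. The sketch below assumes every individual that needs an ID is a prefixed name ending in an underscore followed by whitespace (like `con:Transducer_`); `append_id` is a hypothetical helper, not part of the existing scripts.

```python
import re

# Prefixed names ending in "_" and followed by whitespace, e.g. "con:Transducer_ a ...".
# Names such as "mid:has_column_index" do not match, because their last "_" is not final.
_INDIVIDUAL_RE = re.compile(r"\b(\w+:\w*_)(?=\s)")


def append_id(ttl: str, individual_id: str) -> str:
    """Append `individual_id` to every individual name that ends in an underscore."""
    # A function replacement avoids backreference surprises when the ID starts with a digit.
    return _INDIVIDUAL_RE.sub(lambda m: m.group(1) + individual_id, ttl)
```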

ThiloMuth commented 1 year ago

Test

ThiloMuth commented 1 year ago

I want to review code - please let me do it =)

joergfunger commented 1 year ago

@ThiloMuth if you accept the invitation to the repo, you can review the merge requests, e.g. this one

alFrie commented 1 year ago

@mattheokru Is this way of spelling ("Modul") within the e-module ontology on purpose, or predefined by the ontology? It looks German to me.

mattheokru commented 1 year ago

No, nothing is predefined by the ontology; I will change it. I updated the ontologies in the pull request "updating Ontologies".

raviapatel commented 1 year ago

> @mattheokru Is this way of spelling ("Modul") within the e-module ontology on purpose, or predefined by the ontology? It looks German to me.

Yes, this is just an individual name, and you are right, this is the German way of writing it. I would propose changing it to YoungsModulusTestSpecimen_ or EModulusTestSpecimen.

alFrie commented 1 year ago

I have a question regarding the emodul_metadata_extraction.py:

It should create an entry for the "processedFile" key in the dictionary, whose value is the path to the CSV file containing the values extracted by emodul_generate_processed_data.py. Currently this value is a null placeholder. Where are these files stored? In the dodo file I find

    processed_data_emodulus_directory = Path(emodul_output_directory, 'processed_data')  # folder with csv data files

so should I do the same?
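A sketch of filling the "processedFile" entry with the same directory layout as the dodo file. It assumes the processed CSV file is named after the specimen; that naming scheme and both function names are assumptions.

```python
from pathlib import Path
from typing import Any


def processed_file_path(emodul_output_directory: Path, specimen_name: str) -> Path:
    """Path of the CSV written by emodul_generate_processed_data.py (file naming assumed)."""
    processed_data_emodulus_directory = Path(emodul_output_directory, "processed_data")
    return processed_data_emodulus_directory / f"{specimen_name}.csv"


def add_processed_file(metadata: dict[str, Any], csv_path: Path) -> dict[str, Any]:
    """Store the path as a string so the metadata stays YAML/JSON serializable."""
    metadata["processedFile"] = csv_path.as_posix()
    return metadata
```

Using `as_posix()` keeps forward slashes in the stored value regardless of the operating system the extraction runs on.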

joergfunger commented 1 year ago

Currently we store the files locally (at the path you mentioned), so for now I would add exactly this path. Ultimately this would have to go to a file server/MongoDB/openBIS with a URI, but please talk to @AidaZt or @ThiloMuth about how we should then reference these files in our KG.

AidaZt commented 1 year ago

Andre suggested that we could reference the link/URL to the file. An example RDF triple would be:

    <http://bam.de/material#experiment01> <http://bam.de/properties#rawdata> <http://bam.de/dataserver/rawdata.csv> .
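Such a statement can be produced with plain string formatting. The sketch below writes one N-Triples line and only checks that each term is an absolute URI (scheme plus host); it does not check that the link actually resolves. `uri_triple` is a hypothetical helper.

```python
from urllib.parse import urlparse


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """Format an N-Triples statement whose three terms are absolute http(s)-style URIs."""
    for term in (subject, predicate, obj):
        parsed = urlparse(term)
        # Requires scheme and host, so relative paths (and e.g. urn: URIs) are rejected here.
        if not (parsed.scheme and parsed.netloc):
            raise ValueError(f"not an absolute URI: {term!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."
```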

joergfunger commented 1 year ago

But that link does not exist. How do we intend to store the data so that this link is actually a real reference?

firmao commented 1 year ago

> But that link does not exist. How do we intend to store the data so that this link is actually a real reference?

Then we need to at least discuss the options: a file server, pointing directly to GitHub raw files, dereferenceable URIs that provide RDF content, etc.

I suggest we have a short meeting to have an agreement about the best way for us to deal with the raw files. What about Monday after 3pm?

Best regards, Andre Valdestilhas

alFrie commented 1 year ago

About the transducer column: according to the drawio diagram we expect an integer:

    "$$TransducerColumn_Value$$"^^xsd:integer

The value is defined in the metadata extraction script of the e-module. We talked about saving a list to that key, [1, 2, 3], which gives this result in the mapped ontology:

    con:Transducer_ a con:MeasuringGauge, owl:NamedIndividual ;
        ns3:hasPmdUnit ns3:Q56402798 ;
        mid:has_column_index "[1, 2, 3]"^^xsd:integer .

So we have a list of integers instead of an integer. Is that still valid?

Edit: @raviapatel Is this how you imagined it to be?

firmao commented 1 year ago

> About the transducer column: [...] We have a list of integers instead of an integer. Is that still valid?

The expected data type is xsd:integer, so it is supposed to be an integer number. If you are still not sure about the data type, store it as an xsd:string.

alFrie commented 1 year ago

> The expected data type is xsd:integer, so it is supposed to be an integer number. If you are still not sure about the data type, store it as an xsd:string.

And a list of integers doesn't fit the integer type, right?

AidaZt commented 1 year ago

I think so, because we only have string or integer as a type, and we can't declare it as xsd:list or something like that? I think for now we should leave it as xsd:string.
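Following that suggestion, the datatype could be chosen from the value: a single column index stays xsd:integer, and a list falls back to xsd:string. This is a sketch; `column_index_literal` is a hypothetical helper.

```python
from typing import Sequence, Union


def column_index_literal(value: Union[int, Sequence[int]]) -> str:
    """Return a typed Turtle literal for the transducer column index."""
    if isinstance(value, bool):  # bool is a subclass of int, but not a valid index here
        raise TypeError("column index must be an int or a sequence of ints")
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'
    # A list of indices is not a valid xsd:integer, so store its text form as a string.
    return f'"{list(value)}"^^xsd:string'
```

The string form keeps the current "[1, 2, 3]" text, so only the datatype in the template changes.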

firmao commented 1 year ago

If you still need to store some kind of list of values in RDF, there is an example here: https://stackoverflow.com/questions/29669555/dynamic-array-in-rdf-xml
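Alternatively, Turtle has a collection syntax, `( ... )`, which expands to an rdf:List (rdf:first/rdf:rest nodes); bare integers inside it are read as xsd:integer. A sketch that renders the column indices this way, assuming the template line can be changed accordingly. The property would then point to a list node rather than to an integer, so the ontology's expected range would have to allow that. `column_index_collection` is hypothetical.

```python
from typing import Sequence


def column_index_collection(predicate: str, values: Sequence[int]) -> str:
    """Render `predicate ( v1 v2 ... )` as a Turtle predicate-object pair."""
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise TypeError("all column indices must be ints")
    # Bare integer tokens in Turtle are parsed as xsd:integer literals.
    items = " ".join(str(v) for v in values)
    return f"{predicate} ( {items} )"
```

In the mapped output this would replace `mid:has_column_index "[1, 2, 3]"^^xsd:integer` with `mid:has_column_index ( 1 2 3 )`.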

alFrie commented 1 year ago

So this is the current result of the mapping script. Please look at the following three issues:

1. EModule_Value is the only placeholder that does not get a value from the metadata.
2. Height and Width are set to None, since the shape is cylindrical. This results in "None"^^xsd:decimal, which is problematic: None is not of type decimal, right? The type should still stay decimal, though, since in the future there will not only be cylindrical specimens, if I understood that correctly.
3. We have more metadata values than placeholders (e.g. weight and some others have no place to be mapped to; I guess this is not a problem for now). You can still look through the list of unmapped metadata and see whether you would like to create individuals for some of them in the ontology.
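All three issues can be detected automatically by comparing the placeholders in the template with the metadata keys. A sketch, assuming placeholders follow the `$$<Key>_Value$$` pattern seen in the drawio template; `mapping_report` is a hypothetical helper, and how None values should finally be handled is still an open design decision.

```python
import re
from typing import Any, Mapping

# Placeholder pattern as used in the template, e.g. "$$TransducerColumn_Value$$".
PLACEHOLDER_RE = re.compile(r"\$\$(\w+)_Value\$\$")


def mapping_report(template: str, metadata: Mapping[str, Any]) -> dict[str, list[str]]:
    """List placeholders without metadata, placeholders mapped to None, and unused keys."""
    placeholders = set(PLACEHOLDER_RE.findall(template))
    keys = set(metadata)
    return {
        "missing": sorted(placeholders - keys),  # placeholder with no metadata key
        "none_values": sorted(k for k in placeholders & keys if metadata[k] is None),
        "unmapped": sorted(keys - placeholders),  # metadata with no placeholder
    }
```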

For your information:

Placeholders are generated by a function, so in case we decide on a different placeholder structure, we only need to change this small function and not the main function itself.
Tests are still failing because they were designed for Ilias outdated script. @soudehMasoudian is on it (#134 )
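A minimal sketch of that split between placeholder generation and replacement, assuming the `$$<Key>_Value$$` placeholder format from the template: `make_placeholder` is the only place that knows the format, and the replacement loop just calls it. Both names are hypothetical, and skipping None values (instead of writing "None") is one possible answer to the None question raised above, not a decision from this thread.

```python
from typing import Any, Mapping


def make_placeholder(key: str) -> str:
    """Build the template placeholder for a metadata key, e.g. "$$EModule_Value$$"."""
    return f"$${key}_Value$$"


def replace_placeholders(template: str, metadata: Mapping[str, Any]) -> str:
    """Replace each known placeholder in the ttl template with its metadata value."""
    result = template
    for key, value in metadata.items():
        if value is None:
            continue  # leave the placeholder visible instead of writing "None"
        result = result.replace(make_placeholder(key), str(value))
    return result
```

Because every placeholder ends in `_Value$$`, a key such as `Height` cannot accidentally replace part of a longer placeholder like `$$HeightTotal_Value$$`.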

@joergfunger @raviapatel, maybe @ThiloMuth wants to have a look at it, too.

raviapatel commented 1 year ago

> I think so, because we only have string or integer as a type, and we can't declare it as xsd:list or something like that? I think for now we should leave it as xsd:string.

OK, this is also fine with me.

alFrie commented 1 year ago

How will the information about the openBIS raw data location get mapped? Will this be defined in the mapping script, or created during the metadata extraction so that the mapping script maps it automatically? @joergfunger

joergfunger commented 1 year ago

That should be done during the extraction of the metadata. In the final setup, all the data will be stored in the openBIS system (metadata), and this information should be extracted from there together with the link to the raw data file. Afterwards, the mapping script will simply take that information from metadata.json and replace the corresponding value in the ttl file obtained from the diagrams.net ttl template.
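A sketch of that hand-over, assuming metadata.json is a flat JSON object. The extraction side adds the raw-data link as one more key, and the mapping side only loads the file. The key name `RawDataLink` and the example URL are placeholders, since the thread has not fixed how openBIS links will look.

```python
import json
from typing import Any, Mapping


def dump_metadata(metadata: Mapping[str, Any], raw_data_uri: str) -> str:
    """Extraction side: serialize metadata plus the raw-data link as JSON text."""
    data = dict(metadata)
    data["RawDataLink"] = raw_data_uri  # hypothetical key; must match a template placeholder
    return json.dumps(data, indent=2)


def load_metadata(text: str) -> dict[str, Any]:
    """Mapping side: read the values that will replace the template placeholders."""
    return json.loads(text)
```

Writing the result to disk is then a single call such as `Path("metadata.json").write_text(dump_metadata(metadata, uri), encoding="utf-8")`.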

BAMresearch / LebeDigital

E-Module Mapping Script #117