triplising the contents of this repo

kmexter commented 1 month ago

@marc-portier will fill in the details here, but there is a request to triplise the contents of this repo, especially to get the info in the files in https://github.com/emo-bon/sequencing-data/tree/main/shipment/batch-001 into triples. Of course, @laurianvm and @kmexter worked on incorporating this omics metadata into the emobon data model, so perhaps when that is done, this will also be done? @laurianvm, where is that work?

mpo-vliz commented 1 month ago

not a lot of detail to add.

about the tech: it looks like generated turtle is already foreseen (but currently empty)

see the *.ttl file at https://github.com/emo-bon/sequencing-data/tree/main/shipment/batch-001
I assume @bulricht has foreseen the code / pipeline already and is just waiting on the templates to actually fill that file
but even if it is not done yet: that should just be applying the subyt pipeline already active for the observatories (just using different inputs and templates)
as with the observatories, we still need to agree on (and then apply) a smart marking of these turtle files in the rocrate-metadata.json (might be dc:conformsTo=some-profile? + schema:encodingFormat=text/turtle). This is to streamline the harvesting of these files in the kgap-process towards analysis recently started by @cedricdcc, i.e. to avoid hacking the filename-patterns in there
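To make the marking idea a bit more concrete, here is a minimal sketch of how a harvester could select turtle files from the crate metadata instead of from filename patterns. Everything specific in it is an assumption: the profile URI, the file names, and the helper `find_turtle_files` are hypothetical, and the keys `encodingFormat` and `conformsTo` are written the way they would typically show up in an RO-Crate JSON-LD `@graph`. It is not the kgap code.

```python
import json

# Hypothetical profile URI; the actual value still has to be agreed on.
PROFILE = "https://example.org/profiles/emobon-generated-turtle"

# Illustrative stand-in for a rocrate-metadata.json; file names are made up.
CRATE = json.loads("""
{
  "@graph": [
    {"@id": "./", "@type": "Dataset"},
    {"@id": "batch.csv", "@type": "File", "encodingFormat": "text/csv"},
    {"@id": "batch.ttl", "@type": "File",
     "encodingFormat": "text/turtle",
     "conformsTo": {"@id": "https://example.org/profiles/emobon-generated-turtle"}},
    {"@id": "other.ttl", "@type": "File", "encodingFormat": "text/turtle"}
  ]
}
""")


def _as_list(value):
    """JSON-LD values may be absent, single, or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _ids(value):
    """Collect identifiers from plain strings or {"@id": ...} references."""
    return {v.get("@id") if isinstance(v, dict) else v for v in _as_list(value)}


def find_turtle_files(crate, profile=PROFILE):
    """Return the @id of entities marked as text/turtle AND conforming to the profile."""
    return [
        entity["@id"]
        for entity in crate.get("@graph", [])
        if "text/turtle" in _as_list(entity.get("encodingFormat"))
        and profile in _ids(entity.get("conformsTo"))
    ]


print(find_turtle_files(CRATE))
```

Requiring both markers means a turtle file that happens to sit in the crate for another reason (like `other.ttl` here) is not harvested by accident.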

about the content and need: afaiu this repo contains in its csv two essential elements (there could be more I am not aware of? -- the more the merrier !)

the link between material-sample-id and ENA accession-number --> both need to be turned into URIs and associated with each other in a triple (best predicate and direction still to be decided)
the grouping of samples inside one batch + some meta-data on the batch --> it feels like this grouping could help detect whether some significant (?) recurring fault happened in the processing of all samples in that batch
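The two elements above could end up as triples roughly like in the sketch below. This is only an illustration of the target shape, not the subyt pipeline or its templates. The column names (`material_sample_id`, `ena_accession`), the `https://example.org/emobon/` URI base, the predicates `ex:hasEnaAccession` and `ex:inBatch`, the sample values, and the choice to build ENA URIs from the ENA browser URL are all assumptions that still need to be decided.

```python
import csv
import io
from urllib.parse import quote

# Placeholder namespace and URI bases; the real ones are still to be agreed on.
EX = "https://example.org/emobon/"
ENA_BASE = "https://www.ebi.ac.uk/ena/browser/view/"

# Made-up rows with hypothetical column names, standing in for the batch csv.
CSV_TEXT = """material_sample_id,ena_accession
SAMPLE_A,ACC0001
SAMPLE_B,ACC0002
"""


def rows_to_turtle(csv_text, batch_id):
    """Emit one sample->accession triple and one sample->batch triple per csv row."""
    batch = f"<{EX}batch/{quote(batch_id, safe='')}>"
    lines = [f"@prefix ex: <{EX}ns#> .", ""]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifiers so they are safe inside a URI.
        sample_id = quote(row["material_sample_id"].strip(), safe="")
        accession_id = quote(row["ena_accession"].strip(), safe="")
        sample = f"<{EX}sample/{sample_id}>"
        accession = f"<{ENA_BASE}{accession_id}>"
        # Direction sample -> accession is an arbitrary choice for this sketch.
        lines.append(f"{sample} ex:hasEnaAccession {accession} ;")
        lines.append(f"    ex:inBatch {batch} .")
    return "\n".join(lines) + "\n"


print(rows_to_turtle(CSV_TEXT, "batch-001"))
```

Having the batch as its own URI leaves room to attach the batch-level meta-data to it later, and lets a query pull all samples of one batch together when checking for a recurring fault.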

laurianvm commented 1 month ago

@kmexter to ask whether, in https://github.com/emo-bon/sequencing-data/blob/main/shipment/batch-001/run-information-batch-001.csv, each sample is one gene type or multiple

@kmexter & @laurianvm to add new properties to the ontology (find terms & relationship to)

kmexter commented 1 week ago

answer: those particular ones are one gene type, but we need to allow for one sample having multiple gene types.

kmexter commented 1 week ago

Hi, we now have ENA accession numbers for batch 1 - see https://github.com/emo-bon/sequencing-data/tree/main/shipment/batch-001 (ena-accession-numbers-batch-001.csv and the column descriptions file, which I will make asap). So we need these data triplised now. Does this mean the ontology may need updating? Should we set up a meeting to sort out the workflow?

emo-bon / sequencing-data

triplising the contents of this repo #13