emo-bon / sequencing-data

The files controlling and describing the sequencing metadata
Apache License 2.0

triplising the contents of this repo #13

Open kmexter opened 1 month ago

kmexter commented 1 month ago

@marc-portier will fill in the details here, but there is a request to triplise the contents of this repo, especially to get the info in the files in https://github.com/emo-bon/sequencing-data/tree/main/shipment/batch-001 into triples. Of course, @laurianvm and @kmexter worked on incorporating this omics metadata into the emobon data model, so perhaps when that is done, this will also be done? @laurianvm, where is that work?

mpo-vliz commented 1 month ago

not a lot of detail to add.

about the tech: it looks like generated turtle is already foreseen (but currently empty)

about the content and need: afaiu this repo contains in its csv two essential elements (there could be more I am not aware of? -- the more the merrier !)

  1. the link between material-sample-id and ena accession-number --> both need to be turned into URIs and associated with each other in a triple (best predicate and direction still to be decided)
  2. the grouping of samples inside one batch + some meta-data on the batch --> it feels like this grouping could help detect whether some significant (?) recurring fault happened in the processing of all samples in it
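
To make the two points above concrete, here is a minimal sketch of what the triplisation could look like, using only the Python standard library and emitting N-Triples (which is also valid Turtle). Everything named here is an assumption for illustration: the column names `material_sample_id` and `ena_accession_number`, the `example.org` base URIs, the `hasEnaAccession` and `inBatch` predicates, and the choice of the ENA browser URL as the accession URI. The actual predicates and direction are still to be decided, and the demo row is placeholder data, not taken from the repo.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URIs and predicates; the thread leaves these undecided.
SAMPLE_BASE = "https://example.org/emobon/sample/"
BATCH_BASE = "https://example.org/emobon/batch/"
ENA_BASE = "https://www.ebi.ac.uk/ena/browser/view/"  # assumed URI form
HAS_ACCESSION = "https://example.org/emobon/ns#hasEnaAccession"
IN_BATCH = "https://example.org/emobon/ns#inBatch"


def row_to_triples(row, batch_id):
    """Return N-Triples lines for one CSV row (column names are assumed)."""
    sample = SAMPLE_BASE + quote(row["material_sample_id"].strip(), safe="")
    accession = ENA_BASE + quote(row["ena_accession_number"].strip(), safe="")
    batch = BATCH_BASE + quote(batch_id, safe="")
    return [
        # Point 1: link the sample URI to its ENA accession URI.
        f"<{sample}> <{HAS_ACCESSION}> <{accession}> .",
        # Point 2: record which batch the sample belongs to.
        f"<{sample}> <{IN_BATCH}> <{batch}> .",
    ]


def csv_to_ntriples(csv_text, batch_id):
    """Convert CSV text with a header row into an N-Triples document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.extend(row_to_triples(row, batch_id))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder row, not real data from the repo.
    demo = "material_sample_id,ena_accession_number\nSAMPLE_A,ACC0001\n"
    print(csv_to_ntriples(demo, "batch-001"))
```

Batch-level metadata could be attached to the batch URI in the same way once the relevant terms exist in the ontology.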

laurianvm commented 1 month ago

@kmexter to ask whether each sample in https://github.com/emo-bon/sequencing-data/blob/main/shipment/batch-001/run-information-batch-001.csv is one gene type or multiple

@kmexter & @laurianvm to add new properties to the ontology (find terms & relationship to)

kmexter commented 1 week ago

answer: those particular ones are one gene type each, but we need to allow for one sample having multiple gene types.

kmexter commented 1 week ago

Hi, we now have ENA accession numbers for batch 1: see https://github.com/emo-bon/sequencing-data/tree/main/shipment/batch-001 (ena-accession-numbers-batch-001.csv and the column descriptions file, which I will make asap). So we need these data triplised now. Does that mean the ontology may need updating? Should we set up a meeting to sort out the workflow?