ebi-ait / microbe-central

Central repository to store MICROBE-related issues

Determine how to represent Biolog data #8

Open · Jeena-Rajan opened this issue 2 weeks ago

Jeena-Rajan commented 2 weeks ago

Study the different possible approaches to store Biolog data:

Jeena-Rajan commented 2 weeks ago

There have been a couple of calls on how to represent Biolog data: https://docs.google.com/document/d/1SI53ouUFWlsQpxqWP3To6JAYtWbTROyqgmsmdXly31o/edit

Possible solutions are:
1) structured data associated with BioSamples
2) BioStudies

For 1), there is currently no way to search structured data in BioSamples (we could implement search similar to Holofoods).

For 2), I don't think BioStudies can associate files with samples.

ESapenaVentura commented 2 days ago

Use BioSamples and have multiple tables

We investigated using a single table in the past: https://wwwdev.ebi.ac.uk/biosamples/samples/SAMEA131380422

Aside from the table being gigantic (this would be mitigated by splitting the data into several tables, each holding only a subset, but still...), the rows are also sorted alphabetically by the first column's values, so it is not really usable as is.

ESapenaVentura commented 2 days ago

Use BioStudies to store data and create external xref to BioSamples

I created a test study (https://wwwdev.ebi.ac.uk/biostudies/studies/S-BSST2455) and added a table with the data, plus an extra field (Sample) that links to the corresponding sample in BioSamples.

This is also not an ideal solution. We mostly treat the samples as the source of truth for all the data, and this approach does not create an xref link in BioSamples (you can go from the BioStudy to the sample, but not the other way around). I would prioritise the BioSamples (BSD) --> BioStudy link over the reverse direction.
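To make the missing direction concrete, here is a minimal sketch of the sample --> study lookup that a downstream service would have to build itself under this approach. It assumes the BioStudies table rows are available as dictionaries keyed by column name, with the `Sample` column described above holding a BioSamples accession; the function name and the accessions in the usage example are hypothetical, and this is not an existing BioSamples or BioStudies feature.

```python
from collections import defaultdict


def build_sample_to_study_index(study_accession, rows):
    """Invert BioStudy -> sample links into a sample -> [study] mapping.

    Assumes each row is a dict and that the BioSamples accession sits in
    a "Sample" column (hypothetical export shape, not a real API).
    """
    index = defaultdict(list)
    for row in rows:
        sample = row.get("Sample")
        # Skip rows without a sample link; avoid duplicate study entries
        # when several rows point at the same sample.
        if sample and study_accession not in index[sample]:
            index[sample].append(study_accession)
    return dict(index)


# Usage with hypothetical accessions:
rows = [{"Sample": "SAMEA0000001"}, {"Sample": "SAMEA0000001"}, {"Sample": "SAMEA0000002"}]
print(build_sample_to_study_index("S-BSST0000", rows))
```

Such an index would live outside BioSamples and have to be kept in sync with the study, which is why a link stored on the sample itself is preferable.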

ESapenaVentura commented 2 days ago

Create structured table with some Biolog metadata that links to BioStudies

For this, we need to follow these steps:

Done. This actually looks like the best solution so far. Even though BioStudies has no direct link back to the sample, having a link from the sample to the file in BioStudies is the best we can offer. A future service can pick up the file from the sample (keep in mind that the table is called "Biolog", so it can be parsed by another service that consumes this data).
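As an illustration of how such a service could pick up the file, here is a minimal sketch that finds the table named "Biolog" in a sample's structured data and collects its BioStudies links. It assumes a simplified JSON shape (a `data` list of tables, each with a `type` and a `content` list of rows whose cells carry a `value`) and a hypothetical `BioStudies link` column; the actual BioSamples structured-data format and the column names used in the Biolog table should be checked before relying on this.

```python
def biolog_file_links(sample_doc):
    """Return the BioStudies links listed in a sample's "Biolog" table(s).

    Assumed (hypothetical) shape:
    {"data": [{"type": "Biolog",
               "content": [{"BioStudies link": {"value": "..."}}]}]}
    """
    links = []
    for table in sample_doc.get("data", []):
        # Only the table named "Biolog" is relevant; skip any others.
        if table.get("type") != "Biolog":
            continue
        for row in table.get("content", []):
            cell = row.get("BioStudies link")  # hypothetical column name
            if cell and cell.get("value"):
                links.append(cell["value"])
    return links


# Usage with a hypothetical sample document:
doc = {
    "accession": "SAMEA0000001",
    "data": [{"type": "Biolog",
              "content": [{"BioStudies link": {"value": "S-BSST0000/biolog_plate1.csv"}}]}],
}
print(biolog_file_links(doc))
```

Keying on the table name keeps the consumer independent of any other structured data attached to the same sample.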

I can think of a couple of cons, though: