Determine how to represent Biolog data

Jeena-Rajan commented 2 weeks ago

Study the different possible approaches to store Biolog data:

[x] Use BioSamples and have multiple tables (too much scrolling?)
[x] Use BioStudies to store data and create external xref to BioSamples
[x] Create a structured table with some Biolog metadata that links to BioStudies (Enrique to test these 3 scenarios)
[ ] Store Biolog tables elsewhere (S3) and link to BioSamples (messy)

Jeena-Rajan commented 2 weeks ago

There have been a couple of calls on how to represent Biolog data: https://docs.google.com/document/d/1SI53ouUFWlsQpxqWP3To6JAYtWbTROyqgmsmdXly31o/edit

Possible solutions are: 1) structured data associated with BioSamples, or 2) BioStudies.

For 1), there is currently no way to search structured data in BioSamples (we could implement search as in Holofoods).

For 2), as far as I know, BioStudies cannot associate files with samples.

ESapenaVentura commented 2 days ago

Use BioSamples and have multiple tables

We have investigated with 1 table in the past: https://wwwdev.ebi.ac.uk/biosamples/samples/SAMEA131380422

Aside from the table being gigantic (this would be mitigated because the tables would be subsets, but still...), the rows are also sorted alphabetically by the values in the first column. Not really usable as is.
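To illustrate why the alphabetical ordering hurts, here is a small sketch. It assumes (hypothetically; the layout of the test table is not described here) that the first column holds well IDs of a 96-well plate (A1 to H12). A plain string sort scatters the wells, and a "natural" key on row letter plus numeric column would be needed to restore plate order.

```python
import re


def plate_wells(rows="ABCDEFGH", cols=12):
    """Well IDs of a 96-well plate in reading order: A1, A2, ..., H12."""
    return [f"{r}{c}" for r in rows for c in range(1, cols + 1)]


def natural_key(well):
    """Sort key comparing the row letter, then the column as an integer."""
    m = re.fullmatch(r"([A-Z]+)(\d+)", well)
    return (m.group(1), int(m.group(2)))


wells = plate_wells()
alphabetical = sorted(wells)  # what a plain string sort produces
restored = sorted(alphabetical, key=natural_key)

print(alphabetical[:5])   # ['A1', 'A10', 'A11', 'A12', 'A2']
print(restored == wells)  # True
```

Since BioSamples applies its own ordering on display, this only shows the effect; it is not something a submitter could change on the BioSamples side.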

ESapenaVentura commented 2 days ago

Use BioStudies to store data and create external xref to BioSamples

I created a study (https://wwwdev.ebi.ac.uk/biostudies/studies/S-BSST2455) and added a table with the data, plus an extra field (Sample) that links to the sample in BioSamples.

This is also not an ideal solution. We mostly treat the samples as the source of truth for all the data, and this approach does not create an xref link in BioSamples (you can go from the BioStudy to the sample, but not the other way around). I would prioritise the BSD --> BioStudy link over the reverse direction.
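A minimal sketch of what the missing reverse link implies: if the only link lives in the BioStudies table's Sample field, finding the studies for a given sample means scanning every study's table and building an index outside both services. The study and sample accessions and the `plate` values below are hypothetical placeholders, not real records.

```python
from collections import defaultdict

# Hypothetical BioStudies data tables: each row carries a "Sample" field
# holding a BioSamples accession (the only, forward, study -> sample link).
study_tables = {
    "S-STUDY-A": [
        {"Sample": "SAMPLE-1", "plate": "PM1"},
        {"Sample": "SAMPLE-2", "plate": "PM1"},
    ],
    "S-STUDY-B": [
        {"Sample": "SAMPLE-1", "plate": "PM2"},
    ],
}


def reverse_links(tables):
    """Derive sample -> studies by scanning every study's table.

    Nothing in BioSamples stores this direction, so some external
    process would have to rebuild it like this.
    """
    index = defaultdict(set)
    for study, rows in tables.items():
        for row in rows:
            index[row["Sample"]].add(study)
    return {sample: sorted(studies) for sample, studies in index.items()}


print(reverse_links(study_tables))
# {'SAMPLE-1': ['S-STUDY-A', 'S-STUDY-B'], 'SAMPLE-2': ['S-STUDY-A']}
```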

ESapenaVentura commented 2 days ago

Create structured table with some Biolog metadata that links to BioStudies

For this, we need to follow these steps:

[x] Create the study in BioStudies (https://wwwdev.ebi.ac.uk/biostudies/studies/S-BSST2455)
[x] Create a sample to test the structured data (https://wwwdev.ebi.ac.uk/biosamples/samples/SAMEA131428577)
[x] Create and submit the structured data with the following attributes:
- plate
- timepoint
- data_preprocessing_status
- file: url to the file in BioStudies
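The record submitted in the last step can be sketched as follows. The four attribute names come from the list above and the table name "Biolog" from the discussion below; the `{"type": ..., "content": [...]}` wrapper and the `{"value": ...}` cell shape are assumptions modelled loosely on BioSamples structured data and should be checked against the BioSamples documentation before submitting anything. All field values are placeholders.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BiologRecord:
    """One row of the "Biolog" structured data table."""
    plate: str
    timepoint: str
    data_preprocessing_status: str
    file: str  # URL of the data file hosted in BioStudies


def build_biolog_table(records):
    """Wrap rows in a table named "Biolog".

    ASSUMPTION: each cell is {"value": ...}; verify this shape against
    the BioSamples structured data documentation.
    """
    return {
        "type": "Biolog",
        "content": [
            {key: {"value": value} for key, value in asdict(r).items()}
            for r in records
        ],
    }


# Placeholder values, not real Biolog data.
table = build_biolog_table([
    BiologRecord("PM1", "24h", "raw", "https://example.org/biolog/PM1_24h.csv"),
])
print(json.dumps(table, indent=2))
```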

Done. This actually looks like the best solution so far. Linking from the sample to the file in BioStudies, even though BioStudies has no direct link back, is the best solution we can offer. A future service can pick up the file from the sample (keep in mind that the table is called "Biolog", so another service feeding from this data can parse it).
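A future consumer could select tables by their "Biolog" name and collect the file URLs, roughly as sketched below. The input shape (a list of `{"type": ..., "content": [...]}` tables with `{"value": ...}` cells) is an assumption; the real BioSamples response format may differ, and the example data is hypothetical.

```python
def biolog_file_urls(structured_data):
    """Return the file URLs from every table named "Biolog".

    ASSUMPTION: structured_data is a list of tables shaped like
    {"type": ..., "content": [{column: {"value": ...}}, ...]}.
    """
    urls = []
    for table in structured_data:
        if table.get("type") != "Biolog":
            continue  # ignore unrelated structured data tables
        for row in table.get("content", []):
            cell = row.get("file")
            if cell and cell.get("value"):
                urls.append(cell["value"])
    return urls


# Hypothetical structured data attached to one sample.
example = [
    {"type": "OtherTable", "content": [{"file": {"value": "https://example.org/other.csv"}}]},
    {"type": "Biolog", "content": [
        {"plate": {"value": "PM1"}, "file": {"value": "https://example.org/PM1.csv"}},
        {"plate": {"value": "PM2"}},  # row without a file is skipped
    ]},
]
print(biolog_file_urls(example))  # ['https://example.org/PM1.csv']
```

Keying on the table name rather than on column positions also sidesteps the alphabetical column ordering mentioned below.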

I can think of a couple of cons, though:

- Structured data tables are sorted alphabetically, even the columns, so the results are not great visually.
- Adding new files becomes a very manual process, which doesn't seem sustainable in the long term:
  - We need to add an "external" process (remind people to also upload their files to BSD and add the structured data).
  - We need to re-submit the Study every single time...

ebi-ait / microbe-central