Open Jeena-Rajan opened 2 weeks ago
There have been a couple of calls on how to represent Biolog data: https://docs.google.com/document/d/1SI53ouUFWlsQpxqWP3To6JAYtWbTROyqgmsmdXly31o/edit
Possible solutions are: 1) structured data associated with BioSamples 2) BioStudies
For 1), there is currently no way to search structured data in BioSamples (we could implement a search like Holofoods).
For 2), BioStudies cannot associate files with samples (I don't think so, at least).
We have investigated this with one table in the past: https://wwwdev.ebi.ac.uk/biosamples/samples/SAMEA131380422
Aside from the table being gigantic (this would be mitigated by the fact that the tables would be subsets, but still...), its rows are also sorted alphabetically by the first column's values. Not really usable as is.
Tried to create a study https://wwwdev.ebi.ac.uk/biostudies/studies/S-BSST2455 and added a table with the data, with an extra field (Sample) that links to the sample in BioSamples.
This is also not an ideal solution: we are mostly treating the samples as the source of truth for all the data, and this does not create an xref link in BioSamples (you can go from the BioStudy to the sample, but not the other way around). I would prioritise the BSD --> BioStudy link rather than the other way around.
For this, we need to follow the next steps:
plate
timepoint
data_preprocessing_status
file: url to the file in BioStudies
Done - This actually looks like the best solution so far. Having a link from the sample to the file in BioStudies, even though there is no direct link in BioStudies itself, is the best solution we can offer. A future service can pick up the file from the sample (keep in mind, the table is called "Biolog", so it can be parsed by another service feeding from this data).
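To make the table idea concrete, here is a minimal sketch of how a downstream service might build a "Biolog" table and collect its file URLs. Only the four field names (plate, timepoint, data_preprocessing_status, file) and the table name "Biolog" come from this thread. The JSON layout (`structuredData`, `type`, `content`, `value`), the helper names, and the example values are assumptions for illustration, not the confirmed BioSamples schema.

```python
from __future__ import annotations

from typing import Any, Iterable

# Table name taken from the thread; everything else below is a hypothetical layout.
BIOLOG_TABLE_TYPE = "Biolog"


def make_biolog_row(plate: str, timepoint: str, status: str, file_url: str) -> dict[str, dict[str, str]]:
    """Build one table row holding the four fields discussed in the thread."""
    return {
        "plate": {"value": plate},
        "timepoint": {"value": timepoint},
        "data_preprocessing_status": {"value": status},
        "file": {"value": file_url},  # URL of the file held in BioStudies
    }


def make_biolog_table(rows: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Wrap rows in a table tagged with the 'Biolog' type (assumed key names)."""
    return {"type": BIOLOG_TABLE_TYPE, "content": list(rows)}


def biolog_file_urls(sample: dict[str, Any]) -> list[str]:
    """Return the file URLs found in every 'Biolog' table attached to a sample."""
    urls: list[str] = []
    for table in sample.get("structuredData", []):
        if table.get("type") != BIOLOG_TABLE_TYPE:
            continue  # skip tables that are not Biolog data
        for row in table.get("content", []):
            url = row.get("file", {}).get("value")
            if url:
                urls.append(url)
    return urls


if __name__ == "__main__":
    # Placeholder values only; not real plates, files, or accessions.
    row = make_biolog_row("PM1", "24h", "raw", "https://example.org/biolog/pm1_24h.csv")
    sample = {"accession": "SAMEA0000000", "structuredData": [make_biolog_table([row])]}
    print(biolog_file_urls(sample))
```

Filtering on the table name keeps the consumer independent of any other structured data on the sample, which is why the "Biolog" name matters for a future service.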
I can think of a couple of cons, though:
Study the different possible approaches to store Biolog data: