SWAT4HCLS / Biohackathon-SWAT4HCLS-2024


[Proposal]: Publishing FAIR Datasets from Bioimaging Repositories #7

Open joshmoore opened 6 months ago

joshmoore commented 6 months ago

Short title

Publishing FAIR Datasets from Bioimaging Repositories

Project Description

The Image Data Resource (IDR) is home to 13 million multi-dimensional image datasets. Each of these is annotated with a subset of the following metadata: Gene, Phenotype, Organism/Cell Line, Antibody, siRNA, and Chemical Compound. This metadata is held in the OMERO data management system, which stores it in PostgreSQL tables.

Initial work has exported this information from OMERO as RDF using https://pypi.org/project/omero-rdf (see the related SWAT4HCLS poster).
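To illustrate what such an export produces, the sketch below turns the key-value annotations of a single image into N-Triples lines. It assumes a hypothetical base URI (`https://example.org/idr/`) and a hypothetical predicate vocabulary derived from the annotation keys, and the helper names are likewise made up for illustration. It is not the omero-rdf API, whose actual output and vocabularies may differ.

```python
from urllib.parse import quote

# Hypothetical namespaces for illustration only; omero-rdf may use different URIs.
BASE = "https://example.org/idr/image/"
VOCAB = "https://example.org/idr/vocab/"


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def image_to_ntriples(image_id: int, annotations: dict[str, str]) -> list[str]:
    """Emit one N-Triples line per key-value annotation of an image (hypothetical mapping)."""
    subject = f"<{BASE}{image_id}>"
    lines = []
    for key, value in sorted(annotations.items()):
        # Percent-encode the key so it forms a valid IRI path segment.
        predicate = f"<{VOCAB}{quote(key, safe='')}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    example = {"Gene Symbol": "TP53", "Organism": "Homo sapiens"}
    for line in image_to_ntriples(42, example):
        print(line)
```

Emitting one triple per annotation keeps the mapping simple; a real export would more likely map keys such as Gene or Organism onto established ontology terms rather than a home-grown vocabulary.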

Exporting the largest single study (a study being the collection of image datasets associated with a single publication), however, generates 100M triples. This study, which contains tissue images from the Human Protein Atlas, has been exported directly with SQL and parallelized scripts.
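One way to parallelize an export of this size is to split the image ID range into fixed-size chunks and let independent worker processes serialize each chunk to its own N-Triples file, which can then be loaded or concatenated. The sketch below assumes a hypothetical `fetch_rows` function standing in for the SQL query against OMERO's PostgreSQL tables (here it just yields synthetic rows), a hypothetical chunk size, and placeholder URIs. It is not the script actually used for the Human Protein Atlas study.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 1000  # hypothetical; tune to memory and database load


def chunk_ranges(first_id: int, last_id: int, size: int) -> list[tuple[int, int]]:
    """Split the inclusive ID range [first_id, last_id] into half-open [start, stop) chunks."""
    return [(start, min(start + size, last_id + 1))
            for start in range(first_id, last_id + 1, size)]


def fetch_rows(start: int, stop: int):
    """Hypothetical stand-in for a SQL query restricted to start <= image_id < stop.
    OMERO's real schema differs; this yields synthetic (image_id, key, value) rows."""
    for image_id in range(start, stop):
        yield image_id, "Organism", "Homo sapiens"


def export_chunk(args: tuple[int, int, str]) -> tuple[str, int]:
    """Serialize one ID chunk to its own N-Triples file; return the path and triple count."""
    start, stop, out_dir = args
    path = os.path.join(out_dir, f"chunk_{start:012d}.nt")
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for image_id, key, value in fetch_rows(start, stop):
            # Real values would need literal escaping; the synthetic ones here do not.
            out.write(f'<https://example.org/idr/image/{image_id}> '
                      f'<https://example.org/idr/vocab/{key}> "{value}" .\n')
            count += 1
    return path, count


def export_parallel(first_id: int, last_id: int, out_dir: str, workers: int = 4) -> int:
    """Export all chunks with a process pool and return the total number of triples written."""
    jobs = [(s, e, out_dir) for s, e in chunk_ranges(first_id, last_id, CHUNK_SIZE)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(count for _, count in pool.map(export_chunk, jobs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        total = export_parallel(1, 2500, tmp)
        print(total, sorted(os.listdir(tmp)))
```

Because each chunk writes to a separate file, workers share no state, and a failed chunk can be re-run on its own without redoing the whole export.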

Continuing the work from last year's hackathon, this year we would like to:

Expertise Needed

Familiarity with RDF (incl. but not limited to SPARQL and ingestion/query optimization) is required.

Familiarity with FAIR Data Points or, more generally, DCAT would be beneficial.
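For orientation, a FAIR Data Point exposes catalog-level metadata built on DCAT. The sketch below describes one study as a `dcat:Dataset` with a single `dcat:Distribution` pointing at an RDF dump. The study, distribution, and download URIs are hypothetical placeholders, and a real FAIR Data Point would require further properties defined by its metadata schema.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
NTRIPLES_MEDIA_TYPE = "https://www.iana.org/assignments/media-types/application/n-triples"


def study_to_dcat_turtle(study_uri: str, title: str, dump_url: str) -> str:
    """Return a Turtle snippet describing one study as a dcat:Dataset
    with a single N-Triples distribution."""
    dist_uri = study_uri + "/distribution/nt"  # hypothetical URI pattern
    # Escape characters that would break a Turtle string literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return "\n".join([
        f"@prefix dcat: <{DCAT}> .",
        f"@prefix dct: <{DCT}> .",
        "",
        f"<{study_uri}> a dcat:Dataset ;",
        f'    dct:title "{safe_title}" ;',
        f"    dcat:distribution <{dist_uri}> .",
        "",
        f"<{dist_uri}> a dcat:Distribution ;",
        f"    dcat:downloadURL <{dump_url}> ;",
        f"    dcat:mediaType <{NTRIPLES_MEDIA_TYPE}> .",
    ])


if __name__ == "__main__":
    print(study_to_dcat_turtle(
        "https://example.org/idr/study/1",
        "Example study",
        "https://example.org/dumps/study1.nt",
    ))
```

Keeping the large triple dump behind a `dcat:Distribution` means the catalog entry itself stays small, while clients can still discover where to download the full data.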


This work is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – 501864659 as part of NFDI4BIOIMAGE, as well as by EU Horizon grant 101130216 as part of FoundingGIDE ("Founding a Global Image Data Ecosystem").

ArghaSarker commented 1 week ago
