vioannid opened 1 month ago
I only had time for a very quick review. The recipe is quite detailed and hard to read in its current format because of the way it is rendered in the GitHub issue. It would be good if we could generate a branch deployment so we could read it "in situ", as it were. I would recommend creating the recipe in a branch, going through all the necessary admin steps to get it picked up in the correct section, and then creating a PR deployment.
(fcb-bioimage)=
bioimage-deposition
Overview
In addition to making sure that data and metadata are well-structured, the FAIR data principles also involve sharing research outputs in ways that go beyond paper publications. However, even the most thorough deposition ecosystem is of little use if no data are deposited. Therefore, this datatype-specific recipe for bioimaging data provides:
If you generate bioimage data and want to make your data FAIR by openly depositing it in the BioImage Archive, this recipe is for you.
Ingredients
Choosing repositories for bioimaging data
Over the last decade, the field of bioimaging has sought to develop a robust and effective bioimaging data ecosystem by creating bioimaging repositories and working towards their widespread adoption. There are a number of resources available to learn more about the existing repositories in each domain and their respective scopes. Here is a short overview of some repositories for bioimaging data:
A central, primary archive for bioimaging data is the BioImage Archive (Hartley et al. 2022), which hosts data from all imaging modalities that are associated with a peer-reviewed publication and for which no more specialised resource exists.
In contrast, the Image Data Resource (IDR) (Williams et al., 2017) is an added-value database with highly curated metadata for several microscopy imaging modalities, including high-content screening data. Its objective is to link imaging data with other databases, such as those for genetic and chemical information, as well as with cell and tissue phenotypes.
The Electron Microscopy Public Image ARchive (EMPIAR) (Iudin et al. 2023) publicly archives 2D electron microscopy raw data underlying 3D cryo-EM protein structures and data from 3D volume EM experiments.
The Systems Science of Biological Dynamics Repository and Database (SSBD) (Tohsato et al., 2016) is a repository and database pair, comprising a primary archive and an added-value database for quantitative data on the spatiotemporal dynamics of biological objects, primarily obtained from microscopy.
According to the FAIR principles, data should be shared as openly as possible, but as closed as necessary. This principle is especially important for sensitive and biomedical data that cannot be fully openly shared and deposited.
For more detailed information on repositories and FAIR practices for bioimaging data, we recommend consulting the RDMkit pages on bioimage data or the repository overview from Euro-BioImaging.
The example dataset
This recipe walks through the data preparation and deposition process using the following real-life example:
Beucher, Guillaume et al. “Bronchial epithelia from adults and children: SARS-CoV-2 spread via syncytia formation and type III interferon infectivity restriction.” Proceedings of the National Academy of Sciences of the United States of America vol. 119,28 (2022): e2202370119. doi:10.1073/pnas.2202370119
To study the spread of SARS-CoV-2 infection, the authors performed light and electron microscopy on bronchial epithelia that were reconstructed from adult and child donors and infected with SARS-CoV-2. The example dataset comprises light microscopy and some transmission electron microscopy images. It is associated with a peer-reviewed publication and contains no person-identifiable images. The BioImage Archive is therefore the most suitable repository for this dataset.
Step-by-step recipe for bioimage data deposition to BioImage Archive
Step 1: Familiarise yourself with the BioImage Archive
The first step in any deposition is to get to know the repository. This means assessing whether its scope really fits your data and getting an overview of the requirements and the deposition process. For the BioImage Archive, this information is also summarised in its Quick-Tour.
It is also important to understand the general structure of the repository's entries, so that you know what is possible for specific cases. Browse the archive to see some examples of current entries.
The completed entry from the example dataset, generated through the steps outlined in this recipe, is provided here for reference:
Define the Study Components
Because the BioImage Archive uses the REMBI metadata scheme, data are organised according to REMBI and structured in so-called Study Components. A submission may contain one or more Study Components. A single Study Component can combine components from several REMBI categories and can repeat a category more than once. This allows the experimental layout to be structured freely, and there are several ways of doing so, with examples available for different types of studies.
In the simplest case, a study contains only one Study Component because only one experiment was performed. Each REMBI component then appears only once, because a single specimen was taken from one biosample and imaged with a single imaging protocol.
The structure of the example study is more complex, as it varies along two basic variables: the imaging technology and the experimental sample. This is the overview scheme of the example dataset and a preview of its organisation in the finished entry.
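To make this hierarchy concrete, the following Python sketch models a submission as a set of Study Components that hold REMBI components. The classes `Submission`, `StudyComponent` and `RembiComponent` are hypothetical illustrations, not an official BioImage Archive schema or API. The second Study Component only shows how one REMBI category (image acquisition) can appear twice, as it would for confocal and epifluorescence microscopy. It is not the actual layout of the example entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical, simplified model of a submission's layout.
# This is an illustration only, not an official BioImage Archive schema.
@dataclass
class RembiComponent:
    category: str  # REMBI category, e.g. "Biosample", "Specimen", "Image acquisition"
    description: str


@dataclass
class StudyComponent:
    name: str
    components: list[RembiComponent] = field(default_factory=list)

    def count(self, category: str) -> int:
        """Count how many components of one REMBI category this Study Component holds."""
        return sum(1 for c in self.components if c.category == category)


@dataclass
class Submission:
    title: str
    study_components: list[StudyComponent] = field(default_factory=list)


# Simplest case: one Study Component, and each REMBI category appears once.
simple = Submission("Single experiment", [
    StudyComponent("Experiment 1", [
        RembiComponent("Biosample", "one biosample"),
        RembiComponent("Specimen", "one specimen"),
        RembiComponent("Image acquisition", "one imaging protocol"),
    ])
])

# Hypothetical Study Component that repeats the image acquisition category.
light_microscopy = StudyComponent("Light microscopy", [
    RembiComponent("Biosample", "reconstructed bronchial epithelia"),
    RembiComponent("Specimen", "specimen preparation protocol"),
    RembiComponent("Image acquisition", "confocal microscopy"),
    RembiComponent("Image acquisition", "epifluorescence microscopy"),
])

print(len(simple.study_components))                 # 1
print(light_microscopy.count("Image acquisition"))  # 2
```

Repeating a category inside one Study Component mirrors what the portal calls association rows.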
This organisation step is crucial because it determines which metadata needs to be entered where, and what the File-list (Step 6) will look like. Ultimately, what matters is that all relevant metadata is included, not where exactly it is represented. Therefore, you can already plan at this stage what you would like to include in the File-list(s). In a nutshell, a File-list contains all the information about what differs between the files in each Study Component. In our dataset, for example, this includes whether the images were taken with confocal or epifluorescence microscopy and which donor the epithelial samples came from. More details on File-lists can be found in Step 6.
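One way to decide what belongs in the File-list is to collect the metadata for every file in a Study Component and check which fields actually vary. The Python sketch below assumes that this per-file metadata is already available as a dictionary. The file paths and values are made up and only loosely modelled on the example dataset. Fields with a single value across all files are candidates for the Study Component description. Fields that vary are candidates for extra File-list columns.

```python
from __future__ import annotations


def split_metadata(files: dict[str, dict[str, str]]) -> tuple[dict[str, str], list[str]]:
    """Split per-file metadata into values shared by every file and keys that vary.

    Shared values could go into the Study Component description; varying keys
    are candidates for extra File-list columns.
    """
    keys = sorted({key for meta in files.values() for key in meta})
    shared: dict[str, str] = {}
    varying: list[str] = []
    for key in keys:
        # None marks a file that lacks this key, so the key cannot be shared.
        values = {meta.get(key) for meta in files.values()}
        if len(values) == 1 and None not in values:
            shared[key] = values.pop()
        else:
            varying.append(key)
    return shared, varying


# Hypothetical file paths and values, for illustration only.
files = {
    "confocal/donor_A_01.tif": {"Imaging method": "confocal", "Donor": "A", "Organism": "Homo sapiens"},
    "epifluorescence/donor_B_01.tif": {"Imaging method": "epifluorescence", "Donor": "B", "Organism": "Homo sapiens"},
}
shared, varying = split_metadata(files)
print(shared)   # {'Organism': 'Homo sapiens'}
print(varying)  # ['Donor', 'Imaging method']
```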
There are now two ways to proceed. You can follow this recipe and start the deposition by creating a new submission and entering the metadata first, then uploading the files and creating the File-lists. Alternatively, you can upload the organised data first (Step 5), then plan and create the File-lists (Step 6), and only afterwards create the submission and enter the remaining metadata (Step 4).
Step 4: Upload the data
Several data upload methods are available in the BioImage Archive and different methods are recommended for different data size ranges:
less than 20 GB per individual file
Once you are in the BioStudies user interface, a 'secret directory' is created for you as a place to upload your data before submission.
To upload data using the submission portal, simply click on 'File Upload'. This will allow you to upload the folder(s) you organised in Step 3.
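Before uploading, it can help to check which files fall outside the size range of the upload method you plan to use. The Python sketch below lists files in a local folder that are at or above 20 GB. It assumes that "20 GB" means 20 * 10**9 bytes, which should be checked against the BioImage Archive documentation. The function name `files_too_large_for_ui` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: "20 GB" is read as 20 * 10**9 bytes. Check the BioImage Archive
# documentation for the exact limit and for the methods recommended above it.
UI_LIMIT_BYTES = 20 * 10**9


def files_too_large_for_ui(root: str | Path, limit: int = UI_LIMIT_BYTES) -> list[Path]:
    """Return the paths (relative to root) of files whose size is at or above limit."""
    root = Path(root)
    return sorted(
        path.relative_to(root)
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_size >= limit
    )
```

Calling `files_too_large_for_ui("path/to/organised_data")` returns sorted relative paths, so the result can be compared directly with the folder structure from Step 3.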
Input the Study-level metadata
The BioImage Archive employs REMBI, so the submission interface is structured accordingly. The first component of REMBI is the 'Study', which is also the first section of every BioImage Archive entry. It includes details about the current submission, the authors and the corresponding paper. Some fields are free-text boxes, while others are dropdown menus. If a particular item is not available in a dropdown menu, you can enter free text instead.
If you have duplicate REMBI components in one Study Component, such as confocal and epifluorescence microscopy in the example dataset, you can add association rows to match this structure.
Once you have downloaded the empty File-list using the submission tool, edit it locally to add columns describing file-level metadata. To do so, decide which file-level metadata is essential for others to understand how the files in a Study Component differ from each other, and create one extra column in the File-list for each metadata item. You can either add the columns directly to the generated File-list or copy and paste the "Files" column into another template.
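If the empty File-list is saved as tab-separated text, the extra columns can also be added with a short script instead of a spreadsheet. The following is a minimal sketch using Python's standard `csv` module. It assumes a TSV File-list whose first column is "Files". The function name and the column names in the usage comment are hypothetical.

```python
from __future__ import annotations

import csv
from pathlib import Path


def add_metadata_columns(in_path: Path, out_path: Path, new_columns: list[str]) -> list[str]:
    """Copy a tab-separated File-list, appending empty metadata columns.

    Assumes the File-list was saved as TSV whose first column is "Files".
    Returns the resulting header.
    """
    with open(in_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = list(reader.fieldnames or [])
        rows = list(reader)
    if not header or header[0] != "Files":
        raise ValueError("expected the first column to be 'Files'")
    for column in new_columns:
        if column not in header:  # leave columns that already exist untouched
            header.append(column)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        # restval="" writes an empty cell for every newly added column.
        writer = csv.DictWriter(
            handle, fieldnames=header, delimiter="\t", restval="", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return header


# Hypothetical usage:
# add_metadata_columns(Path("file_list.tsv"), Path("file_list_edited.tsv"), ["Imaging method", "Donor"])
```

The new cells are left empty on purpose so that the values can be filled in afterwards, by hand or by script.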
It is helpful to look at some example File-lists for different types of studies to get suggestions for metadata in different studies and further guidance on File-lists.
The example dataset contains several types of metadata, which are described in more detail in the File-list. This is only an example: the amount of additional information, and therefore the column names, will vary greatly between studies. Had we chosen a different organisation for the example dataset, some of this information might already have been described in the general metadata section.
Once you have decided on the additional columns for the File-list, fill in the corresponding value for each file in the submission. If a file has no value for a given column, that cell can be left empty.
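Before uploading the completed File-list(s), a quick sanity check can compare the paths they list with the files in the upload directory. This Python sketch assumes that both are given as paths relative to the upload directory root, and the example paths are hypothetical. Flagging uploaded files that no File-list mentions is a convenience check, not a documented BioImage Archive rule.

```python
from __future__ import annotations

from collections import Counter


def coverage_report(listed: list[str], uploaded: list[str]) -> dict[str, list[str]]:
    """Compare paths in the File-list(s) with paths in the upload directory.

    Both inputs are assumed to be paths relative to the upload directory root.
    """
    counts = Counter(listed)
    listed_set, uploaded_set = set(listed), set(uploaded)
    return {
        "duplicated_in_file_list": sorted(p for p, n in counts.items() if n > 1),
        "listed_but_not_uploaded": sorted(listed_set - uploaded_set),
        "uploaded_but_not_listed": sorted(uploaded_set - listed_set),
    }


report = coverage_report(
    listed=["confocal/a.tif", "confocal/a.tif", "epi/c.tif"],
    uploaded=["confocal/a.tif", "confocal/b.tif"],
)
print(report)
```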
Once you have completed the File-list(s), upload them again in the submission portal. A File-list can be uploaded either into the corresponding Study Component folder, if one exists, or without a folder. If you have multiple File-lists, make sure to name them clearly and distinctively.
Associating the File-lists
After having uploaded one File-list per Study Component, we go back to the prepared submission. To do so, we click "Submission" in the top bar again and then select the current submission from the "Draft" category.
In Step 5 we already created and described the Study Components of our submission. We now go back to each Study Component and associate the corresponding File-list with it, making sure that there is exactly one File-list per Study Component.
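When a submission has several Study Components, it can help to write down the intended association and check it before clicking through the portal. The mapping, Study Component names and File-list names below are hypothetical. The check enforces the one-File-list-per-Study-Component rule stated above. As an additional assumption, it also flags any File-list that is associated with more than one Study Component.

```python
from __future__ import annotations

from collections import Counter


def check_associations(assoc: dict[str, list[str]]) -> list[str]:
    """Return a list of problems with a Study Component -> File-list mapping."""
    problems = []
    for component, file_lists in assoc.items():
        if len(file_lists) != 1:
            problems.append(f"{component}: expected 1 File-list, found {len(file_lists)}")
    # Assumption: a File-list should not be shared between Study Components.
    usage = Counter(fl for file_lists in assoc.values() for fl in file_lists)
    for file_list, n in sorted(usage.items()):
        if n > 1:
            problems.append(f"{file_list}: associated with {n} Study Components")
    return problems


# Hypothetical mapping for illustration.
associations = {
    "Light microscopy": ["light_microscopy_files.tsv"],
    "Electron microscopy": ["electron_microscopy_files.tsv"],
}
print(check_associations(associations))  # []
```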
Optional: Annotations
When image files are accompanied by annotations, these may also be submitted to the BioImage Archive. No annotations are available in the present example; however, a brief overview of the process is given below.
The metadata for annotations are provided in accordance with the MIFA standard.
The File-list for annotations should always contain a column that links each annotation to the image it belongs to (i.e. a column named "related_image") and gives the path to that image. Beyond that, select the most appropriate metadata for your type of annotations, as you would for images.
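A simple check can catch annotations whose "related_image" path does not point to an uploaded image. This Python sketch assumes that the annotation File-list has been read into a list of dictionaries, for example with `csv.DictReader`. It also assumes that, like an image File-list, it has a "Files" column holding the annotation's own path. The paths shown are hypothetical.

```python
from __future__ import annotations


def unmatched_annotations(annotation_rows: list[dict[str, str]], image_paths: list[str]) -> list[str]:
    """Return annotation paths whose related_image is missing or not among image_paths."""
    images = set(image_paths)
    return [
        row["Files"]
        for row in annotation_rows
        # A missing or empty related_image never matches an image path.
        if row.get("related_image", "") not in images
    ]


# Hypothetical annotation File-list rows.
rows = [
    {"Files": "annotations/a_mask.tif", "related_image": "confocal/a.tif"},
    {"Files": "annotations/b_mask.tif", "related_image": "confocal/missing.tif"},
    {"Files": "annotations/c_mask.tif"},
]
print(unmatched_annotations(rows, ["confocal/a.tif"]))
```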
Step 8: Submit your data
Now that all data is uploaded, the File-list(s) are associated and all metadata has been entered in the portal, you are ready to submit. When you have checked that all the information is correct, click 'submit' to complete the submission.
Upon successful submission, your dataset is assigned a BioImage Archive accession number, which serves as its unique identifier within the archive. Your dataset will also receive a DOI. You will then be directed to a confirmation window with instructions on how to access your study and share it with others.
ORCID claim
Once your dataset is public, you can associate it with your ORCID profile to ensure that it is attributed to you in this record.
License