axiomcura closed this pull request 1 year ago.
@d33bs Hopefully I have addressed all your comments. Also, thanks for the great questions!
Why are barcodes only required for multiple experiments (and not for single experiments)? For example, if two experiments need to be compared but they are processed individually, would we run into issues when attempting to compare things later on?
I will use this repo to explain my current understanding.
From what I understand, the barcode file provides an assay-to-plate-map pairing, where the assay plates are SQLite files (in this case) produced by cytominer-database. In the barcode file, each assay plate is associated with a specific plate map (the Plate_Map_Name column), which contains metadata such as well position, perturbations, etc. Looking at the barcode structure, some plate map names repeat 3 times (the first 3, middle 3, and last 3 rows), indicating that 3 experiments were each conducted in triplicate (a different plate map name means a separate experiment).
Technically, a barcode is not needed for a single experiment, because all plates in that experiment share the same external factors (assuming more than one plate was used). Barcodes are only required when multiple separate experiments (3 in this example) were conducted across multiple plates. In that case, the barcode file identifies which plates belong to which experiment, so that the correct metadata is mapped to each plate during downstream analysis.
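To make the grouping concrete, here is a minimal sketch that reads a barcode file and groups assay plates by plate map, treating each plate map as one experiment. Only the Plate_Map_Name column comes from this discussion; the Assay_Plate_Barcode column name and every plate and plate-map name are hypothetical, chosen to mirror the "3 experiments in triplicate" layout described above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical barcode file: "Plate_Map_Name" is the column discussed above;
# "Assay_Plate_Barcode" and all plate / plate-map names are assumptions.
BARCODE_CSV = """Assay_Plate_Barcode,Plate_Map_Name
plate_01,platemap_A
plate_02,platemap_A
plate_03,platemap_A
plate_04,platemap_B
plate_05,platemap_B
plate_06,platemap_B
plate_07,platemap_C
plate_08,platemap_C
plate_09,platemap_C
"""


def plates_by_experiment(barcode_file):
    """Group assay plates by plate map (one plate map = one experiment)."""
    groups = defaultdict(list)
    for row in csv.DictReader(barcode_file):
        groups[row["Plate_Map_Name"]].append(row["Assay_Plate_Barcode"])
    return dict(groups)


experiments = plates_by_experiment(io.StringIO(BARCODE_CSV))
for platemap_name, plates in experiments.items():
    print(platemap_name, plates)
```

With a single experiment the result would contain only one key, which is why the mapping adds nothing in that case.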
would we run into issues when attempting to compare things later on?
Since the metadata (plate maps) can be incorporated into the single-cell / aggregate morphological profiles, you can stratify the profiles by experiment. The metadata is merged into the morphology profiles with pycytominer's annotate, which requires both the profiles and the plate map as inputs.
However, each assay plate must be matched with its associated plate map, which CytoSnake does when annotating datasets that contain multiple plates (assays).
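The sketch below illustrates the idea of matching each plate with its own plate map before merging on well position. It is a plain-Python illustration, not pycytominer's annotate and not CytoSnake's code; the barcode mapping, plate maps, profile rows, and column names such as Metadata_Well are all hypothetical.

```python
# Hypothetical barcode mapping: assay plate -> plate map name.
BARCODE = {"plate_01": "platemap_A", "plate_04": "platemap_B"}

# Hypothetical plate maps: well position -> per-well metadata.
PLATEMAPS = {
    "platemap_A": {"A01": {"treatment": "DMSO"}, "A02": {"treatment": "drug_x"}},
    "platemap_B": {"A01": {"treatment": "DMSO"}, "A02": {"treatment": "drug_y"}},
}

# Hypothetical aggregated profiles, one row per well.
PROFILES = {
    "plate_01": [
        {"Metadata_Well": "A01", "Cells_AreaShape_Area": 101.0},
        {"Metadata_Well": "A02", "Cells_AreaShape_Area": 87.5},
    ],
    "plate_04": [
        {"Metadata_Well": "A01", "Cells_AreaShape_Area": 99.0},
        {"Metadata_Well": "A02", "Cells_AreaShape_Area": 120.3},
    ],
}


def annotate_plate(plate, rows, barcode, platemaps):
    """Attach the plate's own plate-map metadata to each row, matched on well."""
    platemap_name = barcode[plate]  # KeyError if the plate is not in the barcode
    platemap = platemaps[platemap_name]
    annotated = []
    for row in rows:
        # Wells missing from the plate map keep their features but gain no metadata.
        well_meta = platemap.get(row["Metadata_Well"], {})
        annotated.append(
            {
                "Metadata_Plate": plate,
                "Metadata_Plate_Map_Name": platemap_name,
                **{f"Metadata_{key}": value for key, value in well_meta.items()},
                **row,
            }
        )
    return annotated


annotated = {
    plate: annotate_plate(plate, rows, BARCODE, PLATEMAPS)
    for plate, rows in PROFILES.items()
}
print(annotated["plate_04"][1]["Metadata_treatment"])
```

Well A02 gets "drug_x" on plate_01 but "drug_y" on plate_04; using one plate map for every plate would silently mislabel one of them.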
Similarly: are there ever scenarios where we don't have the barcode file but need to run analyses on the experiments? Here especially I'm thinking about previously gathered data where one may no longer have access to all data, or perhaps the data is stored in an unrecognizable format. In these scenarios could you simulate the barcode file's data (providing notation that it's simulated) to help facilitate the work involved with this PR?
The barcode file only provides information that distinguishes which plates came from which experiment. Suppose the data you describe came from 3 separate experiments and no barcodes were provided. A potential solution is to contact the person who generated the dataset and ask which plates came from which experiment.
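Once the plate-to-experiment assignment is recovered (for example, from the person who generated the data), it could be written out in barcode-file form. This is a hypothetical sketch: the function name, the Assay_Plate_Barcode column name, and the plate names are assumptions; only Plate_Map_Name comes from this discussion.

```python
import csv
import io


def write_barcode_file(assignment, out):
    """Write an assay-plate -> plate-map assignment as a two-column CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Assay_Plate_Barcode", "Plate_Map_Name"])
    for plate, platemap_name in assignment.items():
        writer.writerow([plate, platemap_name])


# Hypothetical assignment recovered from the dataset's creator.
assignment = {
    "plate_01": "platemap_A",
    "plate_02": "platemap_A",
    "plate_03": "platemap_B",
}
buffer = io.StringIO()
write_barcode_file(assignment, buffer)
print(buffer.getvalue())
```

When writing to disk instead of a buffer, open the file with `newline=""`, as the csv module documentation recommends.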
However, in this scenario, if no plate maps were provided either, we would not know which external treatments were added to the cells, or which experiments used which treatments and cell lines. It would therefore be difficult to simulate the barcode file, due to the lack of important metadata such as treatments, well positions, and cell lines.
I have applied all the changes. Merging now. If more work needs to be done, please feel free to re-open this PR.
About this PR
This PR adds logic to CytoSnake for handling user-provided CLI inputs. Specifically, this update introduces barcode logic.
By default, a barcode is not required as an input to run CytoSnake; however, there are some exceptions where barcodes are needed. If a user provides a dataset generated from multiple experiments, then multiple plate maps are associated with the generated data. In that case a barcode file is required so that CytoSnake knows which plate dataset is associated with which experiment.
What's new?
input_guard.py was created. This module handles all the CLI logic:
check_init_parameter_inputs() takes user-provided parameters and checks them for discrepancies.
Implementation
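A hypothetical sketch of the barcode rule: count the plate maps in the metadata folder and refuse to continue without a barcode when there is more than one. The body of check_init_parameter_inputs() is not shown in this PR, so the function name check_barcode_requirement, the assumed "platemap" subfolder layout, and raising ValueError are illustrative choices, not CytoSnake's actual implementation.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_barcode_requirement(metadata_dir: Path, barcode: Optional[Path]) -> int:
    """Return the number of plate maps; raise if >1 and no barcode is given."""
    # Assumed layout: one file per plate map in <metadata_dir>/platemap.
    platemap_dir = Path(metadata_dir) / "platemap"
    platemaps = [path for path in platemap_dir.iterdir() if path.is_file()]
    if len(platemaps) > 1 and barcode is None:
        raise ValueError(
            f"{len(platemaps)} plate maps found in {platemap_dir}; "
            "a barcode file is required to map plates to plate maps."
        )
    return len(platemaps)


# Usage: two plate maps without a barcode is rejected, with a barcode it passes.
with tempfile.TemporaryDirectory() as tmp:
    platemap_dir = Path(tmp) / "platemap"
    platemap_dir.mkdir()
    for name in ("platemap_A.txt", "platemap_B.txt"):
        (platemap_dir / name).write_text("well_position\tbroad_sample\n")
    try:
        check_barcode_requirement(Path(tmp), barcode=None)
    except ValueError as err:
        print("error:", err)
    print(check_barcode_requirement(Path(tmp), barcode=Path(tmp) / "barcode.csv"))
```

Counting files keeps the rule cheap to evaluate before any workflow starts, which matches the "check inputs first" role described for input_guard.py.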
The barcode logic is dictated by the number of plate maps found within the metadata data folder. If CytoSnake sees more than one plate map, it requires a barcode. Therefore, users must provide a barcode file whenever multiple plate maps are present.
Additional Notes
Changes in workflow
Update on workflows: cp_process_singlecells; however, a workflow config was not created for cp_process, hence this PR contains a new workflow config for cp_process.