broadinstitute / pooled-cell-painting-profiling-recipe

:woman_cook: Recipe repository for image-based profiling of Pooled Cell Painting experiments
BSD 3-Clause "New" or "Revised" License

Sanitize gene column #48

Closed gwaybio closed 4 years ago

gwaybio commented 4 years ago

In one recent experiment, the gene column had the following format: "GENENAME_GUIDEID". Our previous recipe expected the column to contain only the GENENAME. The guide ID could be useful for linking between different resources (e.g. if the guide ID is used in different places in a single experiment, retaining it in a separate column would be helpful). A simple solution that splits on underscores will not work, since the "control_barcodes" entries in this column would be split incorrectly.

Currently, I will add the following solution:

I will scan these columns, check for inconsistencies, auto-detect the anomalies, and then parse the entries while accounting for the control_barcodes ingredient.
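
A minimal sketch of what such parsing might look like, not the recipe's actual implementation. The function name `sanitize_gene_entry`, the example gene and control names, and the choice to split on the last underscore are all assumptions made for illustration. It also assumes a guide ID never contains an underscore itself, and that a control entry may optionally carry a guide ID suffix.

```python
from typing import Iterable, Optional, Tuple


def sanitize_gene_entry(
    entry: str, control_barcodes: Iterable[str]
) -> Tuple[str, Optional[str]]:
    """Split one gene-column entry into (gene, guide_id).

    Hypothetical helper: control names are matched first so that
    underscores inside them are never treated as separators.
    """
    # Check longer control names first so the most specific one wins.
    for control in sorted(set(control_barcodes), key=len, reverse=True):
        if entry == control:
            return control, None
        if entry.startswith(control + "_"):
            # Assumption: anything after the control name is a guide ID.
            return control, entry[len(control) + 1:]

    if "_" not in entry:
        # Old-style entry that holds only the gene name.
        return entry, None

    # Assumption: the guide ID is the part after the last underscore.
    gene, guide_id = entry.rsplit("_", 1)
    return gene, guide_id
```

Applied to each value in the column, this yields a gene name and a separate guide ID (or `None` when no guide ID is present), so both can be stored in their own columns.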