Definition of ORF and CRISPR similarity clusters

We've typically defined clusters by a process like this (a bit messy, not claiming precision!):

filter out reagents/genes that 'do not have a phenotype' (ie not reproducibly different from neg controls)
define some top-ranked correlations and/or anti-correlations, in ORF and/or CRISPR
look at a visual heatmap of all genes involved, plotted together
visually look for clusters with low KG scores, as potential new discoveries

It's been easiest to look for individual pairs that are very strong, because it's quick to google to identify their known functions and think about how they might be connected, and to look for any known connections between them, and to find someone who studies one and is willing to do an experiment on the other. We began with several of these 2-gene 'clusters'. Some changed a bit over time as we shifted methodologies such that they might not meet our refined criteria especially for the first step above, but we continue if they seem real.

We also began to dig into the big SLC/OR cluster because it's very strong/obvious.

Only recently, we began to look at some other gene clusters with more than 2 genes in them; this is where I looped in @Zitong-Chen-16 and @jessica-ewald to begin digging into the biology of them to see if they have a story.

@tjetkaARD does this all sound accurate to you?

broadinstitute / 2023_12_JUMP_data_only_vignettes

Definition of ORF and CRISPR similarity clusters #20