broadinstitute / 2023_12_JUMP_data_only_vignettes

Collection of JUMP documentation and projects for internal and public consumption

Definition of ORF and CRISPR similarity clusters #20

auranic closed this 5 months ago

auranic commented 9 months ago

In the discussions I see mentions of clusters derived from ORF and/or CRISPR similarities. Do you have fixed definitions of these clusters? If so, would it be possible to share them with Evotec? If not, do you think it would be a good idea to fix them? This would allow us to quantify "explainable" connections within and between clusters and to produce a more coarse-grained representation of the results, which could be useful for the manuscript, I guess.
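To make the proposal concrete, here is a minimal sketch of what fixed cluster definitions would enable. It assumes a hypothetical `clusters` mapping (gene to cluster id), a list of strong gene-gene `connections`, and a set of `explained` pairs with a known link (for example from an annotation database). None of these names or inputs come from the JUMP data; they are placeholders. The function reports the fraction of explainable connections within vs. between clusters, plus a coarse cluster-level edge count.

```python
from collections import Counter


def summarize_connections(clusters, connections, explained):
    """Fraction of 'explainable' connections within vs. between clusters.

    clusters:    dict gene -> cluster id (hypothetical fixed cluster definitions)
    connections: iterable of (gene_a, gene_b) pairs judged to be strongly similar
    explained:   set of frozenset({gene_a, gene_b}) pairs with a known link
    """
    totals = Counter()  # number of connections of each kind
    hits = Counter()    # number of those connections that are explainable
    coarse = Counter()  # cluster-level graph: (cluster_x, cluster_y) -> gene-level edge count
    for a, b in connections:
        ca, cb = clusters.get(a), clusters.get(b)
        if ca is None or cb is None:
            continue  # skip genes that are not assigned to any cluster
        kind = "within" if ca == cb else "between"
        totals[kind] += 1
        if frozenset((a, b)) in explained:
            hits[kind] += 1
        coarse[tuple(sorted((ca, cb)))] += 1
    fractions = {kind: hits[kind] / totals[kind] for kind in totals}
    return fractions, coarse


clusters = {"A": 1, "B": 1, "C": 2, "D": 2}
connections = [("A", "B"), ("C", "D"), ("A", "C")]
explained = {frozenset(("A", "B"))}
fractions, coarse = summarize_connections(clusters, connections, explained)
print(fractions)  # {'within': 0.5, 'between': 0.0}
```

The `coarse` counter is one possible coarse-grained view: each cluster becomes a node, and edges carry the number of gene-level connections between clusters.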

AnneCarpenter commented 9 months ago

We've typically defined clusters by a process like this (a bit messy, not claiming precision!):

It's been easiest to look for individual pairs that are very strong, because it's quick to google their known functions, think about how they might be connected, look for any known connections between them, and find someone who studies one gene and is willing to do an experiment on the other. We began with several of these two-gene 'clusters'. Some changed a bit over time as we shifted methodologies, so they might no longer meet our refined criteria (especially for the first step above), but we continue to pursue them if they seem real.
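As a rough illustration of the "very strong pairs" starting point, the sketch below ranks gene pairs by cosine similarity of their profiles and keeps those above a cut-off. The `profiles` input, the use of cosine similarity, and the `threshold`/`top_k` values are assumptions for illustration only; they are not the criteria actually used in the project.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def strongest_pairs(profiles, threshold=0.5, top_k=None):
    """Gene pairs with cosine similarity >= threshold, strongest first.

    profiles:  dict gene -> list of per-feature values (hypothetical aggregated profiles)
    threshold: illustrative similarity cut-off
    top_k:     optionally keep only the k strongest pairs (None keeps all)
    """
    scored = [
        (cosine(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    # Sort by similarity, highest first, and drop pairs below the cut-off.
    kept = [(a, b, s) for s, a, b in sorted(scored, reverse=True) if s >= threshold]
    return kept[:top_k]  # slicing with None returns the full list


profiles = {"G1": [1, 0, 1], "G2": [1, 0, 0.9], "G3": [0, 1, 0]}
for a, b, s in strongest_pairs(profiles):
    print(a, b, round(s, 3))  # G1 G2 0.999
```

Each surviving pair is a candidate two-gene 'cluster' that can then be checked against the literature by hand.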

We also began to dig into the big SLC/OR cluster because it's very strong/obvious.

Only recently did we begin to look at other gene clusters with more than two genes; this is where I looped in @Zitong-Chen-16 and @jessica-ewald to dig into their biology and see whether they have a story.

@tjetkaARD does this all sound accurate to you?