broadinstitute / chem-bio-dos-del

Initiated 2021Q4 for code related to the Broad Chemical Biology DNA-encoded library (DEL) analysis and visualization pipeline
MIT License

Finding redundantly encoded compounds #3

Open codewarrior2000 opened 2 years ago

codewarrior2000 commented 2 years ago

Shuang Liu suggested a feature for the DEL analysis app: "I wanted to find the redundantly encoded compounds (the same compound encoded in two ways, since cycle 2 and cycle 3 are symmetrical, i.e. ABC or ACB, for lib #7, #200, #201) in a DEL analysis plot. For example, if I have the cycle IDs of one compound, how can I find the cycle IDs of the same compound encoded the alternative way? Bruce told me that the current app doesn't have such a function. Is this something that could be incorporated?

I'm thinking that in the 'data retrieval' tab, add a tab where I could enter library id, cycle 1, 2, 3, click a button, and it would search all compounds in all libraries and return the same compounds encoded in different ways.

e.g. if I key in library 201, cycle 1= 13, cycle 2= 117, cycle 3= 161, I should get the answer as library 201, cycle 1= 13, cycle 2= 119, cycle 3= 105 (this is a real example where these two encode the same compound, since cycle 2's 117 is the same building block as cycle 3's 105 in library 201).

It is also useful to have a partial selection, e.g. cycle 2/3 only. If I key in lib= 201, cycle 1 blank, cycle 2= 112, cycle 3= 31, it could give me lib 201, cycle 1 blank, cycle 2= 74, cycle 3= 172, AND lib 7, cycle 1 blank, cycle 2= 15, cycle 3= 175. This is also a real example, as some cycle 2/3 building blocks in lib 7, 200 and 201 overlap (but are assigned different numbers). If you search these numbers you'll see the same cycle 2/3 structures within the library and across the libraries.

With an additional higher-throughput option, if I enter a list in a table format (like how I use a list to request structures/enrichment values), it could ideally show me, also in a list format, whether each compound in the list has matching ones within the library and in other libraries.

At this stage I think showing the identity of the other matching IDs directly on the 2D plot when hovering to display more info may be too much to ask for, so data retrieval would be sufficient."
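
A minimal sketch of the requested lookup, in Python, might look like the following. It assumes a building-block table that maps (library, cycle, cycle ID) to a structure label; that table, the function names `build_index`, `find_redundant`, and `find_redundant_batch`, and the `bb_*` labels are all hypothetical and not part of the current app. The cycle IDs are taken from the examples above, but the labels are placeholders rather than real structures, and which of lib 7's two IDs matches which lib 201 building block is an assumption. The sketch also matches on building blocks only and does not check that two libraries share a scaffold.

```python
from collections import defaultdict
from itertools import product

# Hypothetical table: (library, cycle, cycle_id) -> structure label.
# Labels are placeholders; the lib 7 assignment below is assumed.
BB_TABLE = {
    (201, 1, 13): "bb_P",
    (201, 2, 117): "bb_Q", (201, 3, 105): "bb_Q",
    (201, 2, 119): "bb_R", (201, 3, 161): "bb_R",
    (201, 2, 112): "bb_S", (201, 3, 172): "bb_S",
    (201, 2, 74): "bb_T", (201, 3, 31): "bb_T",
    (7, 2, 15): "bb_S", (7, 3, 175): "bb_T",
}


def build_index(bb_table):
    """Map (library, cycle, structure) -> set of cycle IDs."""
    index = defaultdict(set)
    for (lib, cycle, cid), structure in bb_table.items():
        index[(lib, cycle, structure)].add(cid)
    return index


def find_redundant(bb_table, library, c1, c2, c3):
    """Return other (library, c1, c2, c3) encodings of the same compound.

    Pass c1=None for a partial cycle 2/3 query.
    """
    index = build_index(bb_table)
    s1 = None if c1 is None else bb_table[(library, 1, c1)]
    s2 = bb_table[(library, 2, c2)]
    s3 = bb_table[(library, 3, c3)]
    hits = set()
    for lib in {key[0] for key in bb_table}:
        c1_ids = [None] if s1 is None else index.get((lib, 1, s1), ())
        # Try both orderings because cycles 2 and 3 are interchangeable.
        for a, b in ((s2, s3), (s3, s2)):
            c2_ids = index.get((lib, 2, a), ())
            c3_ids = index.get((lib, 3, b), ())
            for x1, x2, x3 in product(c1_ids, c2_ids, c3_ids):
                hits.add((lib, x1, x2, x3))
    hits.discard((library, c1, c2, c3))  # drop the query itself
    return sorted(hits)


def find_redundant_batch(bb_table, queries):
    """Run the lookup for a list of (library, c1, c2, c3) queries."""
    return {q: find_redundant(bb_table, *q) for q in queries}


full = find_redundant(BB_TABLE, 201, 13, 117, 161)
partial = find_redundant(BB_TABLE, 201, None, 112, 31)
print(full)     # [(201, 13, 119, 105)]
print(partial)  # [(7, None, 15, 175), (201, None, 74, 172)]
```

With this toy table, the full query reproduces the library 201 example and the partial query reproduces the cross-library example. The batch helper only loops over the same lookup, which would be one way to serve the list-format request.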

remontoire-pac commented 2 years ago

It seems to me there are multiple mechanisms by which redundant encoding could happen:

Is this something we should handle in the chemistry metadata, though, since it's not specific to which protein is being tested?

lius-broad commented 1 year ago

It happens for the triazine (symmetrical scaffold) libraries, i.e. the triazine library #7 itself and also the triazine-based CiP-DEL libraries #200, #201, and #300.

Yes it is a chemistry metadata based problem, not related to protein.
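
If this is handled in the chemistry metadata, one possible approach is to precompute an order-independent key per encoded compound and group encodings that share it. The sketch below is only an illustration: the row layout (`s1`, `s2`, `s3` as structure labels), the `bb_*` labels, and the function names are hypothetical, and the set of symmetric libraries is taken from the comment above. The key omits the scaffold, so it assumes that matching building blocks imply the same compound.

```python
from collections import defaultdict

# Libraries named in this thread as triazine (symmetrical scaffold) libraries.
SYMMETRIC_LIBRARIES = {7, 200, 201, 300}


def canonical_key(library, s1, s2, s3):
    """Return a key that ignores cycle 2/3 order for symmetric libraries."""
    if library in SYMMETRIC_LIBRARIES:
        s2, s3 = sorted((s2, s3))
    return (s1, s2, s3)


def redundant_groups(rows):
    """Group encodings by canonical key; keep groups with more than one member."""
    groups = defaultdict(list)
    for row in rows:
        key = canonical_key(row["library"], row["s1"], row["s2"], row["s3"])
        groups[key].append((row["library"], row["c1"], row["c2"], row["c3"]))
    return {k: v for k, v in groups.items() if len(v) > 1}


# Toy metadata rows; structure labels are placeholders, not real structures.
ROWS = [
    {"library": 201, "c1": 13, "c2": 117, "c3": 161,
     "s1": "bb_P", "s2": "bb_Q", "s3": "bb_R"},
    {"library": 201, "c1": 13, "c2": 119, "c3": 105,
     "s1": "bb_P", "s2": "bb_R", "s3": "bb_Q"},
    {"library": 201, "c1": 13, "c2": 117, "c3": 105,
     "s1": "bb_P", "s2": "bb_Q", "s3": "bb_Q"},
]

groups = redundant_groups(ROWS)
print(groups)
```

Because the grouping depends only on library and building-block structures, it could be computed once per library and reused across screens, independent of which protein is tested.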