Closed gwaybio closed 4 years ago
@gwaygenomics I used `InChIKey14` instead of `InChIKey` because the latter suffers from the same problem as `broad_id`: both account for a compound's stereochemistry. If a compound has different stereochemistry across different repurposing hub versions, we wouldn't be able to map it across versions. In your example above (https://github.com/broadinstitute/lincs-cell-painting/issues/17#issue-605122285) the first four rows represent one isomer while the last three represent another. The `broad_id` and `InChIKey` are the same within the first four compounds and within the last three compounds, while `InChIKey14` is the same across all of them.
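To make the distinction concrete, here is a minimal sketch of how an InChIKey14 relates to a full InChIKey. It assumes the standard InChIKey layout, where the first 14-character block encodes connectivity and the second block encodes stereochemistry among other layers. The helper name `inchikey14` is hypothetical, and the second blocks in the example keys are placeholders, not real hashes.

```python
def inchikey14(inchikey: str) -> str:
    """Return the first (connectivity) block of a standard InChIKey."""
    return inchikey.split("-")[0]


# Hypothetical keys: the first block is the one quoted in this thread,
# the second blocks are made-up placeholders standing in for two isomers.
isomer_a = "KTEIFNKAUNYNJU-AAAAAAAAAA-N"
isomer_b = "KTEIFNKAUNYNJU-BBBBBBBBBB-N"

# The full keys differ, so they cannot be matched across versions.
print(isomer_a == isomer_b)

# The InChIKey14 values agree, so both isomers map to the same entry.
print(inchikey14(isomer_a) == inchikey14(isomer_b))
```

This is also why the shorter key loses information: anything that differs only in stereochemistry collapses onto one value.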
As we briefly discussed in https://github.com/broadinstitute/lincs-cell-painting/issues/11#issuecomment-612176739, ignoring stereochemistry may not be ideal. If different stereoisomers have substantially different MOA annotations, then using `InChIKey14` as the common field for mapping across the different repurposing hub versions may be inadequate.
We discussed this issue in the profiling checkin - the full summary is here https://github.com/broadinstitute/lincs-cell-painting/issues/11#issuecomment-618480910
The pertinent info for this issue is:
To solve the stereoisomer issue, we will create alternate_moa and alternate_target columns for the cases where the same InChIKey14 maps to two different MOAs/targets because of different stereochemistry.
Concretely, the profiles for the compound above would look like this:
Metadata_broad_id | Metadata_moa | Metadata_target | Metadata_alternative_moa | Metadata_alternative_target |
---|---|---|---|---|
BRD-K78431006 (or whichever 2016 Broad ID matches to InChiKey14 KTEIFNKAUNYNJU) | ALK tyrosine kinase receptor inhibitor | ALK,MET | MTH1 inhibitor | NUDT1 |
We will also have to make some manual ordering decisions (i.e., which MOA is primary and which is alternative).
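One way the alternative columns could be built is sketched below. This is only an illustration: the function name `collapse_annotations` and the input field names (`InChIKey14`, `moa`, `target`) are hypothetical, and the rule "first annotation seen becomes primary" is a stand-in for the manual ordering decision mentioned above.

```python
def collapse_annotations(rows):
    """Merge rows that share an InChIKey14 into one record per key."""
    pairs_by_key = {}
    for row in rows:
        pairs = pairs_by_key.setdefault(row["InChIKey14"], [])
        pair = (row["moa"], row["target"])
        if pair not in pairs:  # keep first-seen order, skip duplicates
            pairs.append(pair)

    collapsed = []
    for key, pairs in pairs_by_key.items():
        if len(pairs) > 2:
            # Only one alternative slot exists, so flag these for review.
            raise ValueError(f"{key} has more than two annotations")
        # Assumption: first-seen pair is primary; in practice this
        # ordering would be decided manually.
        primary = pairs[0]
        alternative = pairs[1] if len(pairs) == 2 else ("", "")
        collapsed.append({
            "InChIKey14": key,
            "Metadata_moa": primary[0],
            "Metadata_target": primary[1],
            "Metadata_alternative_moa": alternative[0],
            "Metadata_alternative_target": alternative[1],
        })
    return collapsed


# Annotations taken from the table in this thread.
rows = [
    {"InChIKey14": "KTEIFNKAUNYNJU",
     "moa": "ALK tyrosine kinase receptor inhibitor", "target": "ALK,MET"},
    {"InChIKey14": "KTEIFNKAUNYNJU",
     "moa": "MTH1 inhibitor", "target": "NUDT1"},
]
for record in collapse_annotations(rows):
    print(record)
```

Raising on more than two annotations is a design choice for the sketch; it keeps compounds that do not fit the two-column layout from being silently truncated.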
@gwaygenomics I believe the markdown renderer mistook the pipe between ALK and MET to indicate column separation in the markdown table. Just wanted to bring that to your attention.
thanks - updated
In #12 we used `InChIKey14` to map broad_ids and in #11 we discussed why this is important.
While processing some data, I noticed that InChIKey14s do not map uniquely to MOAs and targets. I guess this is not surprising given that drugs are often used for different indications in various clinical phases, but it is worth documenting here! It is dangerous to use InChIKey14s to map directly to MOA/Targets.
For example, `InChIKey14` `KTEIFNKAUNYNJU` maps to two MOA/Targets. However, it looks like the full InChIKey does map uniquely. I didn't comprehensively explore this.
@niranjchandrasekaran - maybe I missed this, but was there a reason to use InChIKey14 instead of the full InChIKey?
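A more comprehensive version of the check described in this comment could look like the sketch below. The helper `ambiguous_keys` and the record field names are hypothetical, and the full InChIKeys use placeholder second blocks rather than real values; only the InChIKey14 and the annotations come from this thread.

```python
from collections import defaultdict


def ambiguous_keys(records, key_field):
    """Return keys that map to more than one distinct (moa, target) pair."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec[key_field]].add((rec["moa"], rec["target"]))
    return {key: pairs for key, pairs in seen.items() if len(pairs) > 1}


# Placeholder full keys (second blocks are made up for illustration).
records = [
    {"InChIKey": "KTEIFNKAUNYNJU-AAAAAAAAAA-N",
     "InChIKey14": "KTEIFNKAUNYNJU",
     "moa": "ALK tyrosine kinase receptor inhibitor", "target": "ALK,MET"},
    {"InChIKey": "KTEIFNKAUNYNJU-BBBBBBBBBB-N",
     "InChIKey14": "KTEIFNKAUNYNJU",
     "moa": "MTH1 inhibitor", "target": "NUDT1"},
]

# Keyed on InChIKey14, the compound is flagged as ambiguous.
print(sorted(ambiguous_keys(records, "InChIKey14")))

# Keyed on the full InChIKey, nothing is flagged in this toy data.
print(sorted(ambiguous_keys(records, "InChIKey")))
```

Running the same function over the real annotation table, once per key field, would show how many compounds are affected by each choice of key.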