enricoferrero opened this issue 7 years ago
Hi @enricoferrero:
For the CMAP dataset, once the PharmacoSet object is loaded into R, running drugInfo(CMAP) returns a table of annotations for the drugs.
Of the universal identifiers, the only one we have for these drugs is the ChemBank ID (CBID column), provided by the original study authors.
For the L1000 data, the study authors provided "canonical_smiles", "inchi_key" and "inchi_string" columns in their annotations; these are also stored in the drugInfo(L1000_compounds) table. However, in our experience these columns contain both missing and incorrect entries, which is why they were not used as the drug identifiers inside the PharmacoSet objects.
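Since the L1000 inchi_key column can contain missing or incorrect entries, a quick sanity check before mapping keys through a service such as UniChem may save some effort. Below is a minimal Python sketch, assuming the drugInfo(L1000_compounds) table has been exported from R to CSV (for example with write.csv) and keeps the pert_id and inchi_key column names; the function name audit_inchikeys and the sample rows are hypothetical. It only checks the standard InChIKey layout (14 letters, hyphen, 10 letters, hyphen, 1 letter), so a well-formed key that points to the wrong structure will still pass.

```python
import csv
import io
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Values treated as "missing" (assumption: empty cells or R's NA after export).
MISSING_VALUES = {"", "NA"}


def audit_inchikeys(rows, id_col="pert_id", key_col="inchi_key"):
    """Return (missing, malformed) lists of compound IDs.

    This catches only empty or badly formatted keys; it cannot tell
    whether a well-formed key matches the intended compound.
    """
    missing, malformed = [], []
    for row in rows:
        key = (row.get(key_col) or "").strip()
        if key in MISSING_VALUES:
            missing.append(row[id_col])
        elif not INCHIKEY_RE.fullmatch(key):
            malformed.append(row[id_col])
    return missing, malformed


if __name__ == "__main__":
    # Hypothetical sample rows standing in for an exported drugInfo table.
    good_key = "A" * 14 + "-" + "B" * 10 + "-N"
    sample = io.StringIO(
        "pert_id,inchi_key\n"
        f"cmpd_a,{good_key}\n"
        "cmpd_b,\n"
        "cmpd_c,not-a-key\n"
    )
    missing, malformed = audit_inchikeys(csv.DictReader(sample))
    print("missing:", missing)
    print("malformed:", malformed)
```

Rows flagged as missing or malformed would need another identifier (or manual curation), and rows that pass should still be spot-checked against the compound name before being trusted.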
@p-smirnov: thank you, the InChIKey should be handy. What type of ID is in the pert_id column? UniChem does not recognise it as a LINCS identifier.
Not really an issue, just a question I couldn't find the answer to in the docs.
What nomenclature or ontology is used to name compounds in the CMap and LINCS L1000 PharmacoSets and perturbation signatures?
For example, where do names such as metformin and phenformin come from? Are unique IDs (e.g. ChEMBL IDs) stored anywhere in the PharmacoSet or perturbation signature objects?
Thanks!