Closed teslajoy closed 1 month ago
will add pubchem_id as identifier
SELECT
a.MOLREGNO,
a.PREF_NAME,
a.CHEMBL_ID,
a.MAX_PHASE,
a.STRUCTURE_TYPE,
c.STANDARD_INCHI,
c.STANDARD_INCHI_KEY,
c.CANONICAL_SMILES,
d.DOC_ID,
d.PUBMED_ID,
d.DOI,
cr.SRC_ID,
cr.SRC_COMPOUND_ID,
sr.SRC_SHORT_NAME,
sr.SRC_DESCRIPTION
FROM
MOLECULE_DICTIONARY as a
LEFT JOIN
COMPOUND_STRUCTURES as c ON a.MOLREGNO = c.MOLREGNO
LEFT JOIN
ACTIVITIES as p ON a.MOLREGNO = p.MOLREGNO
LEFT JOIN
DOCS as d ON p.DOC_ID = d.DOC_ID
LEFT JOIN
compound_records as cr ON a.MOLREGNO = cr.MOLREGNO
LEFT JOIN
source as sr ON cr.SRC_ID = sr.SRC_ID;
if "PUBCHEM" in scr_short_name: add src_compound_id to identifier list value with src "https://pubchem.ncbi.nlm.nih.gov"
current code's transform.yml and _distcompute.plan passed on local machine over first 100 rows. Executing over all data on cloud.