datamol-io / medchem

Molecular filtering for drug discovery.
https://medchem-docs.datamol.io
Apache License 2.0

References for Structural Alerts #21

Closed: onexhale closed this issue 5 months ago

onexhale commented 5 months ago

Hi all,

Thanks for the excellent work, this is really useful. Would it be possible to get references for the rule sets of the structural alerts? I had a bit of a look around in the source and the docs but haven't been able to find a comprehensive list (apologies if it is included somewhere and this is a PEBKAC).

Some sources are fairly obvious, but it would be good to be able to read the source material for the more esoteric ones as well. A possible enhancement would be for mc.structural.CommonAlertsFilters.list_default_available_alerts() to return DOIs in the 'source' field.

Cheers!
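A minimal sketch of the enhancement suggested above (returning DOIs alongside each rule set), assuming the alert listing can be represented as rows carrying a rule_set_name field. The field names, the REFERENCES mapping, and the to_doi / add_doi_column helpers are hypothetical and not part of medchem's API; the DOIs are copied from the reference list maintainers posted later in this thread.

```python
from typing import Optional

# Hypothetical mapping from rule-set id to reference URL.
# DOIs copied from the ALERT_INFOS list posted later in this thread.
REFERENCES = {
    "brenk": "https://doi.org/10.1002/cmdc.200700139",
    "pains_a": "https://doi.org/10.1021/jm901137j",
    "inpharmatica": None,  # no published reference listed
}

DOI_PREFIX = "https://doi.org/"


def to_doi(reference: Optional[str]) -> Optional[str]:
    """Return the bare DOI if the reference is a doi.org URL, else None."""
    if reference and reference.startswith(DOI_PREFIX):
        return reference[len(DOI_PREFIX):]
    return None


def add_doi_column(rows: list[dict]) -> list[dict]:
    """Return copies of the alert rows with an extra 'doi' key (None when unknown).

    Assumes each row has a 'rule_set_name' key (hypothetical field name).
    """
    enriched = []
    for row in rows:
        key = row["rule_set_name"].lower()
        enriched.append({**row, "doi": to_doi(REFERENCES.get(key))})
    return enriched


rows = [{"rule_set_name": "BRENK"}, {"rule_set_name": "Inpharmatica"}]
print(add_doi_column(rows))
```

Rule sets without a DOI (unknown ids, or references that are plain web pages) simply get None, so the listing stays complete while the missing citations remain visible.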

zhu0619 commented 5 months ago

@onexhale Thank you for your interest in medchem. Some of the information you are looking for can be found in https://github.com/datamol-io/medchem/tree/main/medchem/data, for example in common_alerts_collection.csv.

We will try to make the docs and source more accessible for the next release. Thank you for the suggestion.
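To get a quick overview of which rule sets common_alerts_collection.csv covers, the rows can be tallied per rule set with the standard library. This is only a sketch: it assumes the CSV has a header column named rule_set_name (check the actual header and adjust the column argument), and the in-memory sample is placeholder data, not the real file.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_alerts_per_rule_set(handle: TextIO, column: str = "rule_set_name") -> Counter:
    """Count how many alert rows belong to each rule set.

    `column` is an assumption about the CSV header; adjust it to the real file.
    """
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


# Placeholder sample; in practice pass an open handle to
# medchem/data/common_alerts_collection.csv instead.
sample = io.StringIO(
    "rule_set_name,smarts\n"
    "BRENK,placeholder_1\n"
    "BRENK,placeholder_2\n"
    "PAINS_A,placeholder_3\n"
)
print(count_alerts_per_rule_set(sample))
```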

onexhale commented 5 months ago

Thanks for the help. Some of the references are there, with a few still left to fill in. Great work though, looking forward to future releases!

maclandrol commented 5 months ago

@zhu0619, a subset of the alerts is well referenced, but some (especially the custom ones) are missing references. We can likely add something for the rules as well (their references should normally already be in their docstrings).

ALERT_INFOS = [
    # Common
    {
        "id": "brenk",
        "name": "BRENK",
        "long_name": None,
        "reference": "https://doi.org/10.1002/cmdc.200700139",
        "description": "Unwanted functionality due to potential tox reasons or unfavourable pharmacokinetic properties.",
    },
    {
        "id": "pains_a",
        "name": "PAINS_A",
        "long_name": "Pan Assay Interference Compounds (family A)",
        "reference": "https://doi.org/10.1021/jm901137j",
        "description": None,
    },
    {
        "id": "pains_b",
        "name": "PAINS_B",
        "long_name": "Pan Assay Interference Compounds (family B)",
        "reference": "https://doi.org/10.1021/jm901137j",
        "description": None,
    },
    {
        "id": "pains_c",
        "name": "PAINS_C",
        "long_name": "Pan Assay Interference Compounds (family C)",
        "reference": "https://doi.org/10.1021/jm901137j",
        "description": None,
    },
    {
        "id": "nih",
        "name": "NIH",
        "long_name": None,
        "reference": "https://doi.org/10.1039/C4OB02287D",
        "description": "Annotate compounds with problematic functional groups.",
    },
    {
        "id": "zinc",
        "name": "ZINC",
        "long_name": None,
        "reference": "https://blaster.docking.org/filtering/",
        "description": "Drug-likeness and unwanted functional group filters.",
    },
    # ChEMBL
    {
        "id": "bms",
        "name": "BMS",
        "long_name": "Bristol-Myers Squibb HTS Deck Filters",
        "reference": "https://doi.org/10.1021/ci050504m",
        "description": None,
    },
    {
        "id": "dundee",
        "name": "Dundee",
        "long_name": "University of Dundee NTD Screening Library Filters",
        "reference": "https://doi.org/10.1002/cmdc.200700139",
        "description": None,
    },
    {
        "id": "glaxo",
        "name": "Glaxo",
        "long_name": "Glaxo Wellcome Hard Filters",
        "reference": "https://doi.org/10.1021/ci990423o",
        "description": None,
    },
    {
        "id": "inpharmatica",
        "name": "Inpharmatica",
        "long_name": None,
        "reference": None,
        "description": None,
    },
    {
        "id": "mlsmr",
        "name": "MLSMR",
        "long_name": "NIH MLSMR Excluded Functionality Filters",
        "reference": "https://mlsmr.evotec.com/MLSMR_HomePage/pdf/MLSMR_Excluded_Functionality_Filters_200605121510.pdf",
        "description": None,
    },
    {
        "id": "lint",
        "name": "LINT",
        "long_name": "Pfizer LINT filters",
        "reference": "https://doi.org/10.2174/157340605774598081",
        "description": None,
    },
    {
        "id": "schembl",
        "name": "SureChEMBL",
        "long_name": None,
        "reference": "https://www.surechembl.org/knowledgebase/169485",
        "description": 'Structural alerts or "toxicophores" substructures found in chemicals that are highly correlated with undesirable properties typically associated with human or environmental toxicity.',
    },
    # NIBR
    {
        "id": "nibr",
        "name": "NIBR",
        "long_name": None,
        "reference": "https://doi.org/10.1021/acs.jmedchem.0c01332",
        "description": "Novartis 2020 screening deck.",
    },
]
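With the metadata in this shape, looking up a reference or finding which rule sets still need one is a simple lookup and filter. The sketch below uses a trimmed copy of ALERT_INFOS (four entries, only the fields it needs, copied from the list above) so it runs on its own; reference_for and missing_references are hypothetical helper names, not existing medchem functions.

```python
from typing import Optional

# Trimmed copy of ALERT_INFOS from above (only the fields used here).
ALERT_INFOS = [
    {"id": "brenk", "name": "BRENK", "reference": "https://doi.org/10.1002/cmdc.200700139"},
    {"id": "pains_a", "name": "PAINS_A", "reference": "https://doi.org/10.1021/jm901137j"},
    {"id": "inpharmatica", "name": "Inpharmatica", "reference": None},
    {"id": "zinc", "name": "ZINC", "reference": "https://blaster.docking.org/filtering/"},
]

# Index entries by id so lookups don't scan the whole list.
INFO_BY_ID = {info["id"]: info for info in ALERT_INFOS}


def reference_for(alert_id: str) -> Optional[str]:
    """Return the reference for a rule-set id (case-insensitive), or None if unknown."""
    info = INFO_BY_ID.get(alert_id.lower())
    return info["reference"] if info else None


def missing_references(infos: list[dict]) -> list[str]:
    """List the ids of rule sets whose reference is not filled in yet."""
    return [info["id"] for info in infos if not info.get("reference")]


print(reference_for("BRENK"))  # https://doi.org/10.1002/cmdc.200700139
print(missing_references(ALERT_INFOS))  # ['inpharmatica']
```

Running missing_references over the full list would give a to-do list of the rule sets still lacking citations.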