arpcard / aro

The Antibiotic Resistance Ontology (ARO) organizes the information describing the ability of a microorganism to withstand the effects of an antibiotic.
Creative Commons Attribution 4.0 International

How do I filter for just ARGs? #2

Closed: tseemann closed this issue 5 years ago

tseemann commented 5 years ago

CARD now contains more than just acquired antibiotic resistance genes (ARGs).

My old filtering of the card.json no longer gives me just ARGs. I get lots of virulence genes and efflux pumps etc.

What are the terms I should be filtering on?

I am currently using:

Any help appreciated!

raphenya commented 5 years ago

@tseemann Each AMR detection model has an entry like the following in card.json, which is indexed by model ID:

    "3427": {
        "model_id": "3427",
        "model_name": "OXA-260",
        "model_type": "protein homolog model",
        "model_type_id": "40292",
        "model_description": "The protein homolog model is an AMR detection model. Protein homolog models detect a protein sequence based on its similarity to a curated reference sequence. A protein homolog model has only one parameter: a curated BLASTP bitscore cutoff for determining the strength of a match. Protein homolog model matches to reference sequences are categorized on three criteria: perfect, strict and loose. A perfect match is 100% identical to the reference sequence along its entire length; a strict match is not identical but the bitscore of the matched sequence is greater than the curated BLASTP bitscore cutoff. Loose matches are other sequences with a match bitscore less than the curated BLASTP bitscore.",
        "model_param": {
            "blastp_bit_score": {
                "param_type": "BLASTP bit-score",
                "param_description": "A score is a numerical value that describes the overall quality of an alignment with higher numbers correspond to higher similarity. The bit-score (S) is determined by the following formula: S = (\u03bb \u00d7 S \u2212 lnK)/ ln2 where \u03bb is the Gumble distribution constant, S is the raw alignment score, and K is a constant associated with the scoring matrix. Many AMR detection models use this parameter, including the protein homolog and protein variant models. The BLASTP bit-score parameter is a curated value determined from BLASTP analysis of the canonical reference sequence of a specific AMR-associated protein against the database of CARD reference sequence. This value establishes a threshold for computational prediction of a specific protein amongst a batch of submitted sequences.",
                "param_type_id": "40725",
                "param_value": "550"
            }
        },
        "model_sequences": {
            "sequence": {
                "5622": {
                    "protein_sequence": {
                        "accession": "ENU54757.1",
                        "sequence": "MNIKAHLLITSAIFISACSPYIVTANPNHSASKSDVKAEKIKNLFNEAHTTGVLVIQQGQTQQSYGNDLARASTEYVPASTFKMLNALIGLEHHKATTTEVFKWDGKKRLFPEWEKDMTLGDAMKASAIPVYQDLARRIGLELMSKEVKRVGYGNADIGTQVDNFWLVGPLKITPQQEAQFAYKLANKTLPFSQKVQDEVQSMLFIEEKNGNKIYAKSGWGWDVNPQVGWLTGWVVQPQGNIVAFSLNLEMKKGIPSSVRKEITYKSLEQLGIL"
                    },
                    "dna_sequence": {
                        "accession": "APOR01000009.1",
                        "fmin": "341736",
                        "fmax": "342561",
                        "strand": "+",
                        "sequence": "ATGAACATTAAAGCACACTTACTTATAACAAGCGCTATTTTTATTTCAGCCTGCTCACCTTATATAGTGACTGCTAATCCAAATCACAGCGCTTCAAAATCTGATGTAAAAGCAGAGAAAATTAAAAATTTATTTAACGAAGCACACACTACGGGTGTTTTAGTTATCCAACAAGGCCAAACTCAACAAAGCTATGGTAATGATCTTGCTCGTGCTTCGACCGAGTATGTACCTGCTTCGACCTTCAAAATGCTTAATGCTTTGATCGGCCTTGAGCACCATAAGGCAACCACCACAGAAGTATTTAAGTGGGATGGTAAAAAAAGGTTATTCCCAGAATGGGAAAAGGACATGACCCTAGGCGATGCCATGAAAGCTTCCGCTATTCCAGTTTATCAAGATTTAGCTCGTCGTATTGGACTTGAGCTCATGTCTAAGGAAGTGAAGCGTGTTGGTTATGGCAATGCAGATATCGGTACCCAAGTCGATAATTTTTGGCTGGTGGGTCCTTTAAAAATTACTCCTCAGCAAGAGGCACAGTTTGCTTACAAGCTAGCTAATAAAACGCTTCCATTTAGCCAAAAAGTCCAAGATGAAGTGCAATCCATGCTATTCATAGAAGAAAAGAATGGAAACAAAATATACGCAAAAAGTGGTTGGGGATGGGATGTAAACCCACAAGTAGGCTGGTTAACTGGATGGGTTGTTCAGCCTCAAGGGAATATTGTAGCGTTCTCCCTTAACTTAGAAATGAAAAAAGGAATACCTAGCTCTGTTCGAAAAGAGATTACTTATAAAAGCTTAGAACAATTAGGTATTTTATAG",
                        "partial": "0"
                    },
                    "NCBI_taxonomy": {
                        "NCBI_taxonomy_cvterm_id": "42791",
                        "NCBI_taxonomy_name": "Acinetobacter baumannii NIPH 1362",
                        "NCBI_taxonomy_id": "1217642"
                    }
                }
            }
        },
        "ARO_accession": "3001716",
        "ARO_id": "38116",
        "ARO_name": "OXA-260",
        "ARO_description": "OXA-260 is a beta-lactamase found in Acinetobacter spp.",
        "ARO_category": {
            "36026": {
                "category_aro_accession": "3000017",
                "category_aro_cvterm_id": "36026",
                "category_aro_name": "OXA beta-lactamase",
                "category_aro_description": "OXA beta-lactamases were long recognized as a less common but also plasmid-mediated beta-lactamase variety that could hydrolyze oxacillin and related anti-staphylococcal penicillins. These beta-lactamases differ from the TEM and SHV enzymes in that they belong to molecular class D and functional group 2d. The OXA-type beta-lactamases confer resistance to ampicillin and cephalothin and are characterized by their high hydrolytic activity against oxacillin and cloxacillin and the fact that they are poorly inhibited by clavulanic acid. Amino acid substitutions in OXA enzymes can also give the ESBL phenotype. The OXA beta-lactamase family was originally created as a phenotypic rather than a genotypic group for a few beta-lactamases that had a specific hydrolysis profile. Therefore, there is as little as 20% sequence homology among some of the members of this family. However, recent additions to this family show some degree of homology to one or more of the existing members of the OXA beta-lactamase family. Some confer resistance predominantly to ceftazidime, but OXA-17 confers greater resistance to cefotaxime and cefepime than it does resistance to ceftazidime.",
                "category_aro_class_name": "AMR Gene Family"
            },
            "35951": {
                "category_aro_accession": "0000032",
                "category_aro_cvterm_id": "35951",
                "category_aro_name": "cephalosporin",
                "category_aro_description": "Cephalosporins are a class of beta-lactam antibiotics, containing the beta-lactam ring fused with a dihydrothiazolidine ring. Together with cephamycins they belong to a sub-group called cephems. Cephalosporin are bactericidal, and act by inhibiting the synthesis of the peptidoglycan layer of bacterial cell walls. The peptidoglycan layer is important for cell wall structural integrity, especially in Gram-positive organisms.",
                "category_aro_class_name": "Drug Class"
            },
            "36017": {
                "category_aro_accession": "3000008",
                "category_aro_cvterm_id": "36017",
                "category_aro_name": "penam",
                "category_aro_description": "Penams, often referred to as penicillins, are a group of antibiotics derived from Penicillium fungi. Penicillin antibiotics are historically significant because they are the first drugs that were effective against many previously serious diseases such as syphilis and Staphylococcus infections. Penicillins are still widely used today, though many types of bacteria are now resistant. All penicillins are beta-lactam antibiotics in the penam sub-group, and are used in the treatment of bacterial infections caused by susceptible, usually Gram-positive, organisms.",
                "category_aro_class_name": "Drug Class"
            },
            "36000": {
                "category_aro_accession": "0001004",
                "category_aro_cvterm_id": "36000",
                "category_aro_name": "antibiotic inactivation",
                "category_aro_description": "Enzymatic inactivation of antibiotic to confer drug resistance.",
                "category_aro_class_name": "Resistance Mechanism"
            }
        }
    },
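
Reading that structure from Python takes only a few lines. The sketch below is a minimal illustration, assuming card.json has been parsed with the standard `json` module: top-level keys are model IDs, and each model's `ARO_category` maps category IDs to dictionaries carrying `category_aro_name` and `category_aro_class_name`. The helper names (`load_card`, `iter_models`, `categories_by_class`) are hypothetical, and skipping top-level values that are not model dictionaries (such as version metadata) is an assumption about the file, not something stated in this thread.

```python
import json
from collections import defaultdict


def load_card(path):
    """Parse card.json; the top-level keys are model IDs."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_models(card):
    """Yield (model_id, model) pairs from a parsed card.json.

    Assumption: top-level values that are not dicts with a "model_id"
    (e.g. version or timestamp metadata) are not models and are skipped.
    """
    for key, value in card.items():
        if isinstance(value, dict) and "model_id" in value:
            yield key, value


def categories_by_class(model):
    """Group a model's ARO_category names by category_aro_class_name.

    Hypothetical helper: the class names seen in the OXA-260 excerpt are
    "AMR Gene Family", "Drug Class" and "Resistance Mechanism".
    """
    grouped = defaultdict(list)
    for category in model.get("ARO_category", {}).values():
        grouped[category["category_aro_class_name"]].append(category["category_aro_name"])
    return {cls: sorted(names) for cls, names in grouped.items()}


if __name__ == "__main__":
    # Trimmed copy of the OXA-260 entry (descriptions, parameters and
    # sequences omitted); "_version" is a hypothetical metadata key.
    card = {
        "3427": {
            "model_id": "3427",
            "model_name": "OXA-260",
            "model_type": "protein homolog model",
            "ARO_accession": "3001716",
            "ARO_category": {
                "36026": {"category_aro_name": "OXA beta-lactamase",
                          "category_aro_class_name": "AMR Gene Family"},
                "35951": {"category_aro_name": "cephalosporin",
                          "category_aro_class_name": "Drug Class"},
                "36017": {"category_aro_name": "penam",
                          "category_aro_class_name": "Drug Class"},
                "36000": {"category_aro_name": "antibiotic inactivation",
                          "category_aro_class_name": "Resistance Mechanism"},
            },
        },
        "_version": "placeholder",
    }
    for model_id, model in iter_models(card):
        print(model_id, model["model_name"], model["model_type"])
        for cls, names in categories_by_class(model).items():
            print(f"  {cls}: {', '.join(names)}")
```

With a real download, `card = load_card("card.json")` replaces the inline dictionary. The "Resistance Mechanism" names are one possible handle on entries such as efflux pumps, but which of them to exclude is a judgment call left to the reader.
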

The model types are as follows:

    $ python -m json.tool card.json | grep '"model_type":' | sort -u
        "model_type": "efflux pump system meta-model",
        "model_type": "gene cluster meta-model",
        "model_type": "protein domain meta-model",
        "model_type": "protein homolog model",
        "model_type": "protein knockout model",
        "model_type": "protein overexpression model",
        "model_type": "protein variant model",
        "model_type": "rRNA gene variant model",
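
The same list, with counts, can be pulled out in Python, and `model_type` then works as a first-pass filter. This is a sketch under one loudly stated assumption: that "just ARGs" means protein homolog models, which (per the model description above) detect a gene by its similarity to a curated reference sequence, as opposed to the variant, overexpression, knockout, rRNA and meta-models. That mapping is not confirmed in this thread, and the function and script names are hypothetical.

```python
import json
import sys
from collections import Counter


def model_type_counts(card):
    """Count models per model_type; a Python take on the grep | sort -u pipeline."""
    return Counter(
        model["model_type"]
        for model in card.values()
        if isinstance(model, dict) and "model_type" in model
    )


def select_models(card, model_type="protein homolog model"):
    """Return {model_id: model} for every model of the given model_type.

    The default (protein homolog models only) is an assumption about which
    models count as "just ARGs", not a rule stated by CARD.
    """
    return {
        model_id: model
        for model_id, model in card.items()
        if isinstance(model, dict) and model.get("model_type") == model_type
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python card_model_types.py card.json  (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8") as fh:
        card = json.load(fh)
    for model_type, count in sorted(model_type_counts(card).items()):
        print(f"{count:6d}  {model_type}")
    print(f"{len(select_models(card))} protein homolog models")
```

Filtering on `model_type` keeps the decision at the level of detection models; it can be combined with the `ARO_category` class names if a finer cut is needed.
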

arpcard commented 5 years ago

@raphenya Can you explain how to tell ARO, VIRO and MOBIO entries apart?

raphenya commented 5 years ago

@arpcard @tseemann VIRO (https://github.com/arpcard/viro) is the virulence ontology and MOBIO (https://github.com/arpcard/mobio) describes mobile genetic elements. Both are currently separate from the ARO: card.json contains no VIRO or MOBIO terms, only ARO terms.

tseemann commented 5 years ago

Thank you, this clarified some things for me!