Add more functionality to `SBOannotator`

GwennyGit commented 2 years ago

Feature: The database used by SBOannotator could be updated automatically so that users always get the newest SBO annotations for their model(s).

Possible Implementation:

[ ] Add a function that checks whether there are new SBO terms by comparing the terms themselves, the commit tag or the date (Function A)
[ ] Add a function that deletes all table entries (see INSERT INTO commands and corresponding rows) and automatically re-adds entries via INSERT INTO (Function B) → This function should collect all SBO terms, look up the corresponding BiGG and EC IDs, and combine them into a new, updated data.db file.
[ ] Add a function that at runtime:
- [ ] 1. Checks if there are new SBO terms (Function A → OLS client),
- [ ] 2. If new terms exist: update the data.sql file (Function B) and update data.db (→ initialise_database())
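
A minimal sketch of how Functions A and B could fit together at runtime, using only `sqlite3` from the standard library. The table names (`sbo_mapping`, `meta`), the column names, the version marker and the `fetch_rows` callback are all hypothetical and not taken from the actual data.sql; the callback stands in for whatever source (e.g. the OLS client) delivers the new terms.

```python
import sqlite3


def init_schema(conn):
    """Create the (hypothetical) tables used in this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sbo_mapping (ec_number TEXT, sbo_term TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)"
    )


def needs_update(local_version, remote_version):
    """Function A (sketch): compare the stored marker with the remote one.

    The marker could be a commit tag or a date; here it is just a string.
    """
    return local_version != remote_version


def rebuild_mapping(conn, rows, remote_version):
    """Function B (sketch): delete all entries and re-insert them."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM sbo_mapping")
        conn.executemany(
            "INSERT INTO sbo_mapping (ec_number, sbo_term) VALUES (?, ?)", rows
        )
        conn.execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES ('sbo_version', ?)",
            (remote_version,),
        )


def update_if_needed(conn, remote_version, fetch_rows):
    """Runtime check: rebuild only when the remote marker differs."""
    row = conn.execute(
        "SELECT value FROM meta WHERE key = 'sbo_version'"
    ).fetchone()
    local_version = row[0] if row else ""
    if not needs_update(local_version, remote_version):
        return False
    rebuild_mapping(conn, fetch_rows(), remote_version)
    return True
```

Doing the delete and the inserts in a single transaction means a failed download leaves the old table intact instead of an empty one.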

Supplement: The SBO repository might help in accessing the new SBO terms.

GwennyGit commented 2 years ago

Further required improvements:

[ ] Adjust & specify the classification of SBO terms from EC numbers
[ ] Get updates on EC numbers

famosab commented 1 year ago

In a recent discussion we decided that adding SBO terms solely from the local database might be insufficient because SBO terms can change in the future. I found a client tool that allows programmatic access to the SBO terms with Python. At the moment I have some problems using it and have opened an issue at its repository. Once that is resolved, this seems to be a good way of keeping everything up-to-date and maybe even of avoiding a local database in the future.

famosab commented 1 year ago

In SBOannotator, @NantiaL implemented some changes to the original code of Elisabeth Fritze. It seems that the SBOAnn version in refineGEMs needs to be updated accordingly. I started by transferring all new functions. At the moment they are not active in the main function; activating them would be the next step. One thing seemed a bit odd to me: the new function

def handleMultipleECs(react, ECNums):
    # if no EC number annotated in model
    if len(ECNums) == 0:
        react.setSBOTerm('SBO:0000176')

    else:
        # store first digits of all annotated EC numbers
        lst = []
        for ec in ECNums:
            lst.append(ec.split(".")[0])

        # if EC numbers are from different enzyme classes (based on first digit),
        # no unambiguous classification is possible
        if len(set(lst)) > 1:
            react.setSBOTerm("SBO:0000176")  # metabolic rxn

        # if ec numbers are from the same enzyme classes,
        # assign parent SBO term based on first digit in EC number
        else:

            # Oxidoreductases
            if "1" in set(lst):
                react.setSBOTerm("SBO:0000200")
            # Transferase
            elif "2" in set(lst):
                react.setSBOTerm("SBO:0000402")
            # Hydrolases
            elif "3" in set(lst):
                react.setSBOTerm("SBO:0000376")
            # Lyases
            elif "4" in set(lst):
                react.setSBOTerm("SBO:0000211")
            # Isomerases
            elif "5" in set(lst):
                react.setSBOTerm("SBO:0000377")
            # Ligases, proper SBO is missing from graph --> use one for modification of covalent bonds
            elif "6" in set(lst):
                react.setSBOTerm("SBO:0000182")
            # Translocases
            elif "7" in set(lst):
                react.setSBOTerm("SBO:0000185")
            # Metabolic reactions
            else:
                react.setSBOTerm("SBO:0000176")

seems to be less specific than the removed functions like

def checkMethylationViaEC(reac):
    """tests if reac is methylation by its EC-Code and sets SBO Term if true

    Args:
        reac (libsbml-reaction): libsbml reaction from sbml model
    """
    if len(getECNums(reac)) == 1:
        if getECNums(reac)[0].startswith('2.1.1'):
            reac.setSBOTerm('SBO:0000214')

def checkTransaminationViaEC(reac):
    """tests if reac is transamination by its EC-Code and sets SBO Term if true

    Args:
        reac (libsbml-reaction): libsbml reaction from sbml model
    """
    if len(getECNums(reac)) == 1:
        if getECNums(reac)[0].startswith('2.6.1'):
            reac.setSBOTerm('SBO:0000403')

Since I wrote neither the new version nor the older functions, I think this needs to be discussed. Maybe @NantiaL can help with this!
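
As a basis for that discussion, here is a minimal sketch of how the two approaches could be combined: try the specific EC prefixes first and fall back to the class-level term only when no specific rule matches. This is not the implementation from either version. The function names are hypothetical, and the lookup returns the SBO term as a string instead of calling `setSBOTerm` so that it can be tried without libsbml. The term IDs are the ones from the functions quoted above.

```python
# Specific EC prefixes, taken from the removed check...ViaEC functions.
SPECIFIC_EC_PREFIXES = {
    "2.1.1": "SBO:0000214",  # methylation
    "2.6.1": "SBO:0000403",  # transamination
}

# Class-level terms keyed by first EC digit, taken from handleMultipleECs.
EC_CLASS_TERMS = {
    "1": "SBO:0000200",
    "2": "SBO:0000402",
    "3": "SBO:0000376",
    "4": "SBO:0000211",
    "5": "SBO:0000377",
    "6": "SBO:0000182",
    "7": "SBO:0000185",
}

FALLBACK_TERM = "SBO:0000176"


def term_for_ec(ec):
    """Return the most specific known SBO term for one EC number."""
    parts = ec.split(".")
    # Try the longest dot-separated prefix first, e.g. 'a.b.c' before 'a.b'.
    for n in range(len(parts), 1, -1):
        prefix = ".".join(parts[:n])
        if prefix in SPECIFIC_EC_PREFIXES:
            return SPECIFIC_EC_PREFIXES[prefix]
    return EC_CLASS_TERMS.get(parts[0], FALLBACK_TERM)


def sbo_term_for_ecs(ec_numbers):
    """Pick one SBO term for a reaction annotated with several EC numbers."""
    terms = {term_for_ec(ec) for ec in ec_numbers}
    if len(terms) == 1:
        return terms.pop()  # all EC numbers agree on one term
    classes = {ec.split(".")[0] for ec in ec_numbers}
    if len(classes) == 1:
        # same enzyme class, but different specific terms: use the class term
        return EC_CLASS_TERMS.get(classes.pop(), FALLBACK_TERM)
    return FALLBACK_TERM  # no EC numbers, or mixed enzyme classes
```

Matching on whole dot-separated components also avoids a pitfall of `startswith('2.1.1')`, which would accept an EC number beginning with `2.1.10`.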

famosab commented 1 year ago

There is also a new function in SBOAnnotator called call_for_EC_annotation, which automatically adds EC numbers to reactions with BiGG identifiers. That would also be a good addition to polish. We may need to discuss this in issue draeger-lab/refinegems#58.

famosab commented 1 year ago

We need to discuss this further. Merging the branch for now so that we can work on the io module.

draeger commented 1 year ago

Here is an implementation of an OBO parser from the BioPython project that could be used to check the relationships between SBO terms. The latest OBO file for SBO can be obtained from GitHub. Please also take a look at a similar Java implementation for SBO.
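
To illustrate what checking relationships between terms involves, here is a minimal standard-library sketch that reads `is_a` links from `[Term]` stanzas in OBO text and walks up to all ancestors. It is a stand-in, not the parser mentioned above, and it ignores everything in the OBO format except `id:` and `is_a:` lines. The sample stanzas use made-up `EX:` identifiers and are not real SBO data.

```python
def parse_obo_parents(text):
    """Return {term_id: [parent_id, ...]} from [Term] stanzas in OBO text."""
    parents = {}
    current = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[len("id:"):].strip()
            parents.setdefault(current, [])
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing '! name' comment after the parent ID.
            parent = line[len("is_a:"):].split("!")[0].strip()
            parents[current].append(parent)
    return parents


def ancestors(term, parents):
    """Return all direct and indirect parents of a term."""
    seen = []
    stack = list(parents.get(term, []))
    while stack:
        candidate = stack.pop()
        if candidate not in seen:
            seen.append(candidate)
            stack.extend(parents.get(candidate, []))
    return seen


# Made-up sample, only to show the stanza layout.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root process

[Term]
id: EX:0000002
name: reaction
is_a: EX:0000001 ! root process

[Term]
id: EX:0000003
name: methylation
is_a: EX:0000002 ! reaction

[Typedef]
id: part_of
"""
```

With such a parent map, a check like "is term X a descendant of term Y" reduces to `Y in ancestors(X, parents)`.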

GwennyGit commented 1 year ago

The extensions of the SBO database mentioned in the tasks for this issue should be added to the databases module, since this module now handles all database-related functions. (See issue draeger-lab/refinegems#49 for more details on databases.)

GwennyGit commented 1 year ago

The client that @famosab mentioned in comment https://github.com/draeger-lab/SBOannotator/issues/1 can now be used, as the issue was resolved. So for now we know how to get new SBO terms for sboann. The next step to keep sboann up-to-date would be to determine a way to automatically map identifiers to the SBO terms. For EC numbers that might be easier than for other identifiers like BiGG IDs. However, even for EC numbers it would be important to establish a mapping rule, e.g. use the first number in the EC number to assign the SBO term, or something similar.

cb-Hades commented 1 year ago

Added a function in util.py that rewrites the well-annotated, specific SBO terms into lower-tier (more general) terms that memote accepts, in order to "fix" the memote score.

Currently it only "fixes" biochemical reactions.

[ ] extend
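
A minimal sketch of the idea, written so that it can be extended by adding more rule sets. It is not the function from util.py. The names are hypothetical, the set of specific terms is simply the one assigned by the EC-based functions quoted earlier in this thread, and it is an assumption here that memote expects `SBO:0000176` for metabolic reactions. It works on a plain dict of reaction ID to SBO term so that it runs without libsbml.

```python
# Specific biochemical reaction terms from the EC-based classification
# quoted earlier in this thread. The transport term SBO:0000185 is left
# out on purpose, since this sketch only covers biochemical reactions.
SPECIFIC_BIOCHEM_TERMS = frozenset({
    "SBO:0000200", "SBO:0000402", "SBO:0000376", "SBO:0000211",
    "SBO:0000377", "SBO:0000182", "SBO:0000214", "SBO:0000403",
})

# Assumption: the general term that memote checks for metabolic reactions.
GENERIC_BIOCHEM_TERM = "SBO:0000176"


def downgrade_term(term, specific=SPECIFIC_BIOCHEM_TERMS,
                   generic=GENERIC_BIOCHEM_TERM):
    """Return the generic term for a known specific one, else the input."""
    return generic if term in specific else term


def downgrade_all(terms_by_reaction):
    """Apply downgrade_term to a {reaction_id: sbo_term} mapping."""
    return {rid: downgrade_term(term) for rid, term in terms_by_reaction.items()}
```

On a libsbml model the same rule could presumably be applied per reaction via `getSBOTermID()` and `setSBOTerm()`; extending it to other entity types would mean passing a different `specific` set and `generic` term.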
