enviPath / enviPath-python

MIT License

Matrix calculation of triggered rules #12

Open jasminhafner opened 2 months ago

jasminhafner commented 2 months ago

For modelling, we use a matrix of triggered rules as descriptors. Right now this is done by iteratively calling rule.apply_to_smiles() for every cell of a big SMILES x rules matrix, which leads to a lot of server requests. It might be helpful to have a function that requests the whole matrix of rules applied to SMILES from the server, and thereby reduces server traffic.

anguera5 commented 2 months ago

Sounds good! I would add, though, that applying it to a matrix would still send as many requests as there are rules. We should definitely adapt the code to allow sending a list of SMILES to each rule endpoint. It may also be necessary to add a list-length limit to avoid sending excessively large requests.
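
A minimal sketch of what the client-side batching with a length limit could look like. Note that `apply_batch` is a hypothetical stand-in for a rule endpoint that accepts a list of SMILES; no such call is confirmed to exist in enviPath-python, and `chunked`, `triggered_flags` and `max_len` are names made up for illustration.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], max_len: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `max_len` elements."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    for start in range(0, len(items), max_len):
        yield list(items[start:start + max_len])


def triggered_flags(
    smiles: Sequence[str],
    apply_batch: Callable[[List[str]], List[list]],
    max_len: int = 100,
) -> Dict[str, int]:
    """Map each SMILES to 1 if the rule returned any product, else 0.

    `apply_batch` is a hypothetical batched endpoint (an assumption):
    it takes a list of SMILES and returns one result list per SMILES,
    in the same order.
    """
    flags: Dict[str, int] = {}
    for batch in chunked(smiles, max_len):
        results = apply_batch(batch)  # one request per batch, not per SMILES
        for smi, products in zip(batch, results):
            flags[smi] = 1 if products else 0
    return flags


# Usage with a fake endpoint that "triggers" only on SMILES containing "N".
calls: List[int] = []


def fake_apply_batch(batch: List[str]) -> List[list]:
    calls.append(len(batch))  # record the size of each simulated request
    return [["product"] if "N" in smi else [] for smi in batch]


flags = triggered_flags(
    ["CCO", "CCN", "c1ccccc1", "NC=O", "CC"], fake_apply_batch, max_len=2
)
```

With a limit like this, the number of requests per rule drops from one per compound to roughly the number of compounds divided by `max_len`, while each individual request stays bounded in size.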

lorsbach commented 1 month ago

Could you share the code that shows how enviPath-python is used to generate that matrix?

jasminhafner commented 1 month ago

Here's the function we use in pepper - we loop through all the rules in a package, and then we loop through all the input compounds. We have the same function as well for fetching highest reaction probabilities for a combination of rule/compound.

    def get_rule_descriptors(self, list_of_rules):
        """
        Applies each bt rule to all the SMILES in model data and obtains a boolean matrix of triggered rules
        :param list_of_rules: list of enviPath ParallelRule objects
        """
        D = {}  # D{'rule': {'ID' : value}, ...}

        # iterate through list of rules (usually: BBD rules)
        for rule in list_of_rules:

            name = rule.get_name()

            D[name] = {}
            ids_list = []
            smiles_list = []
            # iterate through list of compounds 
            for index, row in self.model_data.iterrows(): 
                ID = row['compound_id']
                smiles = row['SMILES']

                # apply rule to smiles
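                try:
                    out = rule.apply_to_smiles(smiles)
                except Exception:
                    print('Could not process SMILES:', smiles)
                    # record a failed call as "not triggered" instead of
                    # reusing the previous (or an undefined) result
                    out = []
                # an empty result list means the rule was not triggered
                value = 0 if out == [] else 1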

                # collect results
                D[name][ID] = value
                ids_list.append(ID)
                smiles_list.append(smiles)