Open jasminhafner opened 2 months ago
Sounds good! One caveat, though: applying it to a matrix would still send as many requests as there are rules. We should definitely adapt the code so that a list of SMILES can be sent to each rule endpoint. A list-length constraint may also be needed to keep individual requests from becoming excessively large.
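To make the list-length constraint concrete, here is a minimal client-side sketch that splits a list of SMILES into fixed-size chunks before they would be sent. The helper name `chunk_smiles` and the default cap of 100 are assumptions for illustration only; nothing like this exists in enviPath-python as far as this thread shows.

```python
from typing import Iterator, List, Sequence


def chunk_smiles(smiles: Sequence[str], max_batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `smiles`, each at most `max_batch_size` long.

    `max_batch_size` is a hypothetical cap; the real limit would be set by the server.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(smiles), max_batch_size):
        yield list(smiles[start:start + max_batch_size])
```

Each chunk would then correspond to one request per rule, so the request count becomes (number of rules) x (number of chunks) instead of (number of rules) x (number of compounds).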
Could you share the code that shows how enviPath-python is used to generate that matrix?
Here's the function we use in pepper: we loop through all the rules in a package, and then through all the input compounds. We also have an equivalent function that fetches the highest reaction probability for each rule/compound combination.
def get_rule_descriptors(self, list_of_rules):
    """
    Apply each bt rule to all the SMILES in the model data and obtain a boolean matrix of triggered rules.
    :param list_of_rules: list of enviPath ParallelRule objects
    """
    D = {}  # D = {'rule': {'ID': value}, ...}
    # iterate through list of rules (usually: BBD rules)
    for rule in list_of_rules:
        name = rule.get_name()
        D[name] = {}
        ids_list = []
        smiles_list = []
        # iterate through list of compounds
        for index, row in self.model_data.iterrows():
            ID = row['compound_id']
            smiles = row['SMILES']
            # apply rule to smiles (one server request per rule/compound pair)
            try:
                out = rule.apply_to_smiles(smiles)
            except Exception:
                print('Could not process SMILES:', smiles)
                # skip this compound so a stale or undefined `out` is never used
                continue
            # an empty result means the rule was not triggered
            value = 0 if out == [] else 1
            # collect results
            D[name][ID] = value
            ids_list.append(ID)
            smiles_list.append(smiles)
For modelling, we use a matrix of triggered rules as descriptors. Right now this is done by iteratively applying rule.apply_to_smiles() over a big SMILES x rules matrix, which leads to a lot of server requests. It might be helpful to have a function that requests the whole matrix of rules applied to SMILES from the server in one go, thereby reducing server traffic.
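As a rough illustration of what such a function could look like on the client side, here is a sketch that builds the rules x compounds matrix with one call per rule and batch. Everything except `rule.get_name()` and `rule.apply_to_smiles()` is an assumption: `rule_trigger_matrix`, the `apply_batch` callable, and `batch_size` are hypothetical names, and `per_smiles_fallback` only stands in for a batch endpoint that does not exist yet (it still sends one request per SMILES).

```python
from typing import Callable, Dict, List


def per_smiles_fallback(rule, smiles_batch: List[str]) -> List[list]:
    """Stand-in for a future batch endpoint: still one apply_to_smiles() call per SMILES."""
    return [rule.apply_to_smiles(s) for s in smiles_batch]


def rule_trigger_matrix(rules, compounds: Dict[str, str],
                        apply_batch: Callable = per_smiles_fallback,
                        batch_size: int = 100) -> Dict[str, Dict[str, int]]:
    """Return {rule_name: {compound_id: 0 or 1}}.

    `compounds` maps compound_id -> SMILES. `apply_batch(rule, smiles_list)` is
    assumed to return one result list per input SMILES, in the same order.
    """
    ids = list(compounds)
    matrix = {}
    for rule in rules:
        row = {}
        # one apply_batch call per slice of at most `batch_size` compounds
        for start in range(0, len(ids), batch_size):
            batch_ids = ids[start:start + batch_size]
            results = apply_batch(rule, [compounds[i] for i in batch_ids])
            for compound_id, products in zip(batch_ids, results):
                # an empty product list means the rule was not triggered
                row[compound_id] = 1 if products else 0
        matrix[rule.get_name()] = row
    return matrix
```

With a real batch endpoint, only `apply_batch` would need to be swapped out; the nested dict has the same shape as `D` in the function above.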