Closed a-r-j closed 3 years ago
I've added support in process_dataframe to provide lists of functions to process the atoms and hetatms dfs. If these are provided, they will do all the processing. If they are not, the default workflow will execute.
I decided to leave the default workflow in place for now. I think it's useful for high-level users because it makes the config more apparent: they don't have to correctly partial a bunch of functions and sequence them themselves. Dropping the default workflow would remove a lot of the oversight that the config object provides.
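To make the behaviour concrete, here is a minimal sketch of the "user-supplied functions, else default workflow" pattern. It is not the actual process_dataframe code: every name in it (process_rows, default_workflow, remove_hydrogens, keep_alpha_carbons) is hypothetical, and plain lists of dicts stand in for the atom dataframe so that it runs without pandas.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

# A list of dicts stands in for an atom dataframe (assumption for illustration).
Rows = List[Dict[str, str]]
RowFunc = Callable[[Rows], Rows]


def remove_hydrogens(rows: Rows) -> Rows:
    """Hypothetical processing step: drop hydrogen atoms."""
    return [r for r in rows if r["element"] != "H"]


def keep_alpha_carbons(rows: Rows) -> Rows:
    """Hypothetical processing step: keep only alpha carbons."""
    return [r for r in rows if r["atom_name"] == "CA"]


def default_workflow(rows: Rows) -> Rows:
    """Hypothetical stand-in for the config-driven default processing."""
    return remove_hydrogens(rows)


def process_rows(rows: Rows, processing_funcs: Optional[List[RowFunc]] = None) -> Rows:
    """Apply user functions in order if given, otherwise run the default workflow."""
    if processing_funcs is None:
        return default_workflow(rows)
    # The user-supplied functions do all of the processing, applied left to right.
    return reduce(lambda acc, func: func(acc), processing_funcs, rows)


atoms = [
    {"atom_name": "CA", "element": "C"},
    {"atom_name": "H", "element": "H"},
    {"atom_name": "CB", "element": "C"},
]
default_result = process_rows(atoms)
custom_result = process_rows(atoms, [remove_hydrogens, keep_alpha_carbons])
```

With this shape, a high-level user never touches the function list, while a power user can replace the whole pipeline in one argument.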
I found it tricky to elegantly refactor the functions that operate on sequences to work with both protein graphs and ppi graphs. I settled on this: what do you think?
    from functools import partial

    import networkx as nx


    def molecular_weight(input, seq_type="protein"):
        from Bio import SeqUtils

        func = partial(SeqUtils.molecular_weight, seq_type=seq_type)
        # If a graph is provided (e.g. a protein graph), compute the function over its chains
        if isinstance(input, nx.Graph):
            G = compute_feature_over_chains(
                input, func, feature_name="molecular_weight"
            )
            return G
        # If a sequence string is provided (e.g. for a node of a PPI graph), compute the weight directly
        elif isinstance(input, str):
            return func(input)
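For comparison, the same graph-or-string dispatch could also be written with functools.singledispatch from the standard library, which keeps each input type in its own small function. This is only a sketch of that alternative, not what the PR does: ToyGraph is a hypothetical stand-in for nx.Graph, and toy_weight is a placeholder that returns the sequence length rather than a real molecular weight.

```python
from functools import singledispatch
from typing import Dict


class ToyGraph:
    """Hypothetical stand-in for nx.Graph: holds chain id -> sequence."""

    def __init__(self, chains: Dict[str, str]):
        self.chains = dict(chains)
        self.features: Dict[str, float] = {}


def toy_weight(sequence: str) -> float:
    """Placeholder feature (sequence length), not a real molecular weight."""
    return float(len(sequence))


@singledispatch
def sequence_feature(obj):
    """Fallback for unsupported input types."""
    raise TypeError(f"Unsupported input type: {type(obj).__name__}")


@sequence_feature.register(str)
def _(obj):
    # A bare sequence, e.g. taken from a node of a PPI graph.
    return toy_weight(obj)


@sequence_feature.register(ToyGraph)
def _(obj):
    # A protein graph: compute the feature for every chain and store it on the graph.
    obj.features = {chain: toy_weight(seq) for chain, seq in obj.chains.items()}
    return obj


graph = sequence_feature(ToyGraph({"A": "ACD", "B": "AC"}))
single = sequence_feature("ACDE")
```

One side effect of this layout is that an unsupported input raises a TypeError instead of silently returning None.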
Did some refactoring to the protein graph construction.
I cleaned up the high-level graph construction function and refactored some of process_dataframe such that the various steps are their own functions.
There are a couple of things I'd really appreciate your take on, @ericmjl:
Refactoring the dataframe processing to support users providing a list of functions that operate on atom/hetatom dataframes, in a manner similar to the metadata annotation family of functions.
Having high-level functions that can be used with just a config object, but that also have additional optional arguments that override the config. At the moment, I've partially done this in construct_graph for the metadata funcs. The flow is to load a default config if none is provided and then optionally overwrite the function list parameters if they are provided. Do you think it makes sense to have this available for all the config parameters? It would blow up the number of arguments that construct_graph would take, but they would all be optional.
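One way to picture the "default config plus optional overrides" flow without a long argument list is to accept the overrides as keyword arguments and check them against the config's fields. The sketch below assumes a dataclass-style config. GraphConfig, its fields, and resolve_config are hypothetical names for illustration, not the real config object or the construct_graph signature.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class GraphConfig:
    """Hypothetical config object; field names are illustrative only."""

    granularity: str = "CA"
    keep_hets: bool = False
    node_metadata_functions: Tuple = ()


def resolve_config(config: Optional[GraphConfig] = None, **overrides) -> GraphConfig:
    """Load a default config if none is given, then apply any supplied overrides."""
    if config is None:
        config = GraphConfig()
    valid_names = {f.name for f in fields(GraphConfig)}
    unknown = set(overrides) - valid_names
    if unknown:
        raise TypeError(f"Unknown config parameter(s): {sorted(unknown)}")
    # None means "not provided", so the value from the config is kept.
    supplied = {name: value for name, value in overrides.items() if value is not None}
    return replace(config, **supplied)


default_cfg = resolve_config()
custom_cfg = resolve_config(granularity="atom", keep_hets=None)
```

The trade-off is that keyword overrides are less discoverable than explicit arguments, since they no longer show up in the function signature.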