glasgowcompbio / pyMultiOmics

Python toolbox for multi-omics data mapping and analysis
MIT License

Link entities from the output to the input #8

Closed joewandy closed 3 years ago

joewandy commented 3 years ago

When we run a set of queries, e.g.

res = QueryBuilder(ap) \
        .add(Select(GENES)) \
        .add(Connected(data_type=COMPOUNDS)) \
        .add(SignificantDE()) \
        .run()

it would be good to know which entities in the output are linked to which entities in the input. In the example above, we would like to know which individual significantly changing compounds are connected to each input gene, rather than only the overall aggregated result, which is what the query above returns.

One way to do this is to add an extra column to the final output listing the input entities that produced each output row. Alternatively, to avoid cluttering the final output dataframe, we could return this information as a separate dataframe.
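To make the two options concrete, here is a rough pandas sketch. It does not use the pyMultiOmics API: the `links` and `result` dataframes, their column names and the ids are all made-up stand-ins for whatever the mapping and the query result actually contain.

```python
import pandas as pd

# Hypothetical toy data, not taken from pyMultiOmics.
# Each row says "this input gene is connected to this compound".
links = pd.DataFrame({
    "gene_id": ["g1", "g1", "g2", "g3"],
    "compound_id": ["c1", "c2", "c2", "c3"],
})

# Stand-in for the aggregated query result (significantly changing compounds).
result = pd.DataFrame({
    "compound_id": ["c1", "c2"],
    "padj": [0.01, 0.03],
})

# Option 1: a separate long-format dataframe, one row per (gene, compound)
# pair, restricted to compounds that appear in the result.
provenance = links.merge(result[["compound_id"]], on="compound_id", how="inner")

# Option 2: an extra column on the result listing the source genes per compound.
sources = (
    provenance.groupby("compound_id")["gene_id"]
    .agg(lambda s: ", ".join(sorted(set(s))))
    .rename("source_genes")
    .reset_index()
)
annotated = result.merge(sources, on="compound_id", how="left")

print(provenance)
print(annotated)
```

The separate dataframe keeps the result tidy and is easier to filter by gene, while the extra column is more convenient for quick inspection but packs several ids into one cell.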

It might also be useful to expose the entire network structure involved in a set of queries, e.g. which genes are linked to which changing compounds through which proteins and reactions. This would be quite messy to display as a dataframe, so maybe we could return a networkx graph instead? That would, however, need further querying to present the output in a readable format on the web.
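As a rough illustration of the networkx idea, the sketch below builds a small graph and then flattens the gene-to-compound paths into rows that could be shown in a table. The node ids, the `kind` attribute and the `paths_between` helper are hypothetical and are not part of pyMultiOmics.

```python
import networkx as nx

# Hypothetical toy mapping; ids and the "kind" attribute are illustrative only.
edges = [
    ("g1", "p1"), ("p1", "r1"), ("r1", "c1"),
    ("g2", "p2"), ("p2", "r1"),
    ("g2", "p3"), ("p3", "r2"), ("r2", "c2"),
]
kinds = {"g": "gene", "p": "protein", "r": "reaction", "c": "compound"}

G = nx.Graph()
for u, v in edges:
    # The first letter of each toy id encodes the entity type.
    G.add_node(u, kind=kinds[u[0]])
    G.add_node(v, kind=kinds[v[0]])
    G.add_edge(u, v)


def paths_between(graph, gene, compound, cutoff=3):
    """Return all simple paths of at most `cutoff` edges from gene to compound."""
    return sorted(nx.all_simple_paths(graph, gene, compound, cutoff=cutoff))


genes = [n for n, d in G.nodes(data=True) if d["kind"] == "gene"]
compounds = [n for n, d in G.nodes(data=True) if d["kind"] == "compound"]

# Flatten the graph into one row per gene -> protein -> reaction -> compound path.
rows = [
    {"gene": g, "compound": c, "path": " -> ".join(p)}
    for g in genes
    for c in compounds
    for p in paths_between(G, g, c)
]

for row in rows:
    print(row)
```

The cutoff of 3 edges keeps only direct gene, protein, reaction, compound chains, so longer detours through shared reactions are ignored. Whether that is the right restriction would depend on how the real mapping is structured.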