PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Unusual data structure for MSigDB (Transfac) in PC2v9 #284

Closed ozgunbabur closed 5 years ago

ozgunbabur commented 6 years ago

There are two problems with the MSigDB data in Pathway Commons v9.

1) TF --> Target relations carry no sign in the MSigDB version of Transfac; each one can be either positive or negative. PC assumes that they are all positive, when they should be left neutral (unsigned) instead. For an example, look at the ChiBE view of part of RB1's expression-level downstream (below).

[image: ChiBE view of part of RB1's expression-level downstream]

RB1 is a well-known transcriptional repressor. Most of its targets are expected to be repressed by RB1, but PC says they are all activated (look at the big chunk on the left). As a control, on the right, you can see the NCI PID data showing how RB1 inhibits MYC expression: it does so by binding and inhibiting E2F complexes. The controlType of every MSigDB TemplateReactionRegulation object should simply be null instead of ACTIVATION.

This sign confusion in PC is dangerous for any algorithm that predicts differential transcription factor activities from target expression. Such an algorithm will, for instance, predict that RB1 is activated when it is actually inhibited. That just happened to me, hence this issue.
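To make the failure mode concrete, here is a minimal sketch of a sign-based activity vote. It is not the algorithm I actually ran; `infer_tf_activity`, the target names, and the expression values are all hypothetical and only illustrate how an assumed ACTIVATION sign flips the call, while an unsigned (null) edge contributes nothing.

```python
def infer_tf_activity(edges, target_change):
    """Toy vote: each signed edge contributes sign * observed target change.

    edges: list of (target, sign) with sign in {+1, -1, None}; None = unsigned.
    target_change: dict target -> +1 (up), -1 (down), 0 (unchanged).
    """
    score = 0
    for target, sign in edges:
        if sign is None:
            # Unsigned relation: direction unknown, so it cannot vote.
            continue
        score += sign * target_change.get(target, 0)
    if score > 0:
        return "activated"
    if score < 0:
        return "inhibited"
    return "undetermined"


# Hypothetical scenario: a repressor is inhibited, so its targets go up.
target_change = {"TARGET_A": 1, "TARGET_B": 1, "TARGET_C": 1}

# Edges as PC v9 effectively reports them (all ACTIVATION).
assumed_positive = [(t, +1) for t in target_change]
# Edges with controlType left null, as proposed above.
unsigned = [(t, None) for t in target_change]
# Edges with the biologically expected repression sign.
true_negative = [(t, -1) for t in target_change]

wrong_call = infer_tf_activity(assumed_positive, target_change)
neutral_call = infer_tf_activity(unsigned, target_change)
right_call = infer_tf_activity(true_negative, target_change)
```

With all-positive edges the vote says "activated", which is the wrong call; with null signs the data simply do not support a call in either direction.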

2) The MSigDB BioPAX reuses the same TemplateReactionRegulation object for every relation of a transcription factor. In the graph above, all those green edges belong to the same TemplateReactionRegulation object, with the ID "http://pathwaycommons.org/pc2/TemplateReactionRegulation_631e22f8b2d78b8b396770a5751a91c6". BioPAX should not have allowed this, and we should probably fix it in the new BioPAX. Nevertheless, we shouldn't reuse Control objects in PC, because it breaks graph excising. Let me clarify with an example. Assume the user queries paths from RB1 to MYC. The graph above is the current result of that query: all MSigDB targets of RB1 come with the result. Why? Because this TemplateReactionRegulation is part of the result, and when the result set is converted into proper, complete BioPAX, all of its controlled interactions have to come with it. The obvious solution here is not to reuse any Control (or child) object: any Control should have exactly one "controlled" process in its list.
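The excising problem and the proposed fix can be sketched without Paxtools. Everything below is an illustrative assumption: the `Control` dataclass, the `excise` and `split_shared` helpers, the `ex:` URIs, and the derived-URI scheme are hypothetical and do not reflect the cpath2 or Paxtools API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Control:
    uri: str
    controller: str
    controlled: List[str]
    control_type: Optional[str] = None  # None = unsigned


def excise(controls, wanted_process):
    """Processes dragged into a result when every Control that touches
    wanted_process must be exported whole (with all of its controlled)."""
    pulled = set()
    for c in controls:
        if wanted_process in c.controlled:
            pulled.update(c.controlled)
    return pulled


def split_shared(controls):
    """Rewrite so that each Control has exactly one controlled process.
    The '<uri>_<index>' naming is a made-up scheme for this sketch."""
    out = []
    for c in controls:
        for i, proc in enumerate(c.controlled):
            out.append(Control(f"{c.uri}_{i}", c.controller, [proc], c.control_type))
    return out


# One shared regulation object covering three hypothetical target reactions.
shared = [Control("ex:TRR_shared", "RB1",
                  ["ex:expr_TARGET_A", "ex:expr_TARGET_B", "ex:expr_TARGET_C"])]

before = excise(shared, "ex:expr_TARGET_A")
after = excise(split_shared(shared), "ex:expr_TARGET_A")
```

Before the split, asking for one target reaction drags in all three; after the split, only the requested reaction comes back, which is the behavior a paths-between query needs.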

IgorRodchenkov commented 6 years ago

Refs BioPAX/Paxtools#34