RBVI / stringApp

Cytoscape interface to STRING and STITCH
BSD 2-Clause "Simplified" License

Confidence scores differ between networks & STRING excludes some edges above confidence threshold #13

Closed: arilindsey closed this issue 5 years ago

arilindsey commented 5 years ago

I'm working with stringApp v1.4.0 in Cytoscape 3.6.0 on a Mac. I have sets of Drosophila melanogaster genes and am building networks at a confidence level of 0.6, with no additional interactors. I noticed inconsistencies in the connections between proteins in different networks and am trying to sort out why. Lists of gene names were imported via (import > network > public databases > STRING: protein query). Here is what I noticed about genes A, B, and C in Networks #1 and #2.

Network #1 has ~250 genes that are differentially expressed under condition X.

Network #2 has ~450 genes that are differentially expressed under condition Y.

In Network #2, when I change the confidence to 0.5, new edges are added. But when I sort the edge table, the lowest score of any edge is still 0.616, the gene A to gene B edge score has jumped up to 0.830, and gene B to gene C now has a score of 0.989. When I change the confidence threshold back up to 0.6, no edges are removed and the scores stay high.

I played with the scores in Network #1 and nothing like this happened. I then re-loaded Network #2, and now only edges with a score > 0.805 are included in the network, even though the threshold is at 0.6.
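For anyone wanting to repeat the "sort the edge table and look at the lowest score" check outside Cytoscape, here is a minimal sketch. It assumes the edge table has been exported as CSV and that the score lives in a column named `stringdb::score`; that column name, the helper `summarize_scores`, and the sample rows are assumptions for illustration, so adjust them to match the actual export.

```python
import csv
import io


def summarize_scores(edge_csv_text, threshold, score_column="stringdb::score"):
    """Return (lowest score, number of edges below threshold).

    edge_csv_text: contents of an exported edge table (CSV with a header row).
    score_column:  assumed name of the score column; change to match the export.
    """
    reader = csv.DictReader(io.StringIO(edge_csv_text))
    # Skip rows where the score cell is empty.
    scores = [float(row[score_column]) for row in reader if row[score_column]]
    if not scores:
        return None, 0
    n_below = sum(1 for s in scores if s < threshold)
    return min(scores), n_below


# Illustrative rows only: the scores are the ones quoted in this report,
# the edge names are placeholders.
sample = (
    "name,stringdb::score\n"
    "geneA (pp) geneB,0.830\n"
    "geneB (pp) geneC,0.989\n"
    "geneD (pp) geneE,0.616\n"
)
lowest, below = summarize_scores(sample, threshold=0.6)
print(lowest, below)
```

If the lowest score sits well above the threshold that was set, as described above, that suggests the cutoff actually applied to the query differs from the one shown in the interface.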

I can't seem to find any discussion of something similar happening. At first I was worried that edge scores were being scaled with the total number of nodes, but that does not seem to be the case, given that the inconsistencies occur within a single network and not just between networks.
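One way to pin down which edges disagree is to compare the scores of edges that appear in both networks. The sketch below assumes each network's edge scores have already been read into a dictionary keyed by a pair of node names; the function name `score_differences` and the example values are hypothetical and not taken from the attached gene lists.

```python
def score_differences(scores_a, scores_b, tolerance=1e-9):
    """Return edges present in both networks whose scores differ.

    scores_a, scores_b: dicts mapping (node1, node2) -> score.
    Edges are treated as undirected, so (x, y) and (y, x) are the same edge.
    """
    def normalize(scores):
        return {frozenset(pair): score for pair, score in scores.items()}

    a, b = normalize(scores_a), normalize(scores_b)
    diffs = {}
    for edge in a.keys() & b.keys():
        if abs(a[edge] - b[edge]) > tolerance:
            diffs[tuple(sorted(edge))] = (a[edge], b[edge])
    return diffs


# Hypothetical scores for illustration only.
network1 = {("geneA", "geneB"): 0.650, ("geneB", "geneC"): 0.989}
network2 = {("geneB", "geneA"): 0.830, ("geneC", "geneB"): 0.989}
print(score_differences(network1, network2))
```

Since a STRING score for a given protein pair should not depend on which other proteins are in the query, any edge reported here points to the kind of inconsistency described in this report.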

The genes that alerted me to the issue in the first place are:

E(bx) [FBgn0000541] connection to Taf1 [FBgn0010355] (genes A and B in the previous example)

Taf1 [FBgn0010355] connection to Rpll140 [FBgn0262955] (genes B and C in the previous example)

I’m sure other edges are affected too, given the shifting thresholds and scores; I have just been using E(bx)-Taf1 and Taf1-Rpll140 as indicators. I noticed the problem because E(bx) is differentially spliced in analysis #1 (I color-coded spliced genes differently) and it happened to be sitting on the perimeter of the network.

Attached are the two gene lists that I used to construct networks.

Let me know if anything else would be useful. Happy to send .cys files/screenshots if needed.

20181213_network1_ARIL.txt 20181213_network2_ARIL.txt

scootermorris commented 5 years ago

This is now fixed.