cytoscape / RCy3

New version of RCy3, redesigned and collaboratively maintained by Cytoscape developer community
MIT License
48 stars 20 forks source link

stochastic edge mixup: error checking needed #112

Closed m-grimes closed 3 years ago

m-grimes commented 3 years ago

When plotting networks using createNetworkFromDataFrames(nodes,edges) there is an error that comes up with increasing frequency that is dependent on the complexity of the node and edge data tables and network size. This error is reproducible even using the help file example:

nodes <- data.frame(id=c("node 0","node 1","node 2","node 3"), group=c("A","A","B","B"), # categorical strings score=as.integer(c(20,10,15,5))) # integers edges <- data.frame(source=c("node 0","node 0","node 0","node 2"), target=c("node 1","node 2","node 3","node 3"), interaction=c("inhibits","interacts", "activates","interacts"), # optional weight=c(5.1,3.0,5.2,9.9)) # numeric createNetworkFromDataFrames(nodes,edges)

If the last createNetworkFromDataFrames() command is repeated ten times there will be one or more cases where the edge "shared name" doesn't match the edge "name" or the nodes and edges it links. (It doesn't seem to matter if the network name is changed each time so the name isn't duplicated.) This becomes a problem when we map colors and widths to edges based on their "interaction" and "Weight" because they are wrongly assigned. For example, we make "phosphorylation" edges red and "positive correlation" yellow and if they are mixed up the network becomes meaningless. (Alex suggested working with edge SUIDs but I don't know how to do that.) The error rate in a test case we sent to Alex (tables pasted here) with 223 edges and 112 nodes varied but was typically 20-30%

The function below will get rid of bad networks. I suggest, until this problem is understood and solved, that an error checking step like this be incorporated into createNetworkFromDataFrames().

This error has been reproduced on three computers with the following info for two of them. 1: MacOS 11.2.1 Cytoscape Version: 3.8.2 Java: 11.0.8 by Oracle Corporation CyREST version 3.11.1 R version 4.0.3 (2020-10-10) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16 RCy3_2.8.1 2: R Version 4.0.3 RStudio Version 1.4.1103 RCy3 Version 2.10.2 Cytoscape Version 3.8.1

Best wishes,

Mark

Workaround function for RCy3 glitches

delete.bad.networks <- function(){ pingtest=cytoscapePing() if (!grepl("connected", pingtest)) { print("You are NOT conncected to Cytoscape!") break } else { networks <- getNetworkList() for (i in 1:length(networks)) { edgeTable <- getTableColumns("edge", c("name", "shared name"), network=networks[i]) if(!identical(edgeTable[,1], edgeTable[,2])) { print(paste('Network', networks[i], "is bad.")) deleteNetwork(networks[i])} else { print (paste("Network", networks[i], "passes edge test.")) } } } }

edges_for_Cytoscape_test.txt nodes_for_Cytoscape_test.txt rcy3_test_networks.cys rcy3_test_script2.R edges_for_Cytoscape_test.txt nodes_for_Cytoscape_test.txt

m-grimes commented 3 years ago

This checking function is a workaround for createNetworkFromDataFrames:

createNetworkFromDataFrames.check <- function(nodes, edges, title="Checked", collection="Checked interactions") { for (i in 1:10){ createNetworkFromDataFrames(nodes, edges, title, collection) edgeTable <- getTableColumns("edge", c("name", "shared name")) if(!identical(edgeTable[,1], edgeTable[,2])) { print(paste('Network', i, "is bad.")) deleteNetwork()} else { print (paste("Network", i, "passes edge test.")) break } }} }

AlexanderPico commented 3 years ago

I appreciate the detailed bug report and the suggested fix! First, regarding your 2 computer setups. I notice that one is running RCy3 2.8.1. This one will certainly have problems like this. We introduced fixes into 2.10 specifically for sporadic concurrency issues. These issues are inherent to Cytoscape and CyREST, so they can't be fixed in RCy3. The "fix" was to introduce a delay to let the Cytoscape model catch up after certain functions. (No one can click this fast in the GUI, so we never saw this issue before.)

Running 2.10 or above, I ran your check function 100 times and it passed 100 times. So, I'm not able to reproduce the error.

You may, however, still see the problem with larger, more complex networks (as you noted). The necessary delay scales with the size of the data by some unknown function.

So, until the source is fixed in Cytoscape, we have to explore workarounds, as you've also suggested.

The first level workaround was to introduce these delays. In the next release, I'm thinking about making these delays user-adjustable for cases with large and complex networks, but that is not a very satisfying solution since the user has no idea if they need to adjust or not unless they take the time to do a bunch of testing like you did.

Your suggested workaround is interesting. I will have to ponder this with the team. In the meantime, I would encourage to you to perform the check in your own scripts as I'm sure you're already doing.

So, first make sure you are using RCy3 2.10 and keep an eye out for RCy3 2.12 in the next Bioconductor release at the end of April. Second, check the network properties you've seen get corrupted.

In parallel, I will keep trying to reproduce the error (which I haven't yet been able to do) and explore better workarounds until the problem is really fixed in Cytoscape.

AlexanderPico commented 3 years ago

Ah, I could reproduce it using the attached files! Thanks for those!

Hmm... This might be something else...

m-grimes commented 3 years ago

Alex

Thanks for doing tests on this issue. I see on your last message that with our files you could reproduce the error, likely due to more complex data being sent to Cytoscape.

I’m cc:ing my collaborator Karen to encourage her to update Cytoscape. My set up sessionInfo says I have RCy3_2.8.1, which is what Bioconductor thinks is the latest version. I haven’t downloaded GitHub newer versions (should do?). I check package status using the following.

if (!requireNamespace("BiocManager"))
  install.packages("BiocManager")
BiocManager::install()

And get the frequent packages-need-updating message, which is I hope keeping up with RCy3 updates properly.

Since we don’t understand this error, but do have a measure of the integrity of the edge assignments (which we depend on for graphing edge appearance), the simple check seems to work for now. This could be introduced transparently and cause random delays when things go wrong but this is better than graphing bad networks that misalign edges.

Best,

Mark

University of Montana

On 5 Mar 2021, at 16:54, Alexander Pico wrote:

Ah, I could reproduce it using the attached files! Thanks for those!

Hmm... This might be something else...

— I appreciate the detailed bug report and the suggested fix! First, regarding your 2 computer setups. I notice that one is running RCy3 2.8.1. This one will certainly have problems like this. We introduced fixes into 2.10 specifically for sporadic concurrency issues. These issues are inherent to Cytoscape and CyREST, so they can't be fixed in RCy3. The "fix" was to introduce a delay to let the Cytoscape model catch up after certain functions. (No one can click this fast in the GUI, so we never saw this issue before.)

Running 2.10 or above, I ran your check function 100 times and it passed 100 times. So, I'm not able to reproduce the error.

You may, however, still see the problem with larger, more complex networks (as you noted). The necessary delay scales with the size of the data by some unknown function.

So, until the source is fixed in Cytoscape, we have to explore workarounds, as you've also suggested.

The first level workaround was to introduce these delays. In the next release, I'm thinking about making these delays user-adjustable for cases with large and complex networks, but that is not a very satisfying solution since the user has no idea if they need to adjust or not unless they take the time to do a bunch of testing like you did.

Your suggested workaround is interesting. I will have to ponder this with the team. In the meantime, I would encourage to you to perform the check in your own scripts as I'm sure you're already doing.

So, first make sure you are using RCy3 2.10 and keep an eye out for RCy3 2.12 in the next Bioconductor release at the end of April. Second, check the network properties you've seen get corrupted.

In parallel, I will keep trying to reproduce the error (which I haven't yet been able to do) and explore better workarounds until the problem is really fixed in Cytoscape.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/cytoscape/RCy3/issues/112#issuecomment-791798120

AlexanderPico commented 3 years ago

Still working on troubleshooting, but definitely found an CyREST bug here that I think I can squash!

In the meantime, re: RCy3 version. You must have an older version of the BiocManager on that machine. See https://bioconductor.org/packages/release/bioc/html/RCy3.html and these upgrade tips https://www.bioconductor.org/install/.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install(version = "3.12")

AlexanderPico commented 3 years ago

Ok. I can load your network files repeatedly without errors now. If you want to try it out, you'll have to install it from github:

install.packages("devtools")
library(devtools)
install_github('cytoscape/RCy3', build_vignettes=TRUE)
#If installation fails due to package 'XXX' not found,
# then run install.packages("XXX") and then try install_github('cytoscape/RCy3') again
library(RCy3)

I know need to figure out how to simultaneously fix #41 while not messing this fix up. Once I have that figured out, I will push this fix to the current release version 2.10 at Bioconductor. But that will take a few days...

m-grimes commented 3 years ago

Alex

Thanks for your bug-squashing efforts. Unfortunately I’ve run into more problems, bigger bugs! I got a weird error after updating to RCy3_2.10.2:

Error: lazy-load database 

'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RCy3/R/RCy3.rdb' is corrupt In addition: Warning message: internal error -3 in R_decompress1 Error: lazy-load database '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RCy3/R/RCy3.rdb' is corrupt In addition: Warning messages: 1: restarting interrupted promise evaluation 2: internal error -3 in R_decompress1

Then after the command to install RCy3_2.11.4

> install_github('cytoscape/RCy3', build_vignettes=TRUE)
Downloading GitHub repo cytoscape/RCy3@HEAD
✓  checking for file 

‘/private/var/folders/pg/_s8g36d15z17dbfjyft1v6dr0000gn/T/RtmpSdpjNN/remotes81e10b6d8e5/cytoscape- RCy3-825db74/DESCRIPTION’ ... ─ preparing ‘RCy3’: ✓ checking DESCRIPTION meta-information ... ─ installing the package to build vignettes (341ms) ✓ creating vignettes (23.9s) ─ checking for LF line-endings in source and make files and shell scripts ─ checking for empty or unneeded directories ─ building ‘RCy3_2.11.4.tar.gz’

* installing *source* package ‘RCy3’ ...
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary 

installation path

Not sure what the yamlSeparators error means but there seems to be an issue as shown by the warnings below and the failure to create the graph from the demo in the help files.

> cytoscapePing()
[1] "You are connected to Cytoscape!"
There were 15 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In get(results[[i]], packages[[i]]) : internal error -3 in 

R_decompress1 2: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 3: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 4: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 5: In get(results[[i]], packages[[i]]) : restarting interrupted promise evaluation 6: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 7: In get(results[[i]], packages[[i]]) : restarting interrupted promise evaluation 8: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 9: In get(results[[i]], packages[[i]]) :
restarting interrupted promise evaluation 10: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 11: In get(results[[i]], packages[[i]]) : restarting interrupted promise evaluation 12: In get(results[[i]], packages[[i]]) : internal error -3 in R_decompress1 13: In mget(objectNames, envir = ns, inherits = TRUE) : internal error -3 in R_decompress1 14: In mget(objectNames, envir = ns, inherits = TRUE) : restarting interrupted promise evaluation 15: In mget(objectNames, envir = ns, inherits = TRUE) : internal error -3 in R_decompress1

This seems to have broken RCy3:

> createNetworkFromDataFrames(nodes,edges)
Error in createNetworkFromDataFrames(nodes, edges) :
  lazy-load database 

'/Library/Frameworks/R.framework/Versions/4.0/Resources/library/RCy3/R/RCy3.rdb' is corrupt In addition: Warning message: In createNetworkFromDataFrames(nodes, edges) : internal error -3 in R_decompress1

I’ll go back to the bio conductor version and see if that works.

Mark

University of Montana

On 5 Mar 2021, at 18:21, Alexander Pico wrote:

Ok. I can load your network files repeatedly without errors now. If you want to try it out, you'll have to install it from github:

install.packages("devtools")
library(devtools)
install_github('cytoscape/RCy3', build_vignettes=TRUE)
#If installation fails due to package 'XXX' not found,
# then run install.packages("XXX") and then try 
install_github('cytoscape/RCy3') again
library(RCy3)

I know need to figure out how to simultaneously fix #41 while not messing this fix up. Once I have that figured out, I will push this fix to the current release version 2.10 at Bioconductor. But that will take a few days...

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/cytoscape/RCy3/issues/112#issuecomment-791826985

AlexanderPico commented 3 years ago

Good news! What you are seeing is a common problem with R package updates. It has nothing to do with the new RCy3 code. Typically, you just need to restart R (and sometimes restart RStudio, if you're using that) then just try again. The error literally means that the file download failed.

https://stackoverflow.com/questions/59000627/how-do-i-solve-the-r-code-execution-error-on-rstudio

m-grimes commented 3 years ago

Good news indeed! I still get the error

Error in yamlSeparators[[1]] : subscript out of bounds

But the test network graphed, whew!

Thanks,

Mark

University of Montana

On 6 Mar 2021, at 12:22, Alexander Pico wrote:

Good news! What you are seeing is a common problem with R package updates. It has nothing to do with the new RCy3 code. Typically, you just need to restart R (and sometimes restart RStudio, if you're using that) then just try again. The error literally means that the file download failed.

https://stackoverflow.com/questions/59000627/how-do-i-solve-the-r-code-execution-error-on-rstudio

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/cytoscape/RCy3/issues/112#issuecomment-792032600

AlexanderPico commented 3 years ago

yaml is specific to the Travis CI system that automatically builds and checks the repo. Should only be relevant to developers, not users. Should be able to ignore it safely.

m-grimes commented 3 years ago

Thanks Alex. - Mark

University of Montana

On 6 Mar 2021, at 13:01, Alexander Pico wrote:

yaml is specific to the Travis CI system that automatically builds and checks the repo. Should only be relevant to developers, not users. Should be able to ignore it safely.

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/cytoscape/RCy3/issues/112#issuecomment-792043083

AlexanderPico commented 3 years ago

@m-grimes FYI: The final version of the fixed functions has been submitted to Bioconductor dev branch as version 2.11.5. It should be availabe there in 2-3 days. Please use that one instead of github once it's available and let me know if you have any troubles. That will become the official release version in the next Bioconductor release in late April.

Thanks again for reporting this bug!

m-grimes commented 3 years ago

Will do, thanks!@ Mark

University of Montana

On 6 Mar 2021, at 16:30, Alexander Pico wrote:

@m-grimes FYI: The final version of the fixed functions has been submitted to Bioconductor dev branch as version 2.11.5. It should be availabe there in 2-3 days. Please use that one instead of github once it's available and let me know if you have any troubles. That will become the official release version in the next Bioconductor release in late April.

Thanks again for reporting this bug!

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/cytoscape/RCy3/issues/112#issuecomment-792116567