BaderLab / AutoAnnotateApp

The AutoAnnotate Cytoscape App finds clusters of nodes and visually annotates them with semantic labels and groups.
GNU Lesser General Public License v2.1

Post-analysis edges + annotations very buggy - losing nodes from the main network, randomly creating duplicates of nodes #166

Open risserlin opened 3 years ago

risserlin commented 3 years ago

Take an annotated network, collapse all clusters, and run post-analysis (PA) on the collapsed network so that we get the true post-analysis edges. After running PA, the nodes don't show up. Toggle the PA set off and then back on again to see the nodes. Result: collapsed nodes are missing, and random nodes from some of the collapsed sets are duplicated in the left corner. Expanding the clusters brings the missing clusters back. After re-collapsing, the edges are now there, but they are empty, and the random duplicates appear as well.

risserlin commented 3 years ago

If you deselect the PA nodes that you created while the network was collapsed, it removes the collapsed nodes as well as the PA nodes. Reintroducing them brings back only the post-analysis nodes, not the collapsed nodes. To get the nodes back, you have to go back and expand all nodes.

risserlin commented 3 years ago

There is definitely a conflict between collapse/expand and hiding/unhiding post-analysis.

risserlin commented 3 years ago

I really need this to work. Is this a Cytoscape group issue or an AA issue? Would moving to a previous version of AA or Cytoscape fix this problem? (I tried using the summary network, but it had the same issue.)

mikekucera commented 3 years ago

OK I'll make this my top priority.

risserlin commented 3 years ago

Thanks!

mikekucera commented 3 years ago

Hi Ruth, I've never tried running PA on a collapsed network so I'm not at all surprised it doesn't work properly.

Group nodes in Cytoscape are notorious for being a buggy mess. That's the reason we added the summary network function to AA. In fact, I may not be able to fix this in EM/AA; it's more likely a problem with Cytoscape itself.

Could you please explain exactly what you are trying to accomplish, and list the steps you are trying to take (assuming they actually worked properly)? We may have to brainstorm an alternative solution that doesn't involve group nodes.

risserlin commented 3 years ago

I have a network with many different clusters and post-analysis. I have also added annotations to the PA edges listing the genes responsible for the overlap. Here is an anonymized view of the network:

[image: anonymized view of the network]

The large clusters don't really add anything to the view of the network, and all those extra PA edges are redundant, so I want to collapse the network (and, in doing so, collapse the PA edges as well).

What I have tried:

- Collapse all nodes: this is how it all started.
- Collapse just the large nodes: didn't fix anything.
- Start with the network without PA and try to do the post-analysis on the collapsed network: doesn't work, because the collapsed nodes don't have the gene list, so there is nothing to calculate the overlap with.
- Start with the network with PA, create a summary network, and do post-analysis on the summary network: can't. It isn't an EM, so the functionality isn't there, and the nodes and edges don't contain the list of genes.

That gets us into hiding and unhiding PA combined with collapse and expand, which messes everything up, because all of a sudden I have missing nodes or extra nodes.

Maybe we can set up a meeting and I can show you what my session is doing, or, if it is easier, I can send it to you?

Thanks, Ruth
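The third attempt above fails because a collapsed group node has no gene list to compute overlaps against. The R workaround later in this thread computes a summary for each cluster (minimum p-value, minimum FDR, maximum NES, and the union of member genes). The base-R sketch below reproduces that per-cluster summary on made-up toy data. The column names `cluster`, `pvalue`, `NES` and `genes` are hypothetical stand-ins for the real EnrichmentMap and `__mclCluster` columns.

```r
# Toy node table (hypothetical data): one row per EM gene-set node
nodes <- data.frame(
  name    = c("GS_A", "GS_B", "GS_C"),
  cluster = c(1, 1, 2),
  pvalue  = c(0.001, 0.02, 0.0005),
  NES     = c(2.1, 1.7, -1.9),
  stringsAsFactors = FALSE
)
# Gene lists are stored as a list column, like EnrichmentMap::Genes
nodes$genes <- list(c("TP53", "MDM2"), c("MDM2", "CDKN1A"), c("MYC"))

# Collapse each cluster into one summary row
summarize_clusters <- function(nodes) {
  rows <- lapply(sort(unique(nodes$cluster)), function(cl) {
    members <- nodes[nodes$cluster == cl, ]
    data.frame(
      cluster    = cl,
      n_nodes    = nrow(members),
      min_pvalue = min(members$pvalue),
      # max NES mirrors the workaround; for negatively enriched
      # clusters the most extreme NES would be min() instead
      max_NES    = max(members$NES),
      # union of member genes, sorted and comma-separated
      genes      = paste(sort(unique(unlist(members$genes))), collapse = ","),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

summary_nodes <- summarize_clusters(nodes)
print(summary_nodes)
```

The union of genes is the piece that a collapsed node would need before an overlap could be computed against it.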

mikekucera commented 3 years ago

Send me your session file for now please.

risserlin commented 3 years ago

sent

risserlin commented 3 years ago

Hack using CyREST from R (RCy3): manually collapse the attributes and send them back to Cytoscape.

library(RCy3)

current_baderlab_network <- setCurrentNetwork(params$network_name)

current_baderlab_nodetable <- getTableColumns(table="node")

current_baderlab_edgetable <- getTableColumns(table="edge")

#get the cluster numbers
all_clusters <- unique(current_baderlab_nodetable$'__mclCluster')
labels <- c()
genes <- list()
signature_edges <- list()

meta_nodes_info <- c()
meta_edges_info <- c()

for(i in seq_along(all_clusters)){
  if(!is.na(all_clusters[i])){

      current_nodes <- which(current_baderlab_nodetable$'__mclCluster' == all_clusters[i])
      # build the AutoAnnotate label command for the nodes in this cluster
      aa_label_command <- paste('autoannotate label-clusterBoosted labelColumn="',
                                colnames(current_baderlab_nodetable)[grep(pattern = "GS_DESCR", colnames(current_baderlab_nodetable))],
                                '" nodeList=',
                                paste("SUID:", paste(current_baderlab_nodetable$SUID[current_nodes], collapse = ",SUID:"), sep = ""),
                                sep = "")

      #calculate the current label
      current_label <- commandsGET(aa_label_command)
      labels <- c(labels, current_label)

      #get all the genes for this cluster
     genes[i]<-list(unique(unlist(current_baderlab_nodetable$`EnrichmentMap::Genes`[current_nodes])))

     #for the set of nodes in this cluster get all the signature edges
     current_signature_edges <- c()
     for(j in seq_along(current_nodes)){
       # fixed = TRUE matches the node name literally; gene-set names can contain regex metacharacters
       current_signature_edges <- c(current_signature_edges,
                                    intersect(grep(pattern = current_baderlab_nodetable$name[current_nodes[j]],
                                                   current_baderlab_edgetable$name, fixed = TRUE),
                                              which(current_baderlab_edgetable$interaction == "sig")))
     }

     signature_edges[i] <- list(current_signature_edges)

     #calculate the summary node stats: min p-value, min FDR q-value, max NES and the union of genes
     nodes_set_to_collapse <- current_baderlab_nodetable[current_nodes,]
     meta_nodes_info <- rbind(meta_nodes_info, cbind( current_label,
                                                   min(nodes_set_to_collapse[,grep(colnames(nodes_set_to_collapse), pattern="pvalue")]),
                                                   min(nodes_set_to_collapse[,grep(colnames(nodes_set_to_collapse), pattern="fdr_qvalue")]),
                                                   max(nodes_set_to_collapse[,grep(colnames(nodes_set_to_collapse), pattern="NES")]),
                                                   paste(unique(unlist(nodes_set_to_collapse[,grep(colnames(nodes_set_to_collapse), pattern="EnrichmentMap::Genes")])),collapse = ",")))

     edge_subset <- current_baderlab_edgetable[unlist(signature_edges[i]),]

  if(nrow(edge_subset) > 0){
      # edge names have the form "nodeA (interaction) nodeB"; nodeA is the PA node
      nodeA <- vapply(strsplit(edge_subset$name, split = " \\("), `[`, character(1), 1)
      #we don't really care about nodeB, as these nodes are all part of the cluster and the cluster is going to be collapsed to one node.
      nodeB <- vapply(strsplit(edge_subset$name, split = "\\) "), `[`, character(1), 2)

      #get the unique PA nodes 
      unique_PA_nodes <- unique(nodeA)

      for(j in 1:length(unique_PA_nodes)){
        #get each edge that has this PA node
        set_to_collapse <- edge_subset[which(nodeA == unique_PA_nodes[j]),]

        #currently only interested in the overlapping genes and p-values so just collapse those
        # get the union of all the genes in the overlap
        # get the minimum p-value for mann_whit_greater
        # get the minimum p-value for mann_whit_less

        meta_edges_info <- rbind(meta_edges_info, cbind( unique_PA_nodes[j], current_label,
                                                   min(set_to_collapse[,grep(colnames(set_to_collapse), pattern="Overlap_Mann_Whit_greater_pVal")]),
                                                   min(set_to_collapse[,grep(colnames(set_to_collapse), pattern="Overlap_Mann_Whit_less_pVal")]),
                                                   paste(unique(unlist(set_to_collapse[,grep(colnames(set_to_collapse), pattern="Overlap_genes")])),collapse = ",")))
      }
  }

  }
  else{
    labels <- c(labels, "NA")
    genes[i] <- ""
    signature_edges[i] <- ""
  }
}

meta_edges <- data.frame(pa_node = meta_edges_info[,1], collapsed_node = meta_edges_info[,2],
                         as.numeric(meta_edges_info[,3]), as.numeric(meta_edges_info[,4]), meta_edges_info[,5],
                         stringsAsFactors = FALSE)  # keep strings as character so strsplit() below works

colnames(meta_edges)[3:5] <- c(
                               colnames(current_baderlab_edgetable)[grep(colnames(current_baderlab_edgetable), pattern="Overlap_Mann_Whit_greater_pVal")],
                               colnames(current_baderlab_edgetable)[grep(colnames(current_baderlab_edgetable), pattern="Overlap_Mann_Whit_less_pVal")],
                               colnames(current_baderlab_edgetable)[grep(colnames(current_baderlab_edgetable), pattern="Overlap_genes")]
)

meta_edges$`shared name` <- paste(meta_edges$pa_node, " (meta) ", meta_edges$collapsed_node,sep="")

rownames(meta_edges) <- meta_edges$`shared name`

#make sure the overlap genes are a list.
meta_edges$`EnrichmentMap::Overlap_genes` <- strsplit(meta_edges$`EnrichmentMap::Overlap_genes`, split = ",")

#collapse the network
collapse_command <- 'autoannotate collapse'
# --> NOT WORKING from R: collapse the network manually before running the next command.
# (Running "autoannotate collapse" from the command-line window in Cytoscape does work, though.)
coll_response <- commandsGET(collapse_command)

loadTableData(data=meta_edges,data.key.column = "shared name",table = "edge")
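You can check the edge-collapsing step of the workaround offline, without Cytoscape. That step builds one meta-edge for each PA node and cluster, keeping the minimum p-value and the union of overlap genes. The base-R sketch below does this on toy data. It assumes edge names of the form `source (interaction) target`, with the PA node as the source. The columns `mw_greater_p` and `overlap_genes` and the `cluster_of` lookup are hypothetical stand-ins for the EnrichmentMap edge columns and the `__mclCluster` assignment.

```r
# Toy PA edge table (hypothetical data); the PA node is the source of each edge
edges <- data.frame(
  name = c("SIG1 (sig) GS_A", "SIG1 (sig) GS_B", "SIG2 (sig) GS_B", "SIG1 (sig) GS_C"),
  mw_greater_p = c(0.01, 0.003, 0.2, 0.05),
  stringsAsFactors = FALSE
)
edges$overlap_genes <- list(c("TP53", "MDM2"), c("MDM2", "CDKN1A"), "EGFR", "MYC")

# Hypothetical cluster assignment of the EM gene-set nodes
cluster_of <- c(GS_A = "cluster1", GS_B = "cluster1", GS_C = "cluster2")

collapse_pa_edges <- function(edges, cluster_of) {
  pa_node <- sub(" \\(.*$", "", edges$name)   # text before " (interaction)"
  target  <- sub("^.*\\) ", "", edges$name)   # text after "(interaction) "
  cluster <- unname(cluster_of[target])        # exact name lookup, no substring matching
  pairs <- unique(data.frame(pa_node, cluster, stringsAsFactors = FALSE))
  rows <- lapply(seq_len(nrow(pairs)), function(k) {
    hit <- pa_node == pairs$pa_node[k] & cluster == pairs$cluster[k]
    data.frame(
      name = paste0(pairs$pa_node[k], " (meta) ", pairs$cluster[k]),
      mw_greater_p = min(edges$mw_greater_p[hit]),
      overlap_genes = paste(sort(unique(unlist(edges$overlap_genes[hit]))), collapse = ","),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

meta <- collapse_pa_edges(edges, cluster_of)
print(meta)
```

The workaround finds a cluster's edges by using `grep` to search the edge names for each node name. This sketch instead resolves each edge's target node by exact name. As a result, a node name that happens to be a substring of another node's name cannot pull in the wrong edges.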