BaderLab / AutoAnnotateApp

The AutoAnnotate Cytoscape App finds clusters of nodes and visually annotates them with semantic labels and groups.
GNU Lesser General Public License v2.1

Unable to build clustering annotation from the command line #207

Open ycaspi257 opened 1 week ago

ycaspi257 commented 1 week ago

Hello, I was trying to build an AutoAnnotate clustering from the command line using the command:

autoannotate annotate-clusterBoosted clusterAlgorithm=MCL labelColumn=EnrichmentMap::GS_DESCR maxWords=3 network=current edgeWeightColumn=name

However, I get an error message: Cannot invoke "org.baderlab.autoannotate.internal.model.AnnotationSetBuilder.getClusters()" because "this.builder" is null

Clustering from the Cytoscape AutoAnnotate menu works just fine; only the command line produces the error message. In addition, if I increase the similarity cutoff of the network so that fewer edges are formed, clustering works perfectly well from both the command line and the Cytoscape AutoAnnotate menu.

What can be the source of the problem?

Best, Yaron Caspi

mikekucera commented 1 week ago

What version of AutoAnnotate are you using?

Can you please send me your framework-cytoscape.log file, found in the <user-home>/CytoscapeConfiguration/3 folder? That should contain the entire exception trace. And if possible, please send me your session file.

Thanks!

ycaspi257 commented 1 week ago

Dear Mike,

Thank you very much for your prompt reply.

The files you requested are attached.

I am using Autoannotate V.1.4.1 with Cytoscape 3.10.2 Java 10.0.12 on Ubuntu 20.04.

You can see the problem, e.g., in the network "Left_Hemisphere_fMRI_NQ-EF". The command I was using is:

autoannotate annotate-clusterBoosted clusterAlgorithm=MCL labelColumn=EnrichmentMap::GS_DESCR maxWords=3 network=current

Looking forward to your further help.

Best, Yaron Caspi

BTW, it was very hard, or even impossible, to find in the documentation which values clusterAlgorithm accepts in the command besides MCL.

mikekucera commented 1 week ago

Hi, it looks like GitHub didn't attach your files. Can you please send them to me directly at mikekucera@gmail.com? Thanks.

mikekucera commented 4 days ago

Hi, there are two things that should help here...

  1. Try updating AutoAnnotate to the latest version (currently 1.5.1). I don't get the same error with the latest version.
  2. You must use a numeric column for the edgeWeightColumn attribute. Using the 'name' column, which has type String, causes an error in clusterMaker. Try edgeWeightColumn=EnrichmentMap::similarity_coefficient
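For anyone scripting this, the corrected command can be assembled with a small guard that rejects a non-numeric edge weight column before the command is sent. This is only a sketch in Python: the function name, the `edge_column_types` mapping (column name to Cytoscape type name, which you would fill in from your own edge table), and the set of numeric type names are assumptions for illustration, not part of AutoAnnotate.

```python
# Cytoscape column type names treated as numeric here (an assumption of this sketch).
NUMERIC_TYPES = {"Integer", "Long", "Double"}


def build_annotate_command(edge_weight_column, edge_column_types,
                           cluster_algorithm="MCL",
                           label_column="EnrichmentMap::GS_DESCR",
                           max_words=3, network="current"):
    """Return the annotate-clusterBoosted command string, or raise if the
    chosen edge weight column is missing or not numeric."""
    column_type = edge_column_types.get(edge_weight_column)
    if column_type not in NUMERIC_TYPES:
        raise ValueError(
            f"edgeWeightColumn '{edge_weight_column}' has type {column_type}; "
            "a numeric column is required"
        )
    args = {
        "clusterAlgorithm": cluster_algorithm,
        "labelColumn": label_column,
        "maxWords": max_words,
        "network": network,
        "edgeWeightColumn": edge_weight_column,
    }
    arg_text = " ".join(f"{key}={value}" for key, value in args.items())
    return f"autoannotate annotate-clusterBoosted {arg_text}"


# Hypothetical edge table description, written by hand for this example.
edge_types = {"name": "String", "EnrichmentMap::similarity_coefficient": "Double"}
print(build_annotate_command("EnrichmentMap::similarity_coefficient", edge_types))
```

Passing `"name"` as the edge weight column raises a `ValueError` in this sketch instead of failing later inside clusterMaker.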

ycaspi257 commented 3 days ago

Dear Mike,

Thank you so much. After updating to version 1.5.1, it indeed seems to work.

Two more unrelated questions.

A. Is there a simple command to get the list of clusters and the number of nodes each one includes (like the menu item used to export clusters to a file)?

B. Is there a way to add words to the "excluded words" list permanently? I mean, is there a file or something similar that I can edit to add several words for good?

Best, Yaron

risserlin commented 3 days ago

Hi Yaron, I know you are running commands but are you running this through R or python?

If you are running commands through R or Python, with regard to your first question: there isn't a simple command to get this information. What I usually do after auto-annotating the network is get the node table (I use RCy3 from R and its function getTableColumns): default_node_table <- getTableColumns(table = "node", network = network_suid)

With that table you can use the column __mclCluster to get the number of nodes in each cluster and their names.
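As a rough illustration of that last step, here is a Python sketch that groups node names by the `__mclCluster` column and counts them. It assumes the node table has already been exported into a list of dictionaries, one per node; the sample rows and the function name are made up for the example.

```python
from collections import defaultdict


def cluster_members(rows, cluster_column="__mclCluster", name_column="name"):
    """Map each cluster id to the list of node names assigned to it."""
    clusters = defaultdict(list)
    for row in rows:
        cluster_id = row.get(cluster_column)
        if cluster_id is None:
            continue  # node was not assigned to any cluster
        clusters[cluster_id].append(row[name_column])
    return dict(clusters)


# Made-up rows standing in for an exported node table.
rows = [
    {"name": "GOBP_ELECTRON_TRANSPORT_CHAIN", "__mclCluster": 1},
    {"name": "GOBP_OXIDATIVE_PHOSPHORYLATION", "__mclCluster": 1},
    {"name": "GOMF_ACTIN_BINDING", "__mclCluster": 2},
    {"name": "GOCC_RIBOSOME", "__mclCluster": None},
]

members = cluster_members(rows)
for cluster_id, names in sorted(members.items()):
    print(cluster_id, len(names), names)
```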

With regard to your second question, adding words to the exclusion list permanently: in WordCloud there is a mechanism to add words to the list, and I believe it gets stored and reloaded, but I prefer to run the following command prior to annotating: wordcloud ignore add value="wordtoignore" network=SUID:1234

Embedded in one of my R workflows I have:

    # add the set of words to ignore
    library(RCy3)
    words2ignore <- c("pid", 1:10)
    responses <- lapply(words2ignore, function(x) {
      wordcloud2_url <- paste("wordcloud ignore add value=\"", x, "\" ",
                              "network=SUID:", network_suid, sep = "")
      commandsGET(wordcloud2_url)
    })

Thanks, Ruth

ycaspi257 commented 3 days ago

Dear Ruth,

Thank you so much.

I use R.

When doing it manually (at least for AutoAnnotate), I did not find a mechanism to get it stored. This is why I thought there might be an excluded-words file somewhere that I could just edit.

I was mainly interested in adding excluded words to the AutoAnnotate clustering algorithm and not to WordCloud (to make the cluster labeling fit my purposes).

Thanks again.

Best, Yaron

risserlin commented 3 days ago

Hi Yaron, Autoannotate uses wordcloud to compute the labels so if you want to exclude words you have to make the change in word cloud.
There is a file in the WordCloud jar (which you can find in your CytoscapeConfiguration/3/apps/installed directory) called FlaggedWords.txt that you can add words to.

You would need to run the following commands to do it. (This is very hacky, sorry)

mv WordCloud-v3.1.4.jar WordCloud-v3.1.4.zip

create a FlaggedWords.txt file which looks like this: kegg reactome react biocarta go nci msigdb my_new_word1 my_new_word2

And then run: zip -u WordCloud-v3.1.4.zip FlaggedWords.txt

mv WordCloud-v3.1.4.zip WordCloud-v3.1.4.jar
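The same jar edit can be sketched in Python with the standard `zipfile` module. Several things here are assumptions rather than facts about WordCloud: that FlaggedWords.txt sits at the root of the jar (as the `zip -u` call above suggests), that it is a whitespace-separated word list, and that writing one word per line is acceptable. The function name is hypothetical. Because `zipfile` cannot replace an entry in place, the sketch copies every entry into a new archive and keeps a `.bak` copy of the original; it is probably safest to run it while Cytoscape is closed.

```python
import shutil
import zipfile
from pathlib import Path


def add_flagged_words(jar_path, new_words, entry="FlaggedWords.txt"):
    """Rewrite the jar so that `entry` also lists `new_words`.

    Returns the resulting word list. Raises KeyError if `entry` is missing.
    """
    jar_path = Path(jar_path)
    backup = jar_path.with_name(jar_path.name + ".bak")
    shutil.copy2(jar_path, backup)  # keep the untouched jar as a fallback

    tmp_path = jar_path.with_name(jar_path.name + ".tmp")
    words = None
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == entry:
                # Assumed format: whitespace-separated words.
                words = data.decode("utf-8").split()
                for word in new_words:
                    if word not in words:
                        words.append(word)
                data = ("\n".join(words) + "\n").encode("utf-8")
            dst.writestr(item, data)  # all other entries are copied unchanged

    if words is None:
        tmp_path.unlink()
        raise KeyError(f"{entry} not found in {jar_path}")
    shutil.move(str(tmp_path), str(jar_path))
    return words
```

A call would look like `add_flagged_words("WordCloud-v3.1.4.jar", ["my_new_word1", "my_new_word2"])`, run from the apps/installed directory mentioned above.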

Alternatively, depending on the words, you can ask @mikekucera to add them to the distribution, but words are often very specific to the dataset or data sources you are using, so we try to avoid that.

Thanks, Ruth

ycaspi257 commented 3 days ago

Dear Ruth,

Thanks again. I will follow these instructions.

I was mainly referring to the gene set collection names from Gene Ontology, namely GOCC, GOMF and GOBP. When working with GSEA, GSEA adds these to the node names. Hence, when clustering, there is a bias toward these words in the cluster names.

It might be reasonable to exclude these words (or give an option to exclude those and similar words that GSEA adds) in future distributions, since they are relatively general and not specific.

Best, Yaron

risserlin commented 3 days ago

Hi Yaron, Which geneset files are you using? Are you using the one supplied by GSEA? (word cloud weights the words based on occurrence in the network so if GOBP and GOMF are everywhere they shouldn't be coming up in the cluster tag). I don't see them coming up in my networks but I use the baderlab genesets and not the ones supplied with GSEA so I am curious if there is an issue. Thanks, Ruth

mikekucera commented 3 days ago

There is no global list of excluded words you can edit. The only way to do it is to modify the default list of words stored in the app jar, as Ruth suggested. Excluded words are saved in the session file and can only be set on a per-network basis. If you are using R, then the easiest thing to do is to have a series of commands of the form wordcloud ignore add value="wordtoignore" network=current in your script before the command that creates the annotations.
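A minimal Python sketch of generating that series of commands is shown below. The helper name and the example words are illustrative only; the command form is the one quoted in this thread. Each resulting string still has to be sent to Cytoscape by whatever client you use, for example RCy3's commandsGET as in Ruth's R workflow.

```python
def ignore_commands(words, network="current"):
    """Build one 'wordcloud ignore add' command string per word."""
    return [
        f'wordcloud ignore add value="{word}" network={network}'
        for word in words
    ]


# Example words only; use whatever terms bias your own cluster labels.
for command in ignore_commands(["GOBP", "GOMF", "GOCC"]):
    print(command)
```

Run these before the annotate command so the exclusions are already in place for the network when the labels are computed.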

ycaspi257 commented 2 days ago

Dear Ruth,

I am using C5.all.v2024.1.Hs.symbols.gmt, which is distributed with GSEA. That results in EnrichmentMap GS_DESCR values like https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/GOBP_ELECTRON_TRANSPORT_CHAIN and EnrichmentMap node names like GOBP_ELECTRON_TRANSPORT_CHAIN. AutoAnnotate then picks these up and produces labels that include words such as GOBP.

Naturally, these can be removed with a Python/R script, but doing it manually is cumbersome.
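One possible shape for such a script, sketched in Python: rewrite each GMT line (tab-separated: gene set name, description, then genes) so that the description holds a readable label without the GOBP/GOMF/GOCC prefix, while the name stays unchanged so it still matches the GSEA results. This assumes that GS_DESCR is filled from the GMT description column; the function names are hypothetical.

```python
import re

# Collection prefixes that GSEA's C5 file puts in front of each gene set name.
GO_PREFIX = re.compile(r"^(GOBP|GOMF|GOCC)_")


def clean_label(gene_set_name):
    """Drop the GO collection prefix and turn underscores into spaces."""
    return GO_PREFIX.sub("", gene_set_name).replace("_", " ")


def rewrite_gmt_line(line):
    """Replace the description field (2nd column) with a cleaned label.

    The gene set name (1st column) and the gene list are left untouched.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return line.rstrip("\n")  # not a regular GMT record; leave as is
    fields[1] = clean_label(fields[0])
    return "\t".join(fields)


example = ("GOBP_ELECTRON_TRANSPORT_CHAIN\t"
           "https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/"
           "GOBP_ELECTRON_TRANSPORT_CHAIN\tNDUFA1\tNDUFA2")
print(rewrite_gmt_line(example))
```

Applying `rewrite_gmt_line` to every line of the GMT file and loading the result into EnrichmentMap should, under the assumption above, keep the prefixes out of the label column.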

Best, Yaron

risserlin commented 2 days ago

Hi Yaron, OK, that makes sense. I forgot that this is how GSEA structures its gmt file. EM and AA are optimized for our gmt files, which structure the name and description a little differently. I would recommend switching to them if you can. They are updated monthly, so they are more up to date than the ones released by GSEA: https://download.baderlab.org/EM_Genesets/current_release/ (info here: https://baderlab.org/GeneSets). The only caveat is that they are only available for Human, Mouse, Rat and Woodchuck. Thanks, Ruth