almeidasilvaf / syntenet

An R package to infer and analyze synteny networks from protein sequences
https://almeidasilvaf.github.io/syntenet/

Warning in cluster_network() #7

Closed xiaoyezao closed 2 years ago

xiaoyezao commented 2 years ago

Dear developers,

I encountered this warning message in the cluster_network() run:

Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  restarting interrupted promise evaluation

Is this a serious problem? Can I go on with the results?

Thank you,

Tao

almeidasilvaf commented 2 years ago

Hi, @xiaoyezao

I've never seen this warning before, and I can't reproduce it. Could you show the code you ran to generate this warning?

What does the output of cluster_network() look like for your data?

This may sound silly, but have you tried restarting your R session and running cluster_network() again? It seems like a warning that happens randomly. Likewise, do you get the same message if you rerun cluster_network() in the same session?
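
For context, base R can emit this message when a lazily evaluated argument (a promise) is interrupted by an error on its first evaluation and is then evaluated again. The sketch below reproduces that mechanism with a hypothetical helper, `force_twice()`, which is not part of syntenet. It only illustrates where such a warning can come from in general, not where it arises inside cluster_network(), and whether the warning is issued may depend on the R version.

```r
# Hypothetical helper (not part of syntenet): it touches its lazy argument twice.
force_twice <- function(x) {
    try(x, silent = TRUE)  # first evaluation is interrupted by an error
    x                      # second evaluation restarts the same promise
}

attempts <- 0
caught <- character(0)

# Collect any warnings instead of printing them, so they can be inspected
result <- withCallingHandlers(
    force_twice({
        attempts <- attempts + 1
        if (attempts == 1) stop("first attempt fails")
        "ok"
    }),
    warning = function(w) {
        caught <<- c(caught, conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)

print(result)  # value returned by the second, successful evaluation
print(caught)  # warning messages collected during the call, if any
```

In this toy case the returned value is unaffected, because the second evaluation completes normally. That does not by itself establish that the cluster_network() output is valid, so inspecting the result is still worthwhile.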

xiaoyezao commented 2 years ago

I just followed the instructions; the code is simply clusters <- cluster_network(net). It finished without problems other than the warning message, and the result looks like this:

> head(clusters)
                                   Gene Cluster
1 Apiaceae_Apium_graveolens_Ag1G01158.1       1
2 Apiaceae_Apium_graveolens_Ag1G01159.1       2
3 Apiaceae_Apium_graveolens_Ag1G01160.1       3
4 Apiaceae_Apium_graveolens_Ag1G01165.1       4
5 Apiaceae_Apium_graveolens_Ag1G01168.1       5
6 Apiaceae_Apium_graveolens_Ag1G01179.1       6

I continued profiles <- phylogenomic_profile(clusters) with this result, but got the following error:

Error in stats::hclust(dist_mat, method = "ward.D") : 
  size cannot be NA nor exceed 65536

Any suggestions on this?

almeidasilvaf commented 2 years ago

Could you share your net and clusters objects so I can try to inspect this issue?

You can save them as an .rda file and push the file to a repo that I can access. Something like this:

save(clusters, net, file = "network_and_clusters.rda", compress = "xz")

xiaoyezao commented 2 years ago

Please use this link to download the data: https://drive.google.com/file/d/1Eajys70brfYKHw68mtt1O6YDGpaUqxRL/view?usp=sharing

Let me know if this doesn't work. Thank you!

almeidasilvaf commented 2 years ago

Hi, @xiaoyezao

I've just checked it now and there are some issues with your data:

library(tidyverse)

# Get number of clusters
clusters %>% count(Cluster) %>% nrow()

# Get number of clusters with 2 nodes only
clusters %>% count(Cluster) %>% filter(n == 2) %>% nrow()

Although this can be a real property of your data set (e.g., if you have 2 species that are very distantly related to all other species), I'd say it is likely a problem resulting from the fact that you didn't do the processing with process_input().

After running the whole pipeline properly, if you still find this large number of 2-node clusters, I'd suggest filtering your clusters prior to phylogenomic profiling like this:

clusters_to_keep <- clusters %>% count(Cluster) %>% filter(n > 2) %>% select(Cluster)
fclusters <- clusters[clusters$Cluster %in% clusters_to_keep$Cluster, ]
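
As a self-contained illustration of the filtering step above, the sketch below applies the same idea to a made-up data frame using base R only. The toy `clusters` object and the `max_hclust_size` name are assumptions for demonstration. The 65536 limit is taken from the stats::hclust() error message reported earlier in this thread, and treating the number of clusters as the size that hclust receives is also an assumption.

```r
# Toy stand-in for the output of cluster_network(); gene names are made up.
clusters <- data.frame(
    Gene = paste0("sp", c(1, 2, 1, 2, 3, 1, 2, 3, 4), "_gene", 1:9),
    Cluster = c(1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Count genes per cluster and keep only clusters with more than 2 genes
cluster_sizes <- table(clusters$Cluster)
clusters_to_keep <- names(cluster_sizes)[cluster_sizes > 2]
fclusters <- clusters[clusters$Cluster %in% clusters_to_keep, ]

# Limit quoted in the stats::hclust() error message
max_hclust_size <- 65536

# Assumption: the number of clusters is the size passed on to hclust()
n_clusters <- length(unique(fclusters$Cluster))
within_limit <- n_clusters <= max_hclust_size

print(fclusters)
print(within_limit)
```

Running a check like this before phylogenomic_profile() shows whether filtering alone brings the data under the limit, or whether the input needs to be reduced further.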

xiaoyezao commented 2 years ago

Thank you for your debugging!

I obtained the network using the shell version of SynNet (https://github.com/zhaotao1987/SynNet-Pipeline/wiki/SynNet-Build), and then fed the result to cluster_network() in R.

The large number of 2-node clusters could be real, because I have a few genomes from different plant orders that are only distantly related to the rest. If I want to remove these few distantly related genomes, can I just remove the corresponding clusters from the network? Or do I have to rerun everything from the very beginning?

BTW, I used long gene names because these data are also used in my other phylogenomic analyses, and I want to keep the "taxonomic information" of the genes. For me, process_input() is quite strict about gene names, so I prepared the sequence and annotation data using a custom script that follows the rules of process_input(), except that the gene names are processed differently.

I will filter the small clusters and see how it goes.

Thanks

almeidasilvaf commented 2 years ago

My pleasure to help!

Regarding your points:

I will close this issue. If you have any issues after running the complete pipeline (starting from the beginning), feel free to open a new issue here.

Thank you for using syntenet! ;)