OceaneCsn / DIANE

Dashboard for the Inference and Analysis of Networks from Expression data
GNU General Public License v3.0

About the threshold settings > Edges statistical testing "ON" at the Network inference tab #38

Closed masa-yoshizawa closed 1 year ago

masa-yoshizawa commented 2 years ago

Dear Dr Cassan and authors. This is the wonderful software and user interface that I have been looking for. It is a neat suite for differential gene expression analysis, yet I have an issue using it. I tried it locally (macOS 10.14.6, R 4.1.1, RStudio with R 4.1.1, on a MacBook Pro 15-inch, 2018, Core i7, 16 GB memory) and on the web version (https://diane.bpmp.inrae.fr/). However, DIANE always crashes at 'Edges statistical testing "ON"'. It worked well with hard thresholding and the network analysis that follows. I would like to test step by step in R, but I could not find the details of the R commands, particularly how DIANE builds the 'list' dataset when it incorporates the count data, gene information, and the experimental conditions (and GO term files later). If you can instruct me what might cause the crash at the Edges statistical testing, and how I can test locally by building the list datafile for DIANE, I would much appreciate it. Thank you! Again, yours is a super wonderful analysis suite. Best regards, Masato

OceaneCsn commented 2 years ago

Dear Dr Yoshizawa,

Thank you very much for your kind feedback about DIANE, and for reaching out with this question. Network thresholding via statistical testing is one of the most specific features of our suite, and user feedback about it is very rare, so I would be glad to get to the bottom of this issue with you.

If you can instruct me what might cause the crash at the Edges statistical testing

To help you the best I can, I will start with some questions :

how I can test locally by building the list datafile for DIANE

To use DIANE's functions in a script on your data, you will find useful information in this vignette : https://oceanecsn.github.io/DIANE/articles/DIANE_Programming_Interface.html

More precisely, the data needed to use DIANE's functions is :

To see what those dataframes should look like, you can inspect the counts and annotations of the demo data of DIANE as follows:

library(DIANE)
# counts
data("abiotic_stresses")
abiotic_stresses$raw_counts
# annotation
data("gene_annotations")
gene_annotations$`Arabidopsis thaliana`

To create a dataframe from your expression csv or txt file, read it into a variable in R (with the correct path and separator):

counts <- read.csv(pathToYourExpressionFile, sep = ',', row.names = "Gene", header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)

Then, use your counts instead of abiotic_stresses$raw_counts as an argument to DIANE's functions in the rest of the vignette for normalisation, differential expression, and so on. The same applies to gene annotations, which can be imported with read.csv and stored in a dataframe with gene IDs as row names and columns named "label" and/or "description".
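As a minimal sketch of the import step described above, the following base R snippet writes two small toy files and reads them back with the same read.csv arguments. The file contents, gene IDs, sample column names and annotation text are hypothetical placeholders, not DIANE data, and the sanity checks at the end are only a suggestion, not something DIANE requires in this form.

```r
# Toy expression file (hypothetical values), standing in for your own csv.
counts_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Gene,cond1_1,cond1_2,cond2_1,cond2_2",
  "AT1G01010,12,15,40,38",
  "AT1G01020,0,3,7,5",
  "AT1G01030,101,96,20,25"
), counts_path)

# Same call as in the text: the "Gene" column becomes the row names,
# and check.names = FALSE keeps the sample names exactly as written.
counts <- read.csv(counts_path, sep = ",", row.names = "Gene",
                   header = TRUE, stringsAsFactors = FALSE,
                   check.names = FALSE)

# Toy annotation file (hypothetical labels and descriptions).
annot_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Gene,label,description",
  "AT1G01010,geneA,first example gene",
  "AT1G01020,geneB,second example gene",
  "AT1G01030,geneC,third example gene"
), annot_path)

annotations <- read.csv(annot_path, sep = ",", row.names = "Gene",
                        header = TRUE, stringsAsFactors = FALSE)

# Optional sanity checks before passing the data to any analysis function.
stopifnot(all(vapply(counts, is.numeric, logical(1))))  # counts only
stopifnot(!any(is.na(counts)))                          # no missing values
stopifnot(anyDuplicated(rownames(counts)) == 0)         # unique gene IDs
stopifnot(all(rownames(counts) %in% rownames(annotations)))

str(counts)
head(annotations)
```

If one of the checks fails on your own file, that is often a sign of a wrong separator or of a non-numeric column left in the expression table.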

The reference lists DIANE's functions and their required arguments in detail.


To conclude, I think it would be easier if I had access to your data, as it would tell me whether there is an internal problem in the thresholding function that only a developer can solve. Indeed, even if you try a step-by-step approach in a script, it will probably crash when calling the function test_edges, without being more informative than the user interface crash. Also, depending on your proficiency in R, this step-by-step approach could be time-consuming.

I hope this helps, Best regards,

Océane