fogellab / multiWGCNA

an R package for deep mining gene co-expression networks in multi-trait expression data

Vignettes #3

Closed GettyScience closed 11 months ago

GettyScience commented 11 months ago

Hello,

I am trying to run through your vignette on autism brain samples and am unable to load the autism_se data. I get an error at

"autism_se = eh_query[["EH8219"]]".

The error message is:

"Error: EH8219 added after current Hub snapshot date. added: 2023-05-15 snapshot date: 2023-04-24"

How do you recommend getting around this issue? I would like to know how the data is supposed to be set up within the data.frame so that I may add my data.

We have expression data from plants in two genomes and under vertical or gravity stimulation.

Thank you,

dariotommasini commented 11 months ago

Hi,

Thank you for your interest in multiWGCNA. You're absolutely right to follow the vignettes to check the format of the input data.

For the error, it appears to be an issue with your ExperimentHub version. I have solved this in the past by updating to the latest version of Bioconductor and reinstalling ExperimentHub. You may need to update R/RStudio in order to update Bioconductor.

To answer your other question, the input data can be either a SummarizedExperiment object (the first assay in the object is used, so make sure this is your expression data) or a data.frame with genes as rows and samples as columns. Make sure the sample names match those in the sample table.

GettyScience commented 11 months ago

Thank you for getting back to me so quickly.

Will my data.frame include the genes as rows and all samples of interest (regardless of treatment, named accordingly) as columns? What will the Sample Table look like? Will it be samples as rows and sample type as columns?

GettyScience commented 11 months ago

I have checked all my versions and everything on my end is up to date with the error not resolved.

dariotommasini commented 11 months ago

Interesting. What is your snapshot date for ExperimentHub when you do:

library(ExperimentHub)
eh = ExperimentHub()

I get a snapshot date of 2023-07-18, which is after when multiWGCNAdata was added to ExperimentHub on May 20th, 2023.
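
The error quoted at the top of this thread comes down to a date comparison: the resource was added after the hub snapshot in use. Below is a minimal sketch of that comparison in base R. `resource_visible()` is a hypothetical helper written only for illustration and is not part of ExperimentHub. The commented lines show how a real installation could be inspected, assuming ExperimentHub is installed and a network connection is available.

```r
# Hypothetical helper (not an ExperimentHub function): a resource is only
# visible when it was added on or before the hub's snapshot date.
resource_visible <- function(added, snapshot) {
  as.Date(added) <= as.Date(snapshot)
}

# Dates taken from the error message and the reply above
resource_visible("2023-05-15", "2023-04-24")  # snapshot predates the resource
resource_visible("2023-05-15", "2023-07-18")  # snapshot is recent enough

# To inspect a real installation (requires ExperimentHub and network access):
# library(ExperimentHub)
# eh <- ExperimentHub()
# snapshotDate(eh)        # snapshot date currently in use
# BiocManager::version()  # Bioconductor version the session is tied to
```

If the snapshot date stays old after reinstalling, comparing `BiocManager::version()` between a working and a failing machine may help narrow things down.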

Right, the datExpr input will need to be a data.frame with genes as rows and samples as columns. The sampleTable will need to be a data.frame whose first column holds the sample names (i.e. the column names of datExpr), followed by two columns with your variables of interest (e.g. disease and region). The rows of this sampleTable do not need to be in any particular order.
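
As a rough illustration of the layout described above, here is a toy pair of tables in base R. All values, sample names, and trait names are made up. Storing the gene IDs as row names is an assumption, so check the vignette data for the exact convention. The first column is named `Sample` because a later reply in this thread reports that this exact name is required.

```r
# Toy expression table (hypothetical values): genes as rows, samples as columns
datExpr <- data.frame(
  sample1 = c(5.1, 2.3, 7.8),
  sample2 = c(4.9, 2.8, 7.1),
  sample3 = c(6.0, 1.9, 8.2),
  sample4 = c(5.5, 2.1, 7.9),
  row.names = c("gene1", "gene2", "gene3")
)

# Sample table: first column lists the samples, then two trait columns
sampleTable <- data.frame(
  Sample   = c("sample1", "sample2", "sample3", "sample4"),
  genotype = c("col", "col", "pgm", "pgm"),
  gravity  = c("vertical", "treated", "vertical", "treated")
)

# Every sample listed in the table should be a column of the expression data
stopifnot(all(sampleTable$Sample %in% colnames(datExpr)))

# Row order of sampleTable does not matter, so shuffling it is harmless
shuffled <- sampleTable[c(3, 1, 4, 2), ]
stopifnot(setequal(shuffled$Sample, colnames(datExpr)))
```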

GettyScience commented 11 months ago

Hello, I have hit a new roadblock. In trying to use my data with the constructNetworks() function, I get the error:

Error in constructNetworks(MWGCNA_Data, SamplesTable, conditions1, conditions2, : inherits(datExpr, "SummarizedExperiment") | inherits(datExpr, .... is not TRUE

I have the data.frame set up as you described:

genes | sample 1 | sample 2 | sample 3 | ... | sample 16
g1
g2
g3
g4
...
g27000

The sampleTable is set up as:

samples | genotype | gravity
sample 1 | pgm | vertical
sample 2 | pgm | vertical
sample 3 | pgm | vertical
...
sample 16 | col | treated

Would the error have anything to do with the fact the data.frames are imported as excel files?

Thank you,

dariotommasini commented 11 months ago

Sorry, I found out about this bug only today. Please re-install multiWGCNA:

devtools::install_github("fogellab/multiWGCNA", force = TRUE)

That should fix it.

I do encourage you to go through the vignettes, so try to get those files from ExperimentHub if you can!

GettyScience commented 11 months ago

Are 2 options okay for each treatment? I have a new error:

Error in if (x == trait) { : the condition has length > 1

GettyScience commented 11 months ago

Hello, I am again checking on this error. Thank you,

dariotommasini commented 11 months ago

Hi again!

Let's tackle one error at a time. I have a feeling solving the first one will help us with this new error. Can you try these lines and show me the message?

library(ExperimentHub)
eh = ExperimentHub()

While we're at it, can you also print your full sampleTable for me and paste it here?

Lastly, please show me your sessionInfo, with the ExperimentHub package loaded of course. Like this:

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] multiWGCNA_0.99.2           ggalluvial_0.12.5           ggplot2_3.4.2               SummarizedExperiment_1.31.1 Biobase_2.61.0              GenomicRanges_1.53.1       
 [7] GenomeInfoDb_1.37.1         IRanges_2.35.1              S4Vectors_0.39.1            MatrixGenerics_1.13.0       matrixStats_0.63.0          multiWGCNAdata_0.99.1      
[13] ExperimentHub_2.9.0         AnnotationHub_3.9.1         BiocFileCache_2.9.0         dbplyr_2.3.2                BiocGenerics_0.47.0        

loaded via a namespace (and not attached):
  [1] rstudioapi_0.14               magrittr_2.0.3                rmarkdown_2.21                zlibbioc_1.47.0               vctrs_0.6.2                  
  [6] memoise_2.0.1                 RCurl_1.98-1.12               base64enc_0.1-3               htmltools_0.5.5               S4Arrays_1.1.4               
 [11] dynamicTreeCut_1.63-1         curl_5.0.0                    SparseArray_1.1.6             Formula_1.2-5                 htmlwidgets_1.6.2            
 [16] plyr_1.8.8                    impute_1.75.1                 cachem_1.0.8                  igraph_1.4.3                  mime_0.12                    
 [21] lifecycle_1.0.3               iterators_1.0.14              pkgconfig_2.0.3               Matrix_1.5-4.1                R6_2.5.1                     
 [26] fastmap_1.1.1                 GenomeInfoDbData_1.2.10       shiny_1.7.4                   digest_0.6.31                 colorspace_2.1-0             
 [31] patchwork_1.1.2               AnnotationDbi_1.63.1          Hmisc_5.1-0                   RSQLite_2.3.1                 vegan_2.6-4                  
 [36] filelock_1.0.2                fansi_1.0.4                   httr_1.4.6                    mgcv_1.8-42                   compiler_4.3.0               
 [41] rngtools_1.5.2                bit64_4.0.5                   withr_2.5.0                   doParallel_1.0.17             htmlTable_2.4.1              
 [46] backports_1.4.1               DBI_1.1.3                     MASS_7.3-60                   rappdirs_0.3.3                DelayedArray_0.27.3          
 [51] permute_0.9-7                 flashClust_1.01-2             tools_4.3.0                   foreign_0.8-84                interactiveDisplayBase_1.39.0
 [56] httpuv_1.6.11                 nnet_7.3-19                   glue_1.6.2                    nlme_3.1-162                  promises_1.2.0.1             
 [61] grid_4.3.0                    checkmate_2.2.0               cluster_2.1.4                 reshape2_1.4.4                generics_0.1.3               
 [66] gtable_0.3.3                  tzdb_0.4.0                    preprocessCore_1.63.1         hms_1.1.3                     data.table_1.14.8            
 [71] WGCNA_1.72-1                  utf8_1.2.3                    XVector_0.41.1                ggrepel_0.9.3                 BiocVersion_3.18.0           
 [76] foreach_1.5.2                 pillar_1.9.0                  stringr_1.5.0                 later_1.3.1                   splines_4.3.0                
 [81] dplyr_1.1.2                   lattice_0.21-8                survival_3.5-5                bit_4.0.5                     tidyselect_1.2.0             
 [86] GO.db_3.17.0                  Biostrings_2.69.1             knitr_1.42                    gridExtra_2.3                 xfun_0.39                    
 [91] stringi_1.7.12                yaml_2.3.7                    evaluate_0.21                 codetools_0.2-19              tibble_3.2.1                 
 [96] BiocManager_1.30.20           cli_3.6.1                     rpart_4.1.19                  xtable_1.8-4                  munsell_0.5.0                
[101] Rcpp_1.0.10                   png_0.1-8                     fastcluster_1.2.3             parallel_4.3.0                ellipsis_0.3.2               
[106] readr_2.1.4                   blob_1.2.4                    dcanr_1.17.0                  doRNG_1.8.6                   bitops_1.0-7                 
[111] scales_1.2.1                  purrr_1.0.1                   crayon_1.5.2                  rlang_1.1.1                   cowplot_1.1.1                
[116] KEGGREST_1.41.0  

Thanks!

GettyScience commented 11 months ago

[Screenshot 2023-07-21 at 8:19:57 PM]

COLTRT_1 col treatment    
COLTRT_2 col treatment    
COLTRT_3 col treatment    
COLTRT_4 col treatment    
COLVRT_1 col vertical    
COLVRT_2 col vertical    
COLVRT_3 col vertical    
COLVRT_4 col vertical    
PGMTRT_1 pgm treatment    
PGMTRT_2 pgm treatment
PGMTRT_3 pgm treatment    
PGMTRT_4 pgm treatment    
PGMVRT_1 pgm vertical    
PGMVRT_2 pgm vertical    
PGMVRT_3 pgm vertical    
PGMVRT_4 pgm vertical

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: US/Pacific
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ExperimentHub_2.8.1 AnnotationHub_3.8.0 BiocFileCache_2.8.0 dbplyr_2.3.3
[5] BiocGenerics_0.46.0 BiocManager_1.30.21.1 multiWGCNA_0.99.2 ggalluvial_0.12.5
[9] ggplot2_3.4.2

loaded via a namespace (and not attached):
[1] rstudioapi_0.15.0 magrittr_2.0.3 rmarkdown_2.23
[4] fs_1.6.3 zlibbioc_1.46.0 vctrs_0.6.3
[7] memoise_2.0.1 RCurl_1.98-1.12 base64enc_0.1-3
[10] htmltools_0.5.5 S4Arrays_1.0.4 usethis_2.2.2
[13] curl_5.0.1 dynamicTreeCut_1.63-1 Formula_1.2-5
[16] htmlwidgets_1.6.2 impute_1.74.1 cachem_1.0.8
[19] igraph_1.5.0 mime_0.12 lifecycle_1.0.3
[22] iterators_1.0.14 pkgconfig_2.0.3 Matrix_1.6-0
[25] R6_2.5.1 fastmap_1.1.1 GenomeInfoDbData_1.2.10
[28] MatrixGenerics_1.12.2 shiny_1.7.4.1 digest_0.6.33
[31] colorspace_2.1-0 patchwork_1.1.2 AnnotationDbi_1.62.2
[34] S4Vectors_0.38.1 ps_1.7.5 pkgload_1.3.2.1
[37] Hmisc_5.1-0 GenomicRanges_1.52.0 RSQLite_2.3.1
[40] filelock_1.0.2 fansi_1.0.4 httr_1.4.6
[43] compiler_4.3.1 rngtools_1.5.2 remotes_2.4.2.1
[46] bit64_4.0.5 withr_2.5.0 doParallel_1.0.17
[49] htmlTable_2.4.1 backports_1.4.1 DBI_1.1.3
[52] pkgbuild_1.4.2 rappdirs_0.3.3 DelayedArray_0.26.6
[55] sessioninfo_1.2.2 flashClust_1.01-2 tools_4.3.1
[58] foreign_0.8-84 interactiveDisplayBase_1.38.0 httpuv_1.6.11
[61] nnet_7.3-19 glue_1.6.2 callr_3.7.3
[64] promises_1.2.0.1 grid_4.3.1 checkmate_2.2.0
[67] cluster_2.1.4 generics_0.1.3 gtable_0.3.3
[70] tzdb_0.4.0 preprocessCore_1.62.1 data.table_1.14.8
[73] hms_1.1.3 WGCNA_1.72-1 utf8_1.2.3
[76] XVector_0.40.0 BiocVersion_3.17.1 ggrepel_0.9.3
[79] foreach_1.5.2 pillar_1.9.0 stringr_1.5.0
[82] later_1.3.1 splines_4.3.1 dplyr_1.1.2
[85] lattice_0.21-8 survival_3.5-5 bit_4.0.5
[88] tidyselect_1.2.0 GO.db_3.17.0 Biostrings_2.68.1
[91] miniUI_0.1.1.1 knitr_1.43 gridExtra_2.3
[94] IRanges_2.34.1 SummarizedExperiment_1.30.2 stats4_4.3.1
[97] xfun_0.39 Biobase_2.60.0 devtools_2.4.5
[100] matrixStats_1.0.0 stringi_1.7.12 yaml_2.3.7
[103] evaluate_0.21 codetools_0.2-19 tibble_3.2.1
[106] cli_3.6.1 rpart_4.1.19 xtable_1.8-4
[109] munsell_0.5.0 processx_3.8.2 Rcpp_1.0.11
[112] GenomeInfoDb_1.36.1 png_0.1-8 fastcluster_1.2.3
[115] parallel_4.3.1 ellipsis_0.3.2 readr_2.1.4
[118] blob_1.2.4 prettyunits_1.1.1 dcanr_1.16.0
[121] doRNG_1.8.6 profvis_0.3.8 urlchecker_1.0.1
[124] bitops_1.0-7 scales_1.2.1 purrr_1.0.1
[127] crayon_1.5.2 rlang_1.1.1 cowplot_1.1.1
[130] KEGGREST_1.40.0

dariotommasini commented 11 months ago

Okay, this should be easy. You need a newer version of ExperimentHub (>= 2.9.0):

BiocManager::install("ExperimentHub", force = TRUE)

Then retry accessing ExperimentHub; you should get today's snapshot date, i.e.:

> library(ExperimentHub)
> eh = ExperimentHub()
snapshotDate(): 2023-07-21

GettyScience commented 11 months ago

[Screenshot 2023-07-21 at 8:54:37 PM]

Still unsuccessful

dariotommasini commented 11 months ago

I'm guessing that's still ExperimentHub version 2.8.1?

Let's try installing the development version of Bioconductor:

BiocManager::install(version = "3.18")

If that works, you can re-try installing ExperimentHub, and it should be the newest version:

BiocManager::install("ExperimentHub", force = TRUE)

GettyScience commented 11 months ago

Unsuccessful. Could the issue have something to do with using one of the new M2 Macs rather than an Intel processor?

dariotommasini commented 11 months ago

Which part was unsuccessful? Can I see the error message?

GettyScience commented 11 months ago

There is no error message, it just returns the snapshot shown in the previous photo each time.

GettyScience commented 11 months ago

Running on PositCloud, it is now successful. It must be an issue with my version (somehow) or processor.

dariotommasini commented 11 months ago

Huh, very weird. Happy it's working now! Let me know if you get that error with the sampleTable again.

GettyScience commented 11 months ago

I have not tried my own data as of yet, just going through the vignette. I will try my own soon. Your help is much appreciated, thank you so much.

GettyScience commented 11 months ago

Running my data through results in this error

Error in constructNetworks(multiWGCNAdata, sampleTable, conditions1, conditions2, : inherits(datExpr, "SummarizedExperiment") | inherits(datExpr, .... is not TRUE

dariotommasini commented 11 months ago

What class is your datExpr object?

class(datExpr)

It should be either a SummarizedExperiment object or a data.frame object. Anything else will hit that stopifnot clause.

You might also want to check the version of multiWGCNA since this was a bug that I fixed earlier this week if you recall.
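
For reference, the kind of check described above can be sketched as follows. `is_valid_datExpr()` is a hypothetical stand-in written for this illustration, not the package's actual source. It also shows that an object carrying tibble-style classes (as Excel importers often return) still inherits from data.frame, so it would be expected to pass a check of this form.

```r
# Hypothetical stand-in for the input check described above; this is not
# the package's actual source code.
is_valid_datExpr <- function(x) {
  inherits(x, "SummarizedExperiment") || inherits(x, "data.frame")
}

plain <- data.frame(sample1 = 1:2, sample2 = 3:4, row.names = c("g1", "g2"))

# Mimic the class vector of a tibble without requiring the tibble package
tibble_like <- structure(plain, class = c("tbl_df", "tbl", "data.frame"))

is_valid_datExpr(plain)             # a plain data.frame passes
is_valid_datExpr(tibble_like)       # so does anything inheriting from data.frame
is_valid_datExpr(as.matrix(plain))  # a bare matrix does not

# Dropping the extra classes gives a plain data.frame if one is preferred
class(as.data.frame(tibble_like))
```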

GettyScience commented 11 months ago

[1] "tbl_df" "tbl" "data.frame"

I forced a download from devtools. Is there a version I could specifically feed it?

GettyScience commented 11 months ago

This is the full error script

3. stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
2. stopifnot(inherits(datExpr, "SummarizedExperiment") | inherits(datExpr, "SummarizedExperiment"))
1. constructNetworks(multiWGCNAdata, SampleTable, conditions1, conditions2, networkType = "signed", power = 18, minModuleSize = 40, maxBlockSize = 25000, reassignThreshold = 0, minKMEtoStay = 0.7, mergeCutHeight = 0.1, numericLabels = TRUE, pamRespectsDendro = FALSE, verbose = 3)

dariotommasini commented 11 months ago

Yep, looks like it's the old buggy version for some reason. It should be multiWGCNA version 0.99.2. Have you already tried force = TRUE with devtools? I think that should update it to the newest development version.

For now, the workaround can be to make your datExpr a SummarizedExperiment object. Something like this:

se = SummarizedExperiment(assays=list(counts=as.matrix(datExpr)))

GettyScience commented 11 months ago

That helped. I know I keep coming to you for help and it has been very appreciated.

New error:

Error in data.frame(X = rownames(assays(datExpr)[[1]]), assays(datExpr)[[1]]) : arguments imply differing number of rows: 0, 27655

dariotommasini commented 11 months ago

Check that your SummarizedExperiment object has colnames and rownames. Looks like it's complaining that it cannot find rownames for it.

For example, I made a dummy example:

> temp
    sample1 sample2 sample3 sample4 sample5
op1       1       1       1       1       1
op2       1       1       1       1       1
op3       1       1       1       1       1
op4       1       1       1       1       1
> se = SummarizedExperiment(assays=list(counts = temp))
> se
class: SummarizedExperiment 
dim: 4 5 
metadata(0):
assays(1): counts
rownames(4): op1 op2 op3 op4
rowData names(0):
colnames(5): sample1 sample2 sample3 sample4 sample5
colData names(0):

Yours should also have rownames and colnames.
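
A common way to end up without rownames is an import where the gene IDs land in an ordinary column. The sketch below, using made-up data and a hypothetical column name `genes`, moves that column into the row names before building the object. The SummarizedExperiment step only runs if that package is installed.

```r
# Hypothetical import result: gene IDs sit in a regular column named "genes"
raw <- data.frame(
  genes   = c("g1", "g2", "g3"),
  sample1 = c(1, 2, 3),
  sample2 = c(4, 5, 6)
)

# Move the gene IDs into the row names and keep only the numeric columns
expr <- as.matrix(raw[, -1])
rownames(expr) <- raw$genes

# Both dimensions should now be named
stopifnot(!is.null(rownames(expr)), !is.null(colnames(expr)))

# Build the SummarizedExperiment only if the package is available
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(assays = list(counts = expr))
  print(dim(se))
}
```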

GettyScience commented 11 months ago

It was my error, the datExpr did not have the row names column as actual row names.

Going further I hit the error:

Error: subscript contains out-of-bounds indices

dariotommasini commented 11 months ago

Just to be sure, did the vignettes work for you? Because if that's the case then it's just a matter of putting all the data in the same format as the vignettes.

GettyScience commented 11 months ago

Yes! And thanks to your help, which I have needed a lot of, I was able to troubleshoot it and get the program to run. I did hit a problem, however:

softConnectivity: FYI: connecitivty of genes with less than 6 valid samples will be returned as NA.
..calculating connectivities....100%
Error in datExpr[, !colnames(datExpr) %in% c("X", "kTotal", "kWithin", : incorrect number of dimensions

Our data are derived from only 4 samples per genotype/tissue/treatment. I could potentially add other plant tissue data to increase the number of samples for this analysis of genotype/treatment, but I would prefer to just avoid that noise at the moment.

Is there a way to change the requirement of 6 valid samples to 3 or 4?

Thank you,

dariotommasini commented 11 months ago

The 6 sample remark is just a warning.

Please email me your datExpr and sampleTable in .csv format to dtommasini0@gmail.com and I'll take a look. This error has already been reported, but the thread has fallen silent and I don't know if it was resolved.

dariotommasini commented 11 months ago

I have reproduced the error as well. Taking a look right now.

dariotommasini commented 11 months ago

The issue was that your first column in the sampleTable is "sample" and not "Sample". It ran fine after:

colnames(sampleTable)[1] = "Sample"

I actually forgot that this was required, but I've updated the documentation to reflect this. Future versions might not be so picky with formatting, but for now try this easy fix.
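
To guard against this before running an analysis, a small pre-check along these lines may help. `fix_sample_column()` is a hypothetical helper and not part of multiWGCNA. The only requirement it encodes is the one stated above, namely that the first column be named exactly "Sample".

```r
# Hypothetical sample table whose first column is named in lowercase
sampleTable <- data.frame(
  sample   = c("COLVRT_1", "PGMTRT_1"),
  genotype = c("col", "pgm"),
  gravity  = c("vertical", "treatment")
)

# Hypothetical helper (not part of multiWGCNA): rename the first column
# to "Sample" if it is not already named that way
fix_sample_column <- function(tbl) {
  if (colnames(tbl)[1] != "Sample") {
    colnames(tbl)[1] <- "Sample"
  }
  tbl
}

sampleTable <- fix_sample_column(sampleTable)
colnames(sampleTable)
```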

Also, please do work through both vignettes, as the astrocyte vignette has some analyses not covered by the autism workflow.

smukher2 commented 6 months ago

PLEASE HELP STOP PLAGIARISM AND VILLAINOUS SCIENTISTS BRENT FOGEL (UCLA) AND DARIO TOMMASINI (now PhD student in UC Berkeley) BY NOT CITING OR USING THIS, OR THEIR OTHER CODES AND PAPERS BY THEM. INSTEAD USE AND CITE THE ORIGINAL WORKS (WITH VIDEO TUTORIAL) BY DR. STEVE HORVATH, DR. PETER LANGFELDER AND DR. JEREMY MILLER.(details below)

*For details about wrong doings by Brent Fogel including and not limited to plagiarism by Brent Fogel and Dario Tommasini please see open letter at http://tinyurl.com/bde788x2 or file 'This To Apprise You About Wrong Doings By Brent Fogel including and not limited to plagiarism by Brent Fogel and Dario Tommasini.pdf' posted at https://gitlab.com/smukher2/openletter that I also emailed to UCLA, UC Berkeley, iScience and BMC Bioinformatics reporting plagiarism by Brent Fogel and Dario Tommasini in their two papers using this multiWGCNA code https://github.com/fogellab/multiWGCNA: Tommasini D, Fox R, Ngo KJ, Hinman JD, Fogel BL. Alterations in oligodendrocyte transcriptional networks reveal region-specific vulnerabilities to neurological disease. iScience. 2023 Mar 8;26(4):106358. doi: 10.1016/j.isci.2023.106358. PMID: 36994077; PMCID: PMC10040735. Tommasini D, Fogel BL. multiWGCNA: an R package for deep mining gene co-expression networks in multi-trait expression data. BMC Bioinformatics. 2023 Mar 24;24(1):115. doi: 10.1186/s12859-023-05233-z. PMID: 36964502; PMCID: PMC10039544.

*If you need WGCNA codes for different applications with video tutorial consider using the original works (with video tutorials) by Dr. Steve Horvath, Dr. Peter Langfelder and Dr. Jeremy Miller: Langfelder, P., Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). https://doi.org/10.1186/1471-2105-9-559 Miller JA, Horvath S, Geschwind DH. Divergence of human and mouse brain transcriptome highlights Alzheimer disease pathways. Proc Natl Acad Sci U S A. 2010 Jul 13;107(28):12698-703. Epub 2010 Jun 25. PMID: 20616000; PMCID: PMC2906579. https://doi.org/10.1073/pnas.0914257107

Video: Dr. Steve Horvath Weighted gene co-expression network analysis https://youtu.be/rRIRMW_RRS4?si=A-ZivIzwdRVLpaLa Video: Dr. Jeremy Miller How WGCNA Can be Used to Compare and Contrast Two Networks https://youtu.be/aBD67YmCBK4?si=eW9Ybv2nIWDUjkdT Full Playlist: WGCNA https://www.youtube.com/playlist?list=PLtlynCnS_vmB2kwhfkcfxIDbsSO9uniM5 Resources: Dr. Peter Langfelder lists further resources on his website https://peterlangfelder.com/2018/11/25/wgcna-resources-on-the-web/

Best regards, Shradha Mukherjee https://gitlab.com/smukher2 https://github.com/smukher2 https://orcid.org/0000-0002-3249-2551 https://pubmed.ncbi.nlm.nih.gov/?term=Shradha+Mukherjee