cstoeckert / iterativeWGCNA

Extension of the WGCNA program to improve the eigengene similarity of modules and increase the overall number of genes in modules.
GNU General Public License v2.0

which passes/module to keep when I get >100 #31

Open fossilfriend opened 5 years ago

fossilfriend commented 5 years ago

After running iterativeWGCNA with the following command: iterativeWGCNA -i data.txt --wgcnaParameters maxBlockSize=50000,nthreads=20 --enableWGCNAThreads

I got 15 passes (pass1 to pass15) and more than 100 modules in total in final-membership.txt. Should the modules from all 15 passes be used to assign membership, or only those from the last pass (pass15)?

Originally posted by @wangjiawen2013 in https://github.com/cstoeckert/iterativeWGCNA/issues/30#issuecomment-506670597

fossilfriend commented 5 years ago

Each pass of iterativeWGCNA clusters a different set of genes. After pass1 is completed, a set of genes has been classified and a set of genes is left over as a residual to classification. Pass2 begins with those residuals. And so on.
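The hand-off of residuals between passes can be sketched as a toy loop. This is only an illustration of the idea described above, not the actual iterativeWGCNA implementation: `run_passes`, `classify`, and `toy_classify` are hypothetical names, and the stopping rule (stop when a pass classifies nothing) is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Set


def run_passes(
    genes: Iterable[str],
    classify: Callable[[Set[str]], Dict[str, str]],
    max_passes: int = 100,
) -> List[Dict[str, str]]:
    """Toy model of the pass loop: each pass only sees the previous residuals.

    `classify` is a hypothetical stand-in for one full pass of module
    detection. It receives the genes that are still unclassified and returns
    {gene: module_label} for the genes it managed to place.
    """
    residuals = set(genes)
    results = []
    for _ in range(max_passes):
        if not residuals:
            break
        assigned = classify(residuals)
        if not assigned:
            # Assumed stopping rule: a pass that places nothing ends the run.
            break
        results.append(assigned)
        # Classified genes are removed; the rest carry over to the next pass.
        residuals -= set(assigned)
    return results


def toy_classify(remaining: Set[str]) -> Dict[str, str]:
    # Hypothetical rule, for illustration only: place the genes whose name
    # starts with the alphabetically smallest first letter still present.
    if not remaining:
        return {}
    first = min(g[0] for g in remaining)
    return {g: "module_" + first for g in remaining if g[0] == first}


passes = run_passes(["a1", "a2", "b1", "c1", "c2"], toy_classify)
for number, assigned in enumerate(passes, start=1):
    print("pass", number, sorted(assigned))
```

The point of the sketch is that no gene is clustered twice: a gene placed in pass1 never reaches pass2, so the modules from different passes are complementary rather than successive refinements of one another.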

Early passes catch the strongest signal in the dataset -- in terms of network structure, they capture the main subnetworks. Later passes are more likely to isolate small groups of outliers that are similar to each other but have a fuzzier membership in (overlap with) the main subnetworks.

So if you want to focus on only a subset of the modules, you are better off focusing on the first few passes. The result from pass1 is very much what you should expect from running WGCNA, just a somewhat cleaner version. iterativeWGCNA outputs the complete result at the end of each pass, and at the end of each iteration within a pass. This lets users decide, case by case, at what point additional module detection no longer adds information.
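If you do decide to keep only the early passes, one way to do it is to filter the membership table by the pass number in each module label. The sketch below rests on assumptions about the file layout that you should check against your own final-membership.txt: a tab-delimited table with `Gene`, `Module`, and `kME` columns, and module labels of the form `P<pass>_I<iteration>_M<module>`. The helper name `modules_by_pass` and the example rows are made up for illustration.

```python
import csv
import io
import re

# Assumed label layout, e.g. "P1_I2_M3" = pass 1, iteration 2, module 3.
MODULE_LABEL = re.compile(r"^P(\d+)_I(\d+)_M(\d+)$")


def modules_by_pass(handle, max_pass):
    """Return {module: [genes]} for modules detected in passes 1..max_pass."""
    kept = {}
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        match = MODULE_LABEL.match(row["Module"])
        if match is None:
            # Unclassified genes, or labels that do not follow the assumed layout.
            continue
        if int(match.group(1)) <= max_pass:
            kept.setdefault(row["Module"], []).append(row["Gene"])
    return kept


# Made-up rows standing in for final-membership.txt (column names assumed).
example = io.StringIO(
    "Gene\tModule\tkME\n"
    "g1\tP1_I1_M1\t0.95\n"
    "g2\tP1_I1_M1\t0.91\n"
    "g3\tP2_I1_M1\t0.88\n"
    "g4\tP15_I1_M2\t0.82\n"
    "g5\tUNCLASSIFIED\tNA\n"
)
early = modules_by_pass(example, max_pass=2)
print(sorted(early))
```

For a real run you would pass an open file handle instead of the `io.StringIO` example, for instance `open("final-membership.txt", newline="")`, adjusting the path to wherever your output directory is.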

However, it is important to note that the smallest modules detected in the latest passes are not "garbage bin" modules. The detected modules group similarly co-expressed genes (where the extent of that similarity depends on the KME stringency parameters passed to iterativeWGCNA). A true "garbage bin" module would group genes with dissimilar expression patterns simply because they are equally dissimilar to everything else. Whether or not these modules are biologically meaningful is another question and has to be evaluated on a case by case basis.

Along similar lines, if you increase the minKMEtoStay and minCoreKME parameters (say, to 0.9), more genes will be filtered out and you will get fewer small modules.
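As a rough sketch of how stricter thresholds might be passed, the snippet below assembles a command line that mirrors the one in the question and adds the two parameters at 0.9. It assumes that minKMEtoStay and minCoreKME are accepted through `--wgcnaParameters` in the same `name=value` form as maxBlockSize; `build_command` is a hypothetical helper, not part of iterativeWGCNA.

```python
import shlex


def build_command(input_file, wgcna_parameters):
    """Assemble an iterativeWGCNA command line as a list of arguments."""
    # Join parameters as name=value pairs separated by commas, matching the
    # form used for maxBlockSize and nthreads in the original command.
    joined = ",".join(f"{name}={value}" for name, value in wgcna_parameters.items())
    return [
        "iterativeWGCNA",
        "-i", input_file,
        "--wgcnaParameters", joined,
        "--enableWGCNAThreads",
    ]


command = build_command(
    "data.txt",
    {
        "maxBlockSize": 50000,
        "nthreads": 20,
        # Assumption: stricter kME thresholds are passed like other WGCNA parameters.
        "minKMEtoStay": 0.9,
        "minCoreKME": 0.9,
    },
)
print(shlex.join(command))
```

The argument list can be handed to `subprocess.run(command, check=True)` if you want to launch the run from Python rather than paste the printed line into a shell.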