fogellab / multiWGCNA

an R package for deep mining gene co-expression networks in multi-trait expression data
13 stars 2 forks source link

colors when having more than 2 groups. #10

Closed janvalddius closed 1 month ago

janvalddius commented 6 months ago

counts counts_matrix_raw.txt the network network.zip script Multi_WGCNA.txt

Sample table but also can be generated form the script DataFrame with 65 rows and 3 columns Sample age infection

F14R_C3_T0_S1 F14R_C3_T0_S1 C3 T0 F14R_C3_T0_S2 F14R_C3_T0_S2 C3 T0 F14R_C3_T0_S3 F14R_C3_T0_S3 C3 T0 F14R_C3_T0_S4 F14R_C3_T0_S4 C3 T0 F14R_C3_T0_S5 F14R_C3_T0_S5 C3 T0 ... ... ... ... F14R_C8_T12_S2 F14R_C8_T12_S2 C8 T12 F14R_C8_T12_S3 F14R_C8_T12_S3 C8 T12 F14R_C8_T12_S4 F14R_C8_T12_S4 C8 T12 F14R_C8_T12_S5 F14R_C8_T12_S5 C8 T12 F14R_C8_T12_S6 F14R_C8_T12_S6 C8 T12 For example, is not possible to plot: results$diffModExp = runDME(multi_bwnet_F14R[["combined"]], sampleTable, p.adjust = "fdr", refCondition = "age", testCondition = "Infection", plot=T) # out="ANOVA_DME.pdf")
dariotommasini commented 6 months ago

Hi @janvalddius ,

Could you post the error message that you get when executing runDME?

I think the issue is likely the fact that you have four groups in your test trait. Could you try running the analysis using only T0 and T12 (e.g. remove T3 and T6 samples)? I just want to see if that's the issue.

Thanks!

janvalddius commented 6 months ago

Hello Dario,

You are right, the issue (as the other people had) is when one has more than 2 groups in the condition. I saw it in Github after I created a new issue. In my case, I could run the network with all the conditions (factors): age (3 levels) and infection (4 levels). However, there are some analyses that I do not know how to interpret. I will take a glimpse into the internet to understand the concepts.

What is true is the modules from WGCNA are the same as the modules found in the "combined" form multi WGCNA. So this is important to know.

I will recreate the network without the T3 and T6 and I will let you know it.

Thanks for all.

Alejandro

Le 2023-12-21 00:25, Dario Tommasini a écrit :

Hi @janvalddius [1] ,

Could you post the error message that you get when executing runDME?

I think the issue is likely the fact that you have four groups in your test trait. Could you try running the analysis using only T0 and T12 (e.g. remove T3 and T6 samples)? I just want to see if that's the issue.

Thanks!

-- Reply to this email directly, view it on GitHub [2], or unsubscribe [3]. You are receiving this because you were mentioned.Message ID: @.***>

Links:

[1] https://github.com/janvalddius [2] https://github.com/fogellab/multiWGCNA/issues/10#issuecomment-1865270306 [3] https://github.com/notifications/unsubscribe-auth/AY3ON43Y3C32KXFWTMBFYQ3YKNXVXAVCNFSM6AAAAABAZ2GWKCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRVGI3TAMZQGY

dariotommasini commented 6 months ago

That's correct, the combined network is simply the network with all samples and all conditions.

We had designed the package for the following common design: condition 1 is treatment vs control, and condition 2 is either tissue or time point. You can see in our paper that's how we used it on two published datasets. So the reference condition (eg tissue, time point, age in your case) can have many groups (eg astrocyte data had 4 regions).

We hadn't considered that the test condition might have more than two groups. There's no reason it shouldn't work for more than two, but I haven't tested it on more than two. However I'm happy to work with you and modify the code to allow more than two conditions for the test condition.

janvalddius commented 6 months ago

Hello Dario,

That is what I realized after the examples provided. In general, researchers do A (control) vs B (treatment) and then one can test different levels as drug/toxic/regions /time in each group. It makes a lot of sense.

I have read carefully both tutorials and the paper itself, and the more I read it the more I understand. However, when results are not so "binary" like mine, then interpreting biological responses is a bit more subtle. In my case, although the first condition is age (time), I considered this one as a factor and the exposure to the virus in time (hours). The truth is both conditions are two-time variables and then the modules somehow are expected to be present.

I am more than happy to cooperate with you and to improve the package, even providing visual plots and other analysis with ggplot2, but being honest with you, I am a biologist and I have learned bioinformatics and coding on my own due to the needs of the thesis and my post-doc.

Also, I always wanted to create/be involved in projects of packages in R. We can discuss it if you feel comfortable and the chances are suitable for everyone.

During these days I will provide the results based on 2 factors age (4 vs 28 months old) with (two time infection T0 and T12 hours). The most extreme situations for each condition.

Alejandro

Le 2023-12-21 17:54, Dario Tommasini a écrit :

That's correct, the combined network is simply the network with all samples and all conditions.

We had designed the package for the following common design: condition 1 is treatment vs control, and condition 2 is either tissue or time point. You can see in our paper that's how we used it on two published datasets. So the reference condition (eg tissue, time point, age in your case) can have many groups (eg astrocyte data had 4 regions).

We hadn't considered that the test condition might have more than two groups. There's no reason it shouldn't work for more than two, but I haven't tested it on more than two. However I'm happy to work with you and modify the code to allow more than two conditions for the test condition.

-- Reply to this email directly, view it on GitHub [1], or unsubscribe [2]. You are receiving this because you were mentioned.Message ID: @.***>

Links:

[1] https://github.com/fogellab/multiWGCNA/issues/10#issuecomment-1866640922 [2] https://github.com/notifications/unsubscribe-auth/AY3ON47MTRU5VAL3QKOHADLYKRSTXAVCNFSM6AAAAABAZ2GWKCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRWGY2DAOJSGI

klai001 commented 4 months ago

Hi I waswondering if there was any updates on this for setting the 'Disease" level for > 2 groups? in my case, there is 3 factors (genotype,time, health) 2 levels per factor and what i did was simply to paste0 one of the main factors (genotype time) to derive one factor with 2 levels (for health) and another factor with 4 levels ( genotypetime). i dont think setting 'health' as the "disease level" makes sense for my question cos i wanted to know the effects of genotypetime subcategorised to their health groups and not the other way round.

dariotommasini commented 4 months ago

Hi @klai001 ,

Sorry I haven't made any progress on this. I think trying to collapse three dimensions to two is the best you can do here. Do you get an error when doing that? It should work.

You can also just analyze your data in subsets, ie genotype vs time for a specific health group.

klai001 commented 4 months ago

I have resolved the overlapping problem of legend. For anyone encountering the same problem; i edited the source code with a minor edit and these are the depending functions and packages `#edit their source code of drawMultiWGCNAnetwork() library(igraph) library(stringr) library(scales) drawMultiWGCNAnetwork <- function(WGCNAlist, comparisonList, moduleOfInterest, design, overlapCutoff = 0, padjCutoff = 1, removeOutliers = TRUE, alpha = 1e-50, layout = NULL, hjust = 0.4, vjust = 0.3, width = 0.5, colors = NULL){

extract the overlaps objects into a list

overlapList=lapply(comparisonList, function(x) x$overlap)
overlapList=do.call(rbind, overlapList)
filteredOverlapList=overlapList

# remove outliers if necessary
if(removeOutliers) {
    for(WGCNA in WGCNAlist){
        filteredOverlapList=filteredOverlapList[!filteredOverlapList$mod1 %in% WGCNA@outlierModules,]
        filteredOverlapList=filteredOverlapList[!filteredOverlapList$mod2 %in% WGCNA@outlierModules,]
    }
}

admittedModules=unique(c(filteredOverlapList$mod1, filteredOverlapList$mod2))

# generate the multiWGCNA layout
if(is.null(layout)){
    myCoords=list()
    for(level in 1:3){
        WGCNAs=getLevel(level, design)
        from=0-width*length(WGCNAs)/2
        to=0+width*length(WGCNAs)/2
        x.coordinates=seq(from, to, length.out=length(WGCNAs))
        if(level==1) x.coordinates=0
        for(nWGCNA in 1:length(WGCNAs)){
            nModules=length(admittedModules[startsWith(admittedModules, WGCNAs[[nWGCNA]]) ])
            myCoords=append(myCoords, list(cbind(runif(nModules, x.coordinates[[nWGCNA]], x.coordinates[[nWGCNA]]+hjust), 3-level+runif(nModules, -vjust, vjust))))
        }
    }
    layout=do.call(rbind, myCoords)
}

#make the igraph object
graph=graph_from_data_frame(d=filteredOverlapList, directed = FALSE)

#node and edge attributes
vcol=str_split_fixed(V(graph)$name, "_", 2)[,1]
conditions=unique(vcol)

# Colors of modules by condition
if(is.null(colors)) palette = colors(length(conditions), random = TRUE)
if(!is.null(colors)) palette = colors

for(condition in 1:length(conditions)){
    vcol[vcol==conditions[[condition]] ]=palette[[condition]]
}
V(graph)$color=vcol

# edge attributes
E(graph)$weight=-log10(E(graph)$p.adj)
E(graph)$width=rescale(E(graph)$weight, from=c(0, 320), to=c(0,5))
ealpha=rescale(-log10(E(graph)$p.adj), from=c(0, 320), to=c(0,1))
ecol=lapply(ealpha, function(x) rgb(1, 0, 0, x))
E(graph)$color=unlist(ecol)

modulesOfInterest=unique(c(filteredOverlapList$mod2[filteredOverlapList$mod1==moduleOfInterest & filteredOverlapList$p.adj<alpha], 
    filteredOverlapList$mod1[filteredOverlapList$mod2==moduleOfInterest & filteredOverlapList$p.adj<alpha]))

# delete edges
graph <- delete.edges(graph, which(!(filteredOverlapList$mod1 %in% modulesOfInterest | filteredOverlapList$mod2 %in% modulesOfInterest)))

# plot graph
plot = plot(graph, vertex.label.color="black", vertex.size=3, vertex.label=NA, 
            vertex.label.cex=.5, layout=layout) 

# second call to plot over with edges of interest
legend("topright", legend = conditions, pch=21, col=palette, pt.bg=palette, 
       pt.cex=1, cex=.8, bty="n", ncol=1)

return(plot) }

draw a basic network

drawNetwork <- function(matrix, threshold=0, nodeList=NULL, edgeList=NULL, layout=layout_with_fr, removeFreeNodes=TRUE){ graph=graph_from_adjacency_matrix(matrix, mode="undirected", weighted=TRUE) graph <- delete.edges(graph, which(E(graph)$weight < threshold)) if(removeFreeNodes) graph <- delete.vertices(simplify(graph), degree(graph)==0) ealpha=rescale(E(graph)$weight, from=c(0,1), to=c(0,0.4)) ecol=lapply(ealpha, function(x) rgb(0.2, 0.2, 0.2, x))

if(length(nodeList)>0){
    graph <- igraph::delete.vertices(graph, which(!V(graph)$name %in% nodeList))
    layout=layout[match(V(graph)$name, nodeList),]
}

if(length(edgeList)>0){
    graph <- igraph::delete.edges(graph, which(!apply(igraph::as_edgelist(graph), 1, function(x) 
        any(unlist(lapply(1:nrow(edgeList), function(y) {
            x[[1]]==edgeList[y,1] & x[[2]]==edgeList[y,2]
        }))))))
}

plot(graph, layout=layout, vertex.size=4, edge.width=1.5, 
    vertex.color="white", vertex.label=NA, edge.color=unlist(ecol))
list(V(graph)$name, igraph::layout.fruchterman.reingold(graph), igraph::as_edgelist(graph))

}

getLevel <- function(level, design){ if(level==1){ return("combined") } return(unique(design[,level])) }`

dariotommasini commented 1 month ago

Nice, happy you resolved it!