bc-anaisabel / juniperus_paper

Pipeline for analyzing Illumina MiSeq paired-end data of fungal communities
BSD 2-Clause "Simplified" License
2 stars 1 forks source link

SNA Plot: Add legend to associate color to its corresponding fungal family #6

Closed bc-anaisabel closed 3 years ago

bc-anaisabel commented 4 years ago

I am using the package sna in R to plot a dataframe with presence/absence of fungal OTUs in each sample. I am grouping them by 4 different categories, so each OTU can be plotted in relation to its presence in each category. Also, since I want to know which fungal families are where, I want that each OTU is represented by the fungal family it has been assigned.

The problem is I am getting a plot that show this arrangement but I can't get the legend to show what each color represents. You can find the data here. I began with something like this:

par(mfrow=c(1,2), xpd=T)
gplot(as.one.mode(network_host),
        displaylabels = TRUE,
        label=rownames(network_host), gmode="graph",
        label.cex=0.6, vertex.col = color$Family, vertex.cex=2)
Screen Shot 2020-10-08 at 7 12 41 PM

But I would like to end with something similar to this, where you can distinguish the different families and distinguish between samples and OTUs.

Screen Shot 2020-10-08 at 7 05 33 PM

The data to work on it is created from:

#Select data
subset <- subset_taxa(subset.texcoco.binary, Trophic %in% c("a__am"))
subset

subset <-subset.texcoco.binary.beta

subset<-subset.myc

subset

#Subset
Top <- names(sort(taxa_sums(subset), TRUE)[1:500])
ent <- prune_taxa(Top, subset)
ent
taxa_sums(ent)

taxa_sums(subset) [1:10]
ent <- prune_taxa(taxa_sums(ent) > 0, ent)
any(taxa_sums(subset) == 0)
taxa_sums(ent) [1:10]
ent

#Merge by category 
nw <-merge_samples(subset, group = "Host")
network_host <- as.data.frame(otu_table(nw))
is.matrix(network_host)

#vectors
taxa_names(subset)
color_vector <- c("f__ Glomeraceae", "f__ Claroideoglomeraceae", "f__ Acaulosporaceae", "f__ Diversisporaceae", "f__ Gigasporaceae", "f__ Ambisporaceae", "f__ Paraglomeraceae", "f__ Unknown")
color<-as.matrix(tax_table(subset))
color<-as.data.frame(color)

#Gplot
gplot(network_host, thresh = 0.2, displaylabels = FALSE, vertex.col = color$Trophic)
network_host

#Merge more than one category 

sample_variables(subset)

variable1 = as.character(get_variable(subset, "Host"))
variable2 = as.character(get_variable(subset, "Site"))

sample_data(subset)$NewPastedVar <- mapply(paste0, variable1, variable2, 
                                            collapse = "_")
nw2<- merge_samples(subset, "NewPastedVar")

network_host <- as.data.frame(otu_table(nw2))
bc-anaisabel commented 3 years ago

I am changed a bit how the code looks:

#### network using sna #### 

#Select data
subset <- subset_taxa(subset.texcoco.binary.beta, Trophic %in% c("a__am"))
subset<- subset_samples(subset, Type %in% "root")
subset

any(taxa_sums(subset) == 0)
subset <- prune_taxa(taxa_sums(subset) > 0, subset)
subset

tax_table(subset)
otu_table(subset)

# vectors
taxa_names(subset)
# color_vector <- c()
color<-as.matrix(tax_table(subset))
color<-as.data.frame(color)
color<- color$Family

#Merge by category 

sample_variables(subset)

variable1 = as.character(get_variable(subset, "Host"))
variable2 = as.character(get_variable(subset, "Site"))

sample_data(subset)$NewPastedVar <- mapply(paste0, variable1, variable2, 
                                           collapse = "_")
nw2<- merge_samples(subset, "NewPastedVar")

network_host <- as.data.frame(otu_table(nw2))

#plot 

palette(polychrome(n=27))
gplot(network_host, gmode="graph", jitter=FALSE,
      displaylabels=FALSE, 
      boxed.labels=FALSE, label.pos=1, label.cex=1, vertex.cex=2,
      vertex.col= as.numeric(color$Family))

So now I get something like this:

Rplot02

valeriafloral commented 3 years ago

I don't know if you know ggnet2 that is a function from library GGally library that plot networks as ggplot objects. I think that kind of plot are easier to manage.

https://briatte.github.io/ggnet/#node-sizes

bc-anaisabel commented 3 years ago

Here is the documentation of SNA package sna.pdf

IsauraRReinhold commented 3 years ago

maybe you can use ggnet instead of sna

https://briatte.github.io/ggnet/

devtools::install_github("briatte/ggnet")
library(ggnet)
redgcko7 commented 3 years ago

After viewing the colors with labels, we noticed that two OTUs with the same color actually correspond to different families, so something is wrong with the coloring according to families...

IsauraRReinhold commented 3 years ago

maybe first you can check this:

https://www.rdocumentation.org/packages/network/versions/1.13.0/topics/edgeset.constructors

bc-anaisabel commented 3 years ago

We tried to transform to matrix but we got an error message when trying to use the network command

network_host <- as.matrix(otu_table(network_host, taxa_are_rows = TRUE)) net = network(network_host, directed = FALSE)

Error in if (matrix.type == "edgelist") { : missing value where TRUE/FALSE needed

camillethuyentruong commented 3 years ago

Maybe numeric columns need to be changed to character: https://stackoverflow.com/questions/62316707/plotting-temporal-network-true-false-value-needed-for-edge-list

arihelrnan commented 3 years ago

In the Network funtion documentation (https://www.rdocumentation.org/packages/network/versions/1.13.0/topics/network), you can find the argument matrix.type, and you can use option "incedence" or other option with egelist contruction with you type data. Also, you can try with other options in the funtion with your data.

bc-anaisabel commented 3 years ago

We tried to convert to say first that columns are not factors using:

network_host<-as.data.frame(network_host, stringsAsFactors = F)

But we still got the same error message. So I'm going to try to have a dataframe columns from numeric to character using something as:

network_host[, 2] <- as.character(network_host[, 2])

bc-anaisabel commented 3 years ago

Now trying with ggnet but can't make the association from color vector to the rest of the data.

network_host<-t(network_host)
network_host<-as.data.frame(network_host, stringsAsFactors = F)
network_host[, 2] <- as.character(network_host[, 2])
net = network(network_host, directed = FALSE)

> ggnet2(net, node.size = 3, color = color,  edge.size = 1, edge.color = "grey", label = TRUE)
Error in set_node(node.color, "node.color") : incorrect node.color length
bc-anaisabel commented 3 years ago

If I don't add any color it looks like this:

ggnet2(net, node.size = 3, color = "black", edge.size = 1, edge.color = "grey", label = TRUE)

ggnet_am_root_3

valeriafloral commented 3 years ago

Here is a tutorial for co-occurrence microorganism networks, they used phyloseq, SPIEC-EASI and ggnet. Maybe you can follow it with your dataset.

They obtained a network colored by 'Phylum' with the following code:

# Graficar la red 'se_net'
plot_network(se_net, comau69_3, type = "taxa", color = "Phylum", shape = "Kingdom", label = NULL)

image

se_net: is an igraph object made with phyloseqoutput. comau69_3 is a phyloseqoutput

bc-anaisabel commented 3 years ago

So I managed to get somewhere with ggnet, because now the nodes that are the hosts are distinguishables from the nodes that are the otus. This had to do with indicating this is a bipartite network using "mode":

net = network(otu_table(subset), directed = FALSE)
ggnet2(net, node.size = 3, edge.size = 1, node.color = 'mode', edge.color = "grey", label = TRUE)

The result was this ggnet_am_root_4

bc-anaisabel commented 3 years ago

So I figured out what was happening. I had to take the row names from the dataset which contained the taxonomy attributes and otus to link with the other dataset that contained the otu presence/absence matrix and samples. Otherwise it was not recognizing how to pair the otus with the families. I have to update a few more things and after that I'll close the issue. But it finally assigns correctly the otus with their taxonomy:

networkplot_am

bc-anaisabel commented 3 years ago

To use with gplot from sna package:

# Merge more than one category using sample variables from the phyloseq object 

sample_variables(subset)

variable1 = as.character(get_variable(subset, "Host"))
variable2 = as.character(get_variable(subset, "Site"))

# Paste the new variable
sample_data(subset)$NewPastedVar <- mapply(paste0, variable1, variable2, 
                                           collapse = "_")

# Create new object with pasted variables
nw2<- merge_samples(subset, "NewPastedVar")

# Create dataframe with presence/absence data for each OTU in each category  
network_host <- as.data.frame(otu_table(nw2))

# You'll need to turn it so hosts (nodes) are columns 
network_host<-t(network_host)

# Take out to excel (or edit in R following the format pointed below) the taxonomy information that we are going to use later on for plotting 
nuevoedge<-write.csv(color, file = "nuevoedge_df.csv")

# Read file
nuevoedge<-read.csv("nuevoedge_df.csv")

# Check what is there 
nuevoedge # we need the id as a column, i.e. OTUname as the first column followed by the taxonomy 

So it would look like:

> head(nameofyourtaxonomyinfodataframe)
     id    Myc Trophic   Domain            Phylum               Class             Order
1  OTU3 t__myc  a__ecm k__Fungi p__ Basidiomycota  c__ Agaricomycetes   o__ Sebacinales
2  OTU4 t__myc  a__ecm k__Fungi p__ Basidiomycota  c__ Agaricomycetes o__ Thelephorales
3  OTU6 t__myc  a__ecm k__Fungi p__ Basidiomycota  c__ Agaricomycetes    o__ Russulales
4  OTU7 t__myc  a__ecm k__Fungi    p__ Ascomycota c__ Dothideomycetes o__ Mytilinidales
5 OTU13 t__myc  a__ecm k__Fungi p__ Basidiomycota  c__ Agaricomycetes   o__ Sebacinales
6 OTU24 t__myc  a__ecm k__Fungi p__ Basidiomycota  c__ Agaricomycetes o__ Thelephorales

and

the otu table/ host presence would look like:

> head(network_host)
      Juniperusmixed Juniperusperturbated Quercusmixed Quercusnative
OTU3               0                    0            1             0
OTU4               0                    0            0             1
OTU6               0                    0            0             3
OTU7               3                    1            4             4
OTU13              1                    1            0             0
OTU24              0                    0            1             0

Now using Gplot , color by Family

gplot(network_host, thresh = 0.2, displaylabels = TRUE, usearrows=FALSE, legend(x=1,y=-1, pch=21, col = "#777777", pt.cex=2, cex=.8, bty="n", ncol=1), vertex.col = nuevoedge$Family)

gplot(network_host, thresh = 0.2, displaylabels = TRUE, usearrows=FALSE, legend(x=1,y=-1, color, pch=21,
pt.cex=2, cex=.8, bty="n", ncol=1), vertex.col = nuevoedge$Family)

Make another plot with nicer colors and nodes

call color palette

pal2 <-polychrome(27)

so it does not plot on the same page

par(mfrow=c(1,2), xpd=T)

gplot(as.one.mode(network_host), displaylabels = TRUE, gmode="graph", label.cex=1, vertex.col = nuevoedge$Family, vertex.cex=1)

par(mfrow=c(1,2), xpd=T)

palette(polychrome(n=27)) gplot(network_host, gmode="graph", jitter=FALSE, displaylabels = TRUE, boxed.labels=FALSE, label.pos=1, label.cex=1, vertex.cex=2, vertex.col= nuevoedge$Family)

par(mfrow=c(1,2), xpd=T)

Plot with no labels

gplot(network_host, gmode="graph", jitter=FALSE, displaylabels = FALSE, boxed.labels=FALSE, label.pos=1, label.cex=1, vertex.cex=2, vertex.col= nuevoedge$Family)

You get something like this:

network_ecm_root_nolabels