arihelrnan / Endophyte_microbiome_Vanilla_planifolia

MIT License
1 stars 0 forks source link

Problems with colors in Heat_tree_matrix #4

Closed arihelrnan closed 3 years ago

arihelrnan commented 3 years ago

I want to use metacoder library (https://grunwaldlab.github.io/metacoder_documentation/index.html) to show pairwise comparisons of communities and analyze taxonomic variables among my sample groups.

The input file is OTUs_Table-norm-tax.tab and mapping-file.txt

My R script is this:

> library(metacoder)
Loading required package: taxa
This is metacoder verison 0.3.3 (stable)
Warning message:
package ‘metacoder’ was built under R version 3.6.2 
> otu_table <-  read.table ("OTUs_Table-norm-tax.tab",
                           check.names = FALSE,
                           header = TRUE,
                           dec = ".",
                           sep = "\t",
                           row.names = 1,
                           comment.char = "") #Load tab delimited file with OTU table information
> sample_table <-  read.table ("mapping_file.txt",
                              check.names = FALSE,
                              header = TRUE,
                              dec = ".",
                              sep = "\t",
                              comment.char = "") #Load tab delimited file with information of each sample
> metamapa <- parse_tax_data(otu_table, #Make taxmap object.
                           class_cols = "taxonomy", # the column that contains taxonomic information
                            class_sep = ";", # The character used to separate taxa in the classification
                            class_regex = "^([a-z]{0,1})_{0,2}(.*)$", # Regex identifying where the data for each taxon is
                            class_key = c("tax_rank" = "taxon_rank", "name" = "taxon_name"))  
> metamapa #Here is what that object looks like
<Taxmap>
    1694 taxa: aab. Bacteria ... cne. adolescentis
    1694 edges: NA->aab, aab->aac ... bqr->cnd, axa->cne
    2 data sets:
    tax_data:
      # A tibble: 5,622 x 15
        taxon_id V3.16s V5.16s V6.16s V7.16s V8.16s V9.16s
        <chr>     <dbl>  <int>  <dbl>  <dbl>  <dbl>  <dbl>
      1 bqs           0      0      0      0      0      0
      2 alc           0      0      0      0      0      0
      3 bqt           0      0      0      0      0      0
      # ... with 5,619 more rows, and 8 more variables:
      #   V10.16s <dbl>, V11.16s <dbl>, V12.16s <dbl>,
      #   V13.16s <dbl>, V14.16s <dbl>, V15.16s <dbl>,
      #   V16.16s <dbl>, taxonomy <fct>
    class_data:
      # A tibble: 34,541 x 5
        taxon_id input_index tax_rank name       regex_match   
        <chr>          <int> <chr>    <chr>      <chr>         
      1 aab                1 k        Bacteria   k__Bacteria   
      2 aac                1 p        Proteobac~ p__Proteobact~
      3 abe                1 c        Deltaprot~ c__Deltaprote~
      # ... with 3.454e+04 more rows
  0 functions:
> metamapa=filter_taxa(metamapa, n_obs> 50)#Fillter taxa with less 50 reads
> metamapa$data$tax_abund <- calc_taxon_abund(metamapa, "tax_data",
                                                      cols = sample_table$`#SampleID`) #Getting information per taxon and add to metamapa
> metamapa
<Taxmap>
  102 taxa: aab. Bacteria, aac. Proteobacteria ... cbf. 
  102 edges: NA->aab, aab->aac ... bfq->can, bgh->cbf
  3 data sets:
    tax_data:
      # A tibble: 5,622 x 15
        taxon_id V3.16s V5.16s V6.16s V7.16s V8.16s V9.16s
        <chr>     <dbl>  <int>  <dbl>  <dbl>  <dbl>  <dbl>
      1 abe           0      0      0      0      0      0
      2 aes           0      0      0      0      0      0
      3 aae           0      0      0      0      0      0
      # ... with 5,619 more rows, and 8 more variables:
      #   V10.16s <dbl>, V11.16s <dbl>, V12.16s <dbl>,
      #   V13.16s <dbl>, V14.16s <dbl>, V15.16s <dbl>,
      #   V16.16s <dbl>, taxonomy <fct>
    class_data:
      # A tibble: 34,541 x 5
        taxon_id input_index tax_rank name       regex_match   
        <chr>          <int> <chr>    <chr>      <chr>         
      1 aab                1 k        Bacteria   k__Bacteria   
      2 aac                1 p        Proteobac~ p__Proteobact~
      3 abe                1 c        Deltaprot~ c__Deltaprote~
      # ... with 3.454e+04 more rows
    tax_abund:
      # A tibble: 102 x 14
        taxon_id V3.16s V5.16s V6.16s V7.16s V8.16s V9.16s
        <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
      1 aab       4265.   4265  4265.  4265.  4265. 4265  
      2 aac       3549.   1215  3840.  3118.  2227. 3343. 
      3 aad        134.   2320   120.   863.   729.   83.6
      # ... with 99 more rows, and 7 more variables:
      #   V10.16s <dbl>, V11.16s <dbl>, V12.16s <dbl>,
      #   V13.16s <dbl>, V14.16s <dbl>, V15.16s <dbl>,
      #   V16.16s <dbl>
  0 functions:
> metamapa$data$tax_occ <- calc_n_samples(metamapa, "tax_abund", groups = sample_table$Estado) #calculate the number of samples that have reads for each taxon
> metamapa
<Taxmap>
  102 taxa: aab. Bacteria, aac. Proteobacteria ... cbf. 
  102 edges: NA->aab, aab->aac ... bfq->can, bgh->cbf
  4 data sets:
    tax_data:
      # A tibble: 5,622 x 15
        taxon_id V3.16s V5.16s V6.16s V7.16s V8.16s V9.16s
        <chr>     <dbl>  <int>  <dbl>  <dbl>  <dbl>  <dbl>
      1 abe           0      0      0      0      0      0
      2 aes           0      0      0      0      0      0
      3 aae           0      0      0      0      0      0
      # ... with 5,619 more rows, and 8 more variables:
      #   V10.16s <dbl>, V11.16s <dbl>, V12.16s <dbl>,
      #   V13.16s <dbl>, V14.16s <dbl>, V15.16s <dbl>,
      #   V16.16s <dbl>, taxonomy <fct>
    class_data:
      # A tibble: 34,541 x 5
        taxon_id input_index tax_rank name       regex_match   
        <chr>          <int> <chr>    <chr>      <chr>         
      1 aab                1 k        Bacteria   k__Bacteria   
      2 aac                1 p        Proteobac~ p__Proteobact~
      3 abe                1 c        Deltaprot~ c__Deltaprote~
      # ... with 3.454e+04 more rows
    tax_abund:
      # A tibble: 102 x 14
        taxon_id V3.16s V5.16s V6.16s V7.16s V8.16s V9.16s
        <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
      1 aab       4265.   4265  4265.  4265.  4265. 4265  
      2 aac       3549.   1215  3840.  3118.  2227. 3343. 
      3 aad        134.   2320   120.   863.   729.   83.6
      # ... with 99 more rows, and 7 more variables:
      #   V10.16s <dbl>, V11.16s <dbl>, V12.16s <dbl>,
      #   V13.16s <dbl>, V14.16s <dbl>, V15.16s <dbl>,
      #   V16.16s <dbl>
    tax_occ:
      # A tibble: 102 x 4
        taxon_id Asintomatica Sintomatica Silvestre
        <chr>           <int>       <int>     <int>
      1 aab                 4           5         4
      2 aac                 4           5         4
      3 aad                 4           5         4
      # ... with 99 more rows
  0 functions:
> metamapa$data$diff_table <- compare_groups(metamapa, dataset = "tax_abund",
                                            cols = sample_table$`#SampleID`,
                                            groups = sample_table$Estado) #Compare data abundance among groups and add to matrix
There were 50 or more warnings (use warnings() to see the first 50)
> metamapa <- mutate_obs(metamapa, "diff_table",
                         wilcox_p_value = p.adjust(wilcox_p_value, method = "fdr")) #Correct for multiple comparisons and set non-significant differences to zero
> metamapa$data$diff_table$log2_median_ratio[metamapa$data$diff_table$wilcox_p_value > 0.05] <- 0
> heat_tree_matrix(metamapa,
                  dataset = "diff_table",
                  node_size = n_obs,
                  node_label = taxon_names, node_label_size_range = c(0.025, 0.04),
                  node_color = log2_median_ratio,
                  node_color_range = diverging_palette(),
                  node_color_trans = "linear",
                  node_color_interval = c(-3, 3),
                  edge_color_interval = c(-3, 3),
                  node_size_axis_label = "Number of OTUs",
                  node_color_axis_label = "Log2 ratio median proportions", layout = "da", initial_layout = "re",
                  key_size = 0.67,
                  seed = 2) #Make a heat tree matrix graphic with the 

My final graphic with comparisons among groups is this:

image

This is an example of how must be my heat tree matrix:

image

camillethuyentruong commented 3 years ago

Looking at your OTU table, it is not clear to me what kind of data you are using for abundance of OTUs. Are you sure they are count?

bc-anaisabel commented 3 years ago

You can transform your data to relative abundance so it looks like read counts:

# Transform read counts into  % relative abundances: multiply by 1000 and transform to next integer so it looks like read count
datathatyouwant = transform_sample_counts(yourtable, function(x) 100000 * x/sum(x))
otu_table(datathatyouwant) = ceiling(otu_table(datathatyouwant, "matrix")) # transform to next integer so it looks like read count
otu_table(datathatyouwant) # check otu table
camillethuyentruong commented 3 years ago

Did you check the format of your tables? OTU abundance is a matrix and metadata is a table. The format need to correspond to the example given by metacoder.

NellyJazminPC commented 3 years ago

Taking this workshop as a reference https://grunwaldlab.github.io/metacoder_documentation/example.html we note that the format of the otus table is different

bc-anaisabel commented 3 years ago

Do you need to have the same format as in the example? in case you do, notice that to match the format you need to have the taxonomy (lineage) at the beginning before the counts. Right now you have the taxonomy after the counts

NellyJazminPC commented 3 years ago

50 warnings were observed in the metamapa$data$diff_table

cannot compute exact p-value with ties

bc-anaisabel commented 3 years ago

This is related to the adjustment of the multiple comparisons for p values: example

"As stated in the help page ?wilcox.test, by default wilcox.test() calculates an exact p-value if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used. In our example, the sample contains less than 50 finite values but there are ties. That’s why we see the warning message “cannot compute exact p-value with ties” above. R uses normal approximation to calculate the p-value. However, the returned p-value is different from our step-by-step calculation above, which uses the normal approximation. The reason is that the p-value is adjusted by the continuity correction which is supposed to be more accurate. We will not go into detail the concept of continuity correction here. To get the p-value we calculated above, we can use the option correct=FALSE to turn off the continuity correction:"

wilcox.test(final ~ group, correct=FALSE)

Warning in wilcox.test.default(x = c(11, 70, 75, 85, 88, 92, 96), y =
c(60, : cannot compute exact p-value with ties

    Wilcoxon rank sum test

data:  final by group
W = 28.5, p-value = 0.9538
alternative hypothesis: true location shift is not equal to 0
bc-anaisabel commented 3 years ago

You can transform your data to relative abundance so it looks like read counts:

# Transform read counts into  % relative abundances: multiply by 1000 and transform to next integer so it looks like read count
datathatyouwant = transform_sample_counts(yourtable, function(x) 100000 * x/sum(x))
otu_table(datathatyouwant) = ceiling(otu_table(datathatyouwant, "matrix")) # transform to next integer so it looks like read count
otu_table(datathatyouwant) # check otu table

From this step on we realized we were proposing changes and modifying the script without trying if each of the changes was working. So at this point we already changed many different things without testing them. Important that you make the change and right after you try it before going to change a new thing.

NellyJazminPC commented 3 years ago

Check if the format to import the tables is correct

https://grunwaldlab.github.io/analysis_of_microbiome_community_data_in_r/03--parsing.html

arihelrnan commented 3 years ago

I transform my data with according function calc_obs_props which divides each sample’s counts by the total number of counts observed for each sample, resulting in a proportion. Also, it doesn't necessarily make a correction for multiple comparisons because I have only three groups. I obtain this plot:
image