BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline
GNU General Public License v3.0

gprofiler2 report not shown with DT::datatable #119

Closed: smoe closed this issue 2 years ago

smoe commented 2 years ago

When I execute it in an R shell on the same machine that generated the HTML via Snakemake, the gene set enrichment analysis works fine. The variable "max" indicates that more than 800 terms were found, but nothing is shown in the web page. Could that number be too large? The first tables of the report, e.g. the one for the variables in the params list, appear just fine, though. The HTML source of the page shown in the browser is completely blank, except for the block showing the R code. Do you have any idea how I could approach this problem and come up with a better bug report?

> xfun::session_info('DT')
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bookworm/sid

Locale:
  LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
  LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
  LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
  LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
  LC_ADDRESS=C               LC_TELEPHONE=C            
  LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

Package version:
  base64enc_0.1.3   crosstalk_1.1.1   digest_0.6.28     DT_0.17          
  fastmap_1.1.0     graphics_4.1.1    grDevices_4.1.1   htmltools_0.5.2  
  htmlwidgets_1.5.4 jsonlite_1.7.2    later_1.3.0       lazyeval_0.2.2   
  magrittr_2.0.1    methods_4.1.1     promises_1.2.0.1  R6_2.5.1         
  Rcpp_1.0.7        rlang_0.4.12      stats_4.1.1       utils_4.1.1      
  yaml_2.2.1       

When manually executing the lines, I see:

>   goResults <- go[['result']]
  #order by p-value
  goResults <- goResults[order(goResults$p_value),]

  goResults <- subset(goResults, select = c('p_value', 'term_size', 'query_size',
                                            'intersection_size', 'precision', 'recall',
                                            'term_id', 'source', 'term_name',
                                            'effective_domain_size', 'source_order'))

  #save full GO term table to disk
  goResultsFile <- file.path(workdir, paste0(prefix, '.GOterms.tsv'))
  write.table(x = goResults,
              file = goResultsFile,
              quote = FALSE, sep = '\t')

  #only display top GO terms in the HTML report.
  max <- ifelse(nrow(goResults) > 1000, 1000, nrow(goResults))
> max
[1] 817
> str(goResults)
'data.frame':   817 obs. of  11 variables:
 $ p_value              : num  2.98e-140 3.01e-127 1.16e-106 5.94e-100 4.89e-97 ...
 $ term_size            : int  14808 11951 13117 12046 14027 13201 14767 5579 5579 5579 ...
 $ query_size           : int  8486 8486 8486 8486 8486 8486 8519 8486 8486 8486 ...
 $ intersection_size    : int  7334 6143 6565 6094 6906 6545 7236 3008 3008 3008 ...
 $ precision            : num  0.864 0.724 0.774 0.718 0.814 ...
 $ recall               : num  0.495 0.514 0.5 0.506 0.492 ...
 $ term_id              : chr  "GO:0005622" "GO:0005737" "GO:0043229" "GO:0043231" ...
 $ source               : chr  "GO:CC" "GO:CC" "GO:CC" "GO:CC" ...
 $ term_name            : chr  "intracellular anatomical structure" "cytoplasm" "intracellular organelle" "intracellular membrane-bounded organelle" ...
 $ effective_domain_size: int  18964 18964 18964 18964 18964 18964 18679 18964 18964 18964 ...
 $ source_order         : int  284 385 2130 2132 2127 2128 1879 2134 1446 2720 ...

The subsequent execution of

  DT::datatable(goResults,
          extensions = c('Buttons', 'FixedColumns', 'Scroller'),
          options = list(fixedColumns = TRUE,
                         scrollY = 400,
                         scrollX = TRUE,
                         scroller = TRUE,
                         dom = 'Bfrtip',
                         buttons = c('colvis', 'copy', 'print', 'csv','excel', 'pdf')
                         ),
          filter = 'bottom'
          )

brings the table up in the browser. Admittedly, this is a bit unexpected. To get there, I had to activate display forwarding and set the BROWSER environment variable. I will rerun the report generation from within Snakemake with this setting, but I truly hope you have some idea where I should look.
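One way to narrow this down, sketched here under assumptions, is to save the same table as a standalone widget outside the report, so the widget and its JavaScript dependencies can be checked separately from the R Markdown rendering. The helper name `save_table_widget` and the default file name are hypothetical. `htmlwidgets::saveWidget()` with `selfcontained = FALSE` writes the JavaScript libraries to a directory next to the HTML file instead of embedding them.

```r
# Hypothetical helper: write the GO table to a standalone HTML page,
# keeping the JavaScript libraries as separate files (not embedded).
save_table_widget <- function(df, file = "goResults_table.html") {
  widget <- DT::datatable(
    df,
    extensions = c("Buttons", "FixedColumns", "Scroller"),
    options = list(
      fixedColumns = TRUE,
      scrollY = 400,
      scrollX = TRUE,
      scroller = TRUE,
      dom = "Bfrtip",
      buttons = c("colvis", "copy", "print", "csv", "excel", "pdf")
    ),
    filter = "bottom"
  )
  # selfcontained = FALSE skips the embedding step; dependencies are
  # written to a "<name>_files" directory next to `file`.
  htmlwidgets::saveWidget(widget, file, selfcontained = FALSE)
  invisible(file)
}
```

Calling `save_table_widget(goResults)` and opening the resulting file would show whether the table renders on its own; if it does while the report stays blank, the report rendering step is the more likely suspect than DT itself.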

Many thanks in advance!

smoe commented 2 years ago

I swapped the order of the enrichment section and the boxplots (neither is showing), but that had no effect. For one of my analyses the GO analysis found no terms, and that message was shown in the box. So, as it seems, the code was executed after all. I'll move this section further up and report back.

borauyar commented 2 years ago

In my experience this happens with plots/tables that use interactive HTML widgets, so it could be an issue with a missing or outdated dependency. Do your package versions match the ones you would get when installing via Guix?

borauyar commented 2 years ago

Do you run into this issue for all experiments you run, or is it just this one where it fails to print the DataTable?

smoe commented 2 years ago

Do you run into this issue for all experiments you run, or is it just this one where it fails to print the DataTable?

I am running only one experiment, with ~90 samples, but the described problem affects all of its ~5 analyses.

A DataTable is also rendered at the very top of the report, and there it shows just fine. I'll continue experimenting a bit and, if you do not object, send you a PM with a direct link to the report. As I mentioned, the direct invocation in R works fine.

I do have some doubts about my installation from Debian packages. How could I not? Consequently, I am now running the Guix installation:

$ which pigx-rnaseq
/mariner/home/moeller/.guix-profile/bin/pigx-rnaseq

so I can learn about the differences. I just wish the JavaScript were not embedded. I am not sure what effect the "--selfContained" flag may have on this, or where I could specify it. It would need to be a parameter in the settings, right?

Anyway, I'll know more tomorrow.

rekado commented 2 years ago

I just wish the JavaScript were not embedded. I am not sure what effect the "--selfContained" flag may have on this, or where I could specify it.

Frankly, I'm not a fan of this behavior either. It has one advantage (the report is easy to share) and several downsides. The selfContained option also results in inefficient base64 encoding of images and data, because there's no simpler way to embed arbitrary data in HTML. This leads to unusually long HTML parse times in Firefox/IceCat, and probably in other browsers as well. This is especially annoying because the browser can only begin to execute asynchronous JavaScript after it has parsed the full document.
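The size overhead of that encoding can be estimated with a little arithmetic: standard base64 turns every started group of 3 input bytes into 4 ASCII characters (with padding), so embedded images and data grow by roughly one third before the browser even starts parsing. A minimal base R sketch; the function name is illustrative.

```r
# Characters produced by standard (padded) base64 encoding of `n_bytes`
# bytes: every started 3-byte group becomes 4 characters.
base64_length <- function(n_bytes) {
  4 * ceiling(n_bytes / 3)
}

base64_length(3)    # 4
base64_length(1e6)  # 1333336, i.e. about 33% larger than the raw 1 MB
```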

MHTML sounds like the appropriate solution, but ... Firefox does not support it.

So... I guess we should add a way to override the format of the report: single-file HTML, multiple files, and MHTML.

borauyar commented 2 years ago

@smoe if you have this issue in all HTML reports, then it must be related to some library version. I have also experienced this behaviour: changing the order of the widgets affects whether they are displayed. Unfortunately, this error is hard to reproduce and usually goes away when you update the libraries.

It is also possible to render the reports using the self_contained argument: rmarkdown::render(output_options = list(self_contained = FALSE ... . The rendering happens in the script runDeseqReport, but this option is not currently configurable by the user.
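A minimal sketch of what such a call could look like, assuming an R Markdown input rendered to `html_document`. The wrapper `render_report` and its argument names are illustrative and are not the code in runDeseqReport. The `output_options` list overrides the `self_contained` value from the document's YAML header.

```r
# Illustrative wrapper: render an R Markdown report to HTML, optionally
# keeping JavaScript/CSS dependencies as separate files.
render_report <- function(input, output_dir = dirname(input),
                          self_contained = TRUE, params = NULL) {
  rmarkdown::render(
    input,
    output_format = "html_document",
    output_dir = output_dir,
    # With self_contained = FALSE, dependencies are written to a separate
    # directory next to the HTML file instead of being base64-embedded.
    output_options = list(self_contained = self_contained),
    params = params
  )
}
```

For example, `render_report("report.Rmd", self_contained = FALSE)` (file name hypothetical) would produce an HTML file plus a companion directory of libraries, which the browser can cache and share across tabs.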

smoe commented 2 years ago

@smoe if you have this issue in all HTML reports, then it must be related to some library version. I have also experienced this behaviour: changing the order of the widgets affects whether they are displayed. Unfortunately, this error is hard to reproduce and usually goes away when you update the libraries.

R package-wise, I am at the very latest versions available. JavaScript-wise, I need to check which Debian packages kept the original JavaScript libraries in place and which are symbolically linked to a central instance of the same major version. I now have the results from my run with Guix, and it worked. A salmon job did not succeed the first time but was successful when I restarted it. Otherwise, everything ran and the reports can be inspected. Opening the report of the second analysis crashed my browser tab, but at least everything was executed.
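For the JavaScript check, one low-tech approach, sketched here in base R, is to list which files below a package's installed htmlwidgets directory are symbolic links and where they point. The helper name `list_symlinks` is hypothetical, and the usage line assumes DT keeps its bundled libraries under the installed `htmlwidgets` directory, as in the CRAN package layout.

```r
# Hypothetical helper: report every symbolic link below `dir` and its
# target, e.g. to see which bundled JavaScript files a distribution
# package replaced by links to a shared system copy.
list_symlinks <- function(dir) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE,
                      include.dirs = TRUE, all.files = TRUE, no.. = TRUE)
  targets <- Sys.readlink(paths)
  # Sys.readlink() returns "" for non-links and NA on error.
  is_link <- !is.na(targets) & nzchar(targets)
  data.frame(path = paths[is_link], target = targets[is_link],
             stringsAsFactors = FALSE)
}

# Usage (assumes DT is installed):
# list_symlinks(system.file("htmlwidgets", package = "DT"))
```

Comparing this output between the Debian-packaged and the Guix-installed DT would show which libraries are shared system copies.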

It is also possible to render the reports using the self_contained argument: rmarkdown::render(output_options = list(self_contained = FALSE ... . The rendering happens in the script runDeseqReport, but this option is not currently configurable by the user.

Would this be fairly straightforward to add as a general parameter in the settings rather than per analysis? Or maybe with an override for individual analyses? Patch? I hope this will help my investigation of the JavaScript library dependencies, and maybe it even improves stability in low-memory situations when tabs can share files.

borauyar commented 2 years ago

I now have the results from my run with Guix, and it worked.

Great!

Would this be fairly straightforward to add as a general parameter in the settings rather than per analysis? Or maybe with an override for individual analyses? Patch?

Yes, I think it should be a general parameter in the settings file, to keep it simple. It would be great if you could provide a patch.
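A sketch of how such a general setting might be read, assuming the settings file has already been parsed into a nested R list (for example with `yaml::read_yaml()`) and assuming a hypothetical `report` section with a `self_contained` key. Neither that key nor the helper `get_self_contained` exists in PiGx RNAseq; defaulting to `TRUE` would keep the current embedded behaviour.

```r
# Hypothetical helper: look up report/self_contained in the parsed
# settings, falling back to the current behaviour (embed everything).
get_self_contained <- function(settings, default = TRUE) {
  value <- settings[["report"]][["self_contained"]]
  if (is.null(value)) {
    return(default)
  }
  if (!is.logical(value) || length(value) != 1 || is.na(value)) {
    stop("settings: report/self_contained must be true or false")
  }
  value
}

# Example with an in-memory settings list instead of a real file:
get_self_contained(list(report = list(self_contained = FALSE)))  # FALSE
get_self_contained(list())                                       # TRUE
```

The returned value could then be handed to the existing rmarkdown::render call as `output_options = list(self_contained = ...)`; a per-analysis override could later be layered on top by checking an analysis-level key first.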