fmicompbio / einprot

Proteomics analysis workflows

https://fmicompbio.github.io/einprot/

DIA-NN #10

Closed vcoyne1 closed 4 months ago

vcoyne1 commented 1 year ago

Hi Charlotte,

Einprot looks amazing. Will you be extending einprot for use with DIA-NN output?

Thanks, Vernon

csoneson commented 1 year ago

Hi Vernon, thank you! I would say yes, eventually, but I don't have a clear estimate of the timeline. We haven't prioritized it in this phase since it's not one of our main setups at the moment, but it's definitely on our radar. I will leave this issue open, and will discuss with my colleagues (after the holiday period) to see what can be a reasonable plan.

tobiasko commented 1 year ago

Hi @vcoyne1,

I could provide a script that generates a SummarizedExperiment container from a DIA-NN main report. But this is of course only one step in the complete einprot workflow. And usually, einprot uses the importExperiment() function to set up a container of the class SingleCellExperiment, which is an extension of SummarizedExperiment. So some functionality of einprot will most likely fail on the parent class.

csoneson commented 1 year ago

Hi @tobiasko - if you have such a script set up, and ideally also a small example data set to test it on, would you be willing to contribute it to einprot (you'll of course be added as a contributor to the package)? I could build it into importExperiment(), which would be the first step towards creating a DIA-NN workflow. As we don't work much with DIA data currently, it would be great to collaborate on this. Let me know what you think!

tobiasko commented 11 months ago

Sure! What is a small example data set? I have a lot of DIA-NN results generated for this benchmark data set:

https://www.nature.com/articles/s41597-022-01216-6

or, to be more precise, for the HF-X (staggered DIA) and the timsTOF Pro (diaPASEF) subsets.

csoneson commented 11 months ago

Cool, thanks @tobiasko! Ideally I'm looking for a data set that we can include and distribute with the package (and use for unit tests and examples). What is the size of the relevant output files for this data set?

tobiasko commented 11 months ago

Ohhh, the main report of DIA-NN gets big quite rapidly. The one for the DIA subset measured on the HF-X is ~1.2 GB. :-) It contains 1'553'159 lines. But one could filter for a set of precursors (PCs) that belong to a specific protein group (PG) and/or run to reduce the size. The table is in long format and PC-centric.
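
As a rough illustration of the filtering idea above, the following base R sketch subsets a long-format, precursor-centric table by protein group and run. The toy data, the column names (Run, Protein.Group, Precursor.Id, Precursor.Quantity) and the selections keepPG and keepRuns are assumptions made for the example; this is not the procedure that was used to build the einprot test files.

```r
## Toy stand-in for a long-format, precursor-centric report.
## Column names are assumed for illustration.
report <- data.frame(
    Run = rep(c("A_01", "A_02", "B_01"), each = 4),
    Protein.Group = rep(c("P1", "P1", "P2", "P3"), times = 3),
    Precursor.Id = rep(c("PEPA2", "PEPB2", "PEPC3", "PEPD2"), times = 3),
    Precursor.Quantity = seq(100, 1200, by = 100),
    stringsAsFactors = FALSE
)

## Hypothetical selections: protein groups and runs to keep
keepPG <- c("P1", "P3")
keepRuns <- c("A_01", "B_01")

## Keep only rows matching both selections
reportSmall <- report[report$Protein.Group %in% keepPG &
                          report$Run %in% keepRuns, ]
rownames(reportSmall) <- NULL

## Write the reduced table back out as tab-separated text
outFile <- tempfile(fileext = ".tsv")
write.table(reportSmall, file = outFile, sep = "\t", quote = FALSE,
            row.names = FALSE)
```

Keeping at least two runs per condition in keepRuns would preserve the replicates needed for comparisons.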

csoneson commented 11 months ago

Oh, I see - if you have a way of sharing that file, I'll take a look. And yes, probably in the end cutting down to a subset of the PCs will be the way to go (it would be good to keep a few replicates for the different conditions to be able to do comparisons). Thanks!

tobiasko commented 11 months ago

Sure. You could go here and select one of the subfolders (dia or diaPASEF) for a specific instrument/data analysis scheme and then further down to out-<date> for a specific DIA-NN output folder. For example, dia on the HF-X. The main report of DIA-NN is diann-output.tsv.

csoneson commented 10 months ago

Hi all, and sorry for the delay here. I have added preliminary support for DIA-NN output to einprot - you can get the updated version from the diann branch:

remotes::install_github("fmicompbio/einprot", ref = "diann")

The relevant function would be runDIANNAnalysis().

I still consider this a work-in-progress, and more testing is needed. However, if you would like to give it a test run, I'd appreciate any feedback!

@tobiasko - I have used some of the data you linked to above as test data and added you as a contributor in the package DESCRIPTION file. I hope that works - let me know if there's something I should change. Thanks!

tobiasko commented 10 months ago

Hi @csoneson, nice! You are fast. Sure, happy to test, and thanks for adding me.

tobiasko commented 7 months ago

Hi @csoneson,

I tried running your example code

if (interactive()) {
    sampleAnnot <- read.delim(
        system.file("extdata/diann_example/PXD028735_sampleAnnot.txt",
                    package = "einprot"))
    ## Basic analysis
    out <- runDIANNAnalysis(
        outputDir = tempdir(),
        outputBaseName = "DIANN_LFQ_basic",
        species = "human",
        diannFile = system.file("extdata/diann_example/PXD028735.pg_matrix.tsv",
                                package = "einprot"),
        diannFileType = "pg_matrix",
        outLevel = "pg",
        diannLogFile = system.file("extdata/diann_example/diann-output.log.txt",
                                   package = "einprot"),
        sampleAnnot = sampleAnnot,
        includeFeatureCollections = "complexes",
        stringIdCol = NULL
    )
    ## Output file
    out
}

but it seems you forgot to define an argument in your code

Error in .checkArgumentsDIANN(templateRmd = templateRmd, outputDir = outputDir, : 
argument "aName" is missing, with no default

I wonder how the code could pass package tests this way.

I added the argument and got another error:

> out <- runDIANNAnalysis(
+   outputDir = tempdir(),
+   outputBaseName = "DIANN_LFQ_basic",
+   species = "human",
+   diannFile = system.file("extdata/diann_example/PXD028735.pg_matrix.tsv",
+                           package = "einprot"),
+   diannFileType = "pg_matrix",
+   outLevel = "pg",
+   diannLogFile = system.file("extdata/diann_example/diann-output.log.txt",
+                              package = "einprot"),
+   sampleAnnot = sampleAnnot,
+   includeFeatureCollections = "complexes",
+   stringIdCol = NULL,
+   aName = "pg_matrix"
+ )

processing file: DIANN_LFQ_basic.Rmd
1/125                                     
2/125 [config]                            
3/125                                     
4/125 [setup]                             
5/125                                     
6/125 [load-pkg]                          
7/125                                     
8/125 [get-basic-info]                    
9/125                                     
10/125 [exp-table]                         
11/125                                     
12/125 [diann-table]                       
13/125                                     
14/125 [diann-cmd]                         
15/125                                     
16/125 [settings-table]                    
17/125                                     
18/125 [read-data]                         
19/125                                     
20/125 [col-list]                          
21/125                                     
22/125 [add-sampleinfo]                    
23/125                                     
24/125 [define-assaynames]                 
25/125                                     
26/125 [overview-graph]                    
27/125                                     
28/125 [intensity-distribution]            
29/125                                     
30/125 [filter-features]                   
31/125                                     
32/125 [fix-ids]                           

Quitting from lines 458-468 [fix-ids] (DIANN_LFQ_basic.Rmd)
Error:
! All values in 'combineCols' must be one of: Protein.Group, Protein.Ids, Protein.Names, Genes, First.Protein.Description
Backtrace:
 1. einprot::fixFeatureIds(...)
 2. global colFun(SummarizedExperiment::rowData(sce))
 3. einprot::combineIds(df, combineCols = c("Gene.names", "Majority.protein.IDs"))
 4. einprot:::.assertVector(...)
Execution halted
Error: Failed to run 'rmarkdown::render' in a new R session.

Session info

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] einprot_0.8.0

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  shinydashboard_0.7.2        proto_1.0.0                
  [4] gmm_1.7                     tidyselect_1.2.0            RSQLite_2.3.1              
  [7] AnnotationDbi_1.56.2        htmlwidgets_1.6.2           grid_4.1.2                 
 [10] BiocParallel_1.28.3         norm_1.0-11.0               munsell_0.5.0              
 [13] ScaledMatrix_1.2.0          codetools_0.2-19            chron_2.3-61               
 [16] DT_0.28                     miniUI_0.1.1.1              colorspace_2.1-0           
 [19] Biobase_2.54.0              ggalt_0.4.0                 knitr_1.43                 
 [22] uuid_1.1-0                  proDA_1.8.0                 rstudioapi_0.14            
 [25] stats4_4.1.2                SingleCellExperiment_1.16.0 robustbase_0.95-1          
 [28] shinyWidgets_0.7.6          Rttf2pt1_1.3.12             MatrixGenerics_1.6.0       
 [31] GenomeInfoDbData_1.2.7      bit64_4.0.5                 vctrs_0.6.2                
 [34] generics_0.1.3              xfun_0.39                   ggseqlogo_0.1              
 [37] R6_2.5.1                    doParallel_1.0.17           GenomeInfoDb_1.30.1        
 [40] ggbeeswarm_0.7.2            clue_0.3-64                 rsvd_1.0.5                 
 [43] msigdbr_7.5.1               MsCoreUtils_1.6.2           iSEEu_1.6.0                
 [46] ggiraph_0.8.7               AnnotationFilter_1.18.0     bitops_1.0-7               
 [49] cachem_1.0.8                reshape_0.8.9               shinyAce_0.4.2             
 [52] DelayedArray_0.20.0         promises_1.2.0.1            scales_1.2.1               
 [55] beeswarm_0.4.0              gtable_0.3.3                beachmat_2.10.0            
 [58] ash_1.0-15                  sandwich_3.0-2              rrcovNA_0.4-15             
 [61] rlang_1.1.1                 genefilter_1.76.0           systemfonts_1.0.4          
 [64] GlobalOptions_0.1.2         splines_4.1.2               extrafontdb_1.0            
 [67] lazyeval_0.2.2              impute_1.68.0               httpuv_1.6.11              
 [70] extrafont_0.19              tools_4.1.2                 ggplot2_3.4.2              
 [73] ellipsis_0.3.2              gplots_3.1.3                kableExtra_1.3.4           
 [76] jquerylib_0.1.4             RColorBrewer_1.1-3          BiocGenerics_0.40.0        
 [79] STRINGdb_2.6.5              MultiAssayExperiment_1.20.0 gsubfn_0.7                 
 [82] Rcpp_1.0.10                 hash_2.2.6.2                plyr_1.8.8                 
 [85] sparseMatrixStats_1.6.0     zlibbioc_1.40.0             purrr_1.0.1                
 [88] RCurl_1.98-1.12             sqldf_0.4-11                viridis_0.6.3              
 [91] GetoptLong_1.0.5            cowplot_1.1.1               ExploreModelMatrix_1.6.0   
 [94] S4Vectors_0.32.4            zoo_1.8-12                  SummarizedExperiment_1.24.0
 [97] ggrepel_0.9.3               cluster_2.1.4               ComplexUpset_1.3.3         
[100] magrittr_2.0.3              data.table_1.14.8           circlize_0.4.15            
[103] colourpicker_1.2.0          pcaMethods_1.86.0           mvtnorm_1.1-3              
[106] ProtGenerics_1.26.0         matrixStats_0.63.0          hms_1.1.3                  
[109] patchwork_1.1.3             shinyjs_2.1.0               mime_0.12                  
[112] evaluate_0.21               xtable_1.8-4                XML_3.99-0.14              
[115] mclust_6.0.0                gridExtra_2.3               IRanges_2.28.0             
[118] shape_1.4.6                 scater_1.22.0               compiler_4.1.2             
[121] tibble_3.2.1                maps_3.4.1                  writexl_1.4.2              
[124] KernSmooth_2.23-21          crayon_1.5.2                htmltools_0.5.5            
[127] pcaPP_2.0-3                 tzdb_0.4.0                  mgcv_1.8-42                
[130] later_1.3.1                 rrcov_1.7-2                 tidyr_1.3.0                
[133] DBI_1.1.3                   proj4_1.0-12                ComplexHeatmap_2.10.0      
[136] MASS_7.3-60                 tmvtnorm_1.5                babelgene_22.9             
[139] readr_2.1.4                 Matrix_1.5-4.1              cli_3.6.1                  
[142] imputeLCMD_2.1              parallel_4.1.2              igraph_1.4.3               
[145] GenomicRanges_1.46.1        forcats_1.0.0               pkgconfig_2.0.3            
[148] scuttle_1.4.0               plotly_4.10.2               xml2_1.3.4                 
[151] foreach_1.5.2               svglite_2.1.1               annotate_1.72.0            
[154] vipor_0.4.5                 bslib_0.4.2                 stringdist_0.9.10          
[157] webshot_0.5.5               XVector_0.34.0              rvest_1.0.3                
[160] stringr_1.5.0               digest_0.6.31               Biostrings_2.62.0          
[163] rmarkdown_2.21              rintrojs_0.3.2              DelayedMatrixStats_1.16.0  
[166] shiny_1.7.4                 gtools_3.9.4                rjson_0.2.21               
[169] lifecycle_1.0.3             nlme_3.1-162                jsonlite_1.8.4             
[172] BiocNeighbors_1.12.0        QFeatures_1.4.0             iSEE_2.6.0                 
[175] viridisLite_0.4.2           limma_3.50.3                fansi_1.0.4                
[178] pillar_1.9.0                lattice_0.21-8              GGally_2.1.2               
[181] DEoptimR_1.1-1              KEGGREST_1.34.0             fastmap_1.1.1              
[184] httr_1.4.6                  plotrix_3.8-2               survival_3.5-5             
[187] glue_1.6.2                  png_0.1-8                   iterators_1.0.14           
[190] bit_4.0.5                   stringi_1.7.12              sass_0.4.6                 
[193] blob_1.2.4                  BiocSingular_1.10.0         caTools_1.18.2             
[196] memoise_2.0.1               dplyr_1.1.2                 irlba_2.3.5.1

tobiasko commented 7 months ago

Could it be that the example code defaults to the wrong .Rmd template file? Which templateRmd should it use?

seebajan commented 7 months ago

Hello Tobias,

I am using:

Desired name for the main assay (if diannFileType is pg_matrix or pr_matrix), or the column to use for the main assay (if diannFileType is main_report)

aName <- "MaxLFQ"

tobiasko commented 7 months ago

Hi @seebajan,

the input file in the example code is a pg matrix file...not the DIA-NN main report.

inst/extdata/diann_example/PXD028735.pg_matrix.tsv

tobiasko commented 7 months ago

Not really sure what @csoneson's idea was in the case of the matrix files, since matrix files only ever contain a single response value (no need to select a column). But without the aName argument the function is not happy.

seebajan commented 7 months ago

That's what I was using, too, for a successful run: report.pg_matrix.tsv

I do have this line with the argument in my latest run script templates, but I guess it is only used to describe the assays in the report, not for column selection as in MQ or PD, which is done here by recognizing the file paths.

It's still a work in progress... Sorry, let's wait for @csoneson to comment.

tobiasko commented 7 months ago

Hmm, ok. I tried aName = "MaxLFQ" and aName = "aName", but it always fails at step 32/125 [fix-ids].

csoneson commented 7 months ago

Hi @tobiasko - sorry for the delay in responding, I was out of office yesterday. Indeed, for the matrix files the aName argument is used to define the name of the main assay (as einprot cannot infer what the values in the matrix files correspond to, the user has to provide this information). And apologies for the issues with the example; I had forgotten to update that code. I just pushed a fix to the diann branch. Thanks a lot for testing!

tobiasko commented 7 months ago

Hi @csoneson,

perfect! package version 0.8.2 runs like a charm on the DIA-NN example data! A more general question: What is your idea with respect to the different diannFile, diannFileType options?

Example: If you provide a matrix file of the pg type, does einprot expect to find a matching main report at the same location? Do the prefixes of the file names (matrices and main report) have to match? Which information from which file is used, and how?

Can I also provide just the main report and define the output level by setting outLevel and aName? In that case, does einprot calculate a new protein assay matrix?

Are genes also supported as features? I did not see the respective matrix file in the example data. But according to the log file it was written to disk, so it looks like you decided not to include it in the package .../inst/.... I am wondering whether the gene feature level would be useful, having data integration with short read sequencing in mind?

csoneson commented 7 months ago

Great!

At the moment, einprot will only use one of the files - either the matrix file or the main report. So it doesn't look for additional files in the same folder, and doesn't try to combine the information present in the different files. I believe different filtering is applied to the different file types, so I'm basically asking the user to decide which information to use.

Yes, if you just provide the main report and define the output level and aName, it will create a new protein assay matrix (the main report + precursor level is not yet implemented, it's on the todo list).

I did not add support for the gene matrix (still trying to figure out how people tend to use the output in practice, so this feedback is super helpful), but that's certainly possible.
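
To make the "main report to protein assay matrix" step concrete, here is a minimal base R sketch of turning a long-format table into a features x samples matrix. It is not einprot's implementation; the toy data are made up, and the sketch assumes that the chosen column (here PG.Quantity) is repeated on every precursor row of a protein group within a run.

```r
## Toy long-format table; column names are assumed for illustration
long <- data.frame(
    Run = c("A_01", "A_01", "A_01", "A_02", "A_02", "B_01"),
    Protein.Group = c("P1", "P1", "P2", "P1", "P2", "P1"),
    Precursor.Id = c("PEPA2", "PEPB2", "PEPC3", "PEPA2", "PEPC3", "PEPB2"),
    PG.Quantity = c(10, 10, 20, 12, 22, 9),
    stringsAsFactors = FALSE
)

## Assumption: PG.Quantity is constant within a (Protein.Group, Run) pair,
## so dropping duplicates leaves one row per pair
pg <- unique(long[, c("Protein.Group", "Run", "PG.Quantity")])

## Fill a features x samples matrix; combinations not present stay NA
pgMat <- matrix(NA_real_,
                nrow = length(unique(pg$Protein.Group)),
                ncol = length(unique(pg$Run)),
                dimnames = list(sort(unique(pg$Protein.Group)),
                                sort(unique(pg$Run))))
pgMat[cbind(pg$Protein.Group, pg$Run)] <- pg$PG.Quantity
pgMat
```

Protein groups that were not quantified in a run end up as NA rather than 0 in this layout.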

tobiasko commented 7 months ago

ok. Tried to switch to the main report but got this:

> ## from main report
> ## using PG.Quantity column as assay
> out <- runDIANNAnalysis(
+   outputDir = tempdir("/Users/tobiasko/Documents/RStudio/einprot/tmp/"),
+   outputBaseName = "DIANN_LFQ_basic",
+   species = "human",
+   diannFile = system.file("extdata/diann_example/PXD028735.report.tsv",
+                           package = "einprot"),
+   diannFileType = "main_report",
+   outLevel = "pg",
+   diannLogFile = system.file("extdata/diann_example/diann-output.log.txt",
+                              package = "einprot"),
+   sampleAnnot = sampleAnnot,
+   includeFeatureCollections = "complexes",
+   stringIdCol = NULL,
+   aName = "PG.Quantity",
+   idCol = function(df) combineIds(df, combineCols = c("Genes", "Protein.Ids")),
+   labelCol = function(df) getFirstId(df, colName = "Protein.Names"),
+   geneIdCol = function(df) getFirstId(df, colName = "Genes"),
+   proteinIdCol = "Protein.Ids",
+   forceOverwrite = TRUE
+ )
/var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T//RtmpdEdHqh/DIANN_LFQ_basic.Rmd already exists but forceOverwrite = TRUE, overwriting.

processing file: DIANN_LFQ_basic.Rmd
1/125                                     
2/125 [config]                            
3/125                                     
4/125 [setup]                             
5/125                                     
6/125 [load-pkg]                          
7/125                                     
8/125 [get-basic-info]                    
9/125                                     
10/125 [exp-table]                         
11/125                                     
12/125 [diann-table]                       
13/125                                     
14/125 [diann-cmd]                         
15/125                                     
16/125 [settings-table]                    
17/125                                     
18/125 [read-data]                         
19/125                                     
20/125 [col-list]                          
21/125                                     
22/125 [add-sampleinfo]                    

Quitting from lines 376-380 [add-sampleinfo] (DIANN_LFQ_basic.Rmd)
Error in `addSampleAnnots()`:
! 'sce' already have column(s) named sample
Backtrace:
 1. einprot::addSampleAnnots(sce, sampleAnnot = sampleAnnot)
Execution halted
Error: Failed to run 'rmarkdown::render' in a new R session.

csoneson commented 7 months ago

Sorry about that, I have pushed a fix to the diann branch. However, there's another issue here: the idCol, labelCol, geneIdCol and proteinIdCol arguments also need to be redefined, as they can only use columns that are available in the rowData of the SingleCellExperiment. In this case, there's only one column (Protein.Group), as there is no other column which has the same value for a given protein group across all samples. That is, even for the same protein group, there are sometimes different protein IDs assigned for different samples, so this is not added as a feature annotation (it is there as an assay instead). This is a bit annoying and perhaps one needs to find a workaround, e.g., create an annotation column with the union of all protein IDs (or gene IDs, etc.) that are assigned to a given protein group across samples.

For now, to get it to run through, you can set all these four variables to Protein.Group.
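
The union-of-IDs workaround mentioned above could look roughly like the following base R sketch. It is only an illustration under stated assumptions: the toy data, the semicolon separator, the helper unionIds() and the column name Protein.Ids.union are all hypothetical, and nothing here reflects how einprot handles this internally.

```r
## Toy long-format table where the IDs assigned to a protein group
## differ between runs (assumed layout, semicolon-separated IDs)
long <- data.frame(
    Run = c("A_01", "A_02", "B_01", "A_01", "B_01"),
    Protein.Group = c("P1", "P1", "P1", "P2", "P2"),
    Protein.Ids = c("P1;Q1", "P1", "P1;Q2", "P2", "P2"),
    stringsAsFactors = FALSE
)

## Hypothetical helper: split the IDs, pool them, and collapse the
## sorted unique values back into one string
unionIds <- function(ids) {
    paste(sort(unique(unlist(strsplit(ids, ";", fixed = TRUE)))),
          collapse = ";")
}

## One union string per protein group, across all runs
idsByGroup <- tapply(long$Protein.Ids, long$Protein.Group, unionIds)

## One row per protein group, usable as a feature annotation table
featureAnnot <- data.frame(
    Protein.Group = names(idsByGroup),
    Protein.Ids.union = as.vector(idsByGroup),
    stringsAsFactors = FALSE
)
featureAnnot
```

Because the resulting column has exactly one value per protein group, it could in principle be stored alongside Protein.Group as a row annotation.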

tobiasko commented 7 months ago

Nice! einprot v0.8.3 fixed the problem. I would suggest adding example code that runs from the main report to the runDIANNAnalysis {einprot} documentation.

Restarting R session...

> library(einprot)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Registered S3 methods overwritten by 'ggalt':
  method                  from   
  grid.draw.absoluteGrob  ggplot2
  grobHeight.absoluteGrob ggplot2
  grobWidth.absoluteGrob  ggplot2
  grobX.absoluteGrob      ggplot2
  grobY.absoluteGrob      ggplot2
> if (interactive()) {
+   sampleAnnot <- read.delim(
+     system.file("extdata/diann_example/PXD028735_sampleAnnot.txt",
+                 package = "einprot"))
+ }
> ## from main report
> ## using PG.Quantity column as assay
> out <- runDIANNAnalysis(
+   outputDir = tempdir("/Users/tobiasko/Documents/RStudio/einprot/tmp/fromMainReport/"),
+   outputBaseName = "DIANN_LFQ_basic",
+   species = "human",
+   diannFile = system.file("extdata/diann_example/PXD028735.report.tsv",
+                           package = "einprot"),
+   diannFileType = "main_report",
+   outLevel = "pg",
+   diannLogFile = system.file("extdata/diann_example/diann-output.log.txt",
+                              package = "einprot"),
+   sampleAnnot = sampleAnnot,
+   includeFeatureCollections = "complexes",
+   stringIdCol = NULL,
+   aName = "PG.Quantity",
+   idCol = "Protein.Group",
+   labelCol = "Protein.Group",
+   geneIdCol = "Protein.Group",
+   proteinIdCol = "Protein.Group",
+   forceOverwrite = TRUE
+ )

processing file: DIANN_LFQ_basic.Rmd
1/125                                     
2/125 [config]                            
3/125                                     
4/125 [setup]                             
5/125                                     
6/125 [load-pkg]                          
7/125                                     
8/125 [get-basic-info]                    
9/125                                     
10/125 [exp-table]                         
11/125                                     
12/125 [diann-table]                       
13/125                                     
14/125 [diann-cmd]                         
15/125                                     
16/125 [settings-table]                    
17/125                                     
18/125 [read-data]                         
19/125                                     
20/125 [col-list]                          
21/125                                     
22/125 [add-sampleinfo]                    
23/125                                     
24/125 [define-assaynames]                 
25/125                                     
26/125 [overview-graph]                    
27/125                                     
28/125 [intensity-distribution]            
29/125                                     
30/125 [filter-features]                   
31/125                                     
32/125 [fix-ids]                           
33/125                                     
34/125 [feature-id-overview-1]             
35/125                                     
36/125 [prepare-feature-collections]       
37/125                                     
38/125 [log-transform]                     
39/125                                     
40/125 [missing-values]                    
41/125                                     
42/125 [missing-values-2]                  
43/125                                     
44/125 [missing-values-overall]            
45/125                                     
46/125 [text-norm]                         
47/125                                     
48/125 [normalize]                         
49/125                                     
50/125 [plot-imputation]                   
51/125                                     
52/125 [intensity-distribution-imputed]    
53/125                                     
54/125 [remove-batch-effect]               
55/125                                     
56/125 [text-test]                         
57/125                                     
58/125 [set-test-assay]                    
59/125                                     
60/125 [initialize-tests]                  
61/125                                     
62/125 [list-of-comparisons]               
63/125                                     
64/125 [list-of-group-compositions]        
65/125                                     
66/125 [no-test]                           
67/125                                     
68/125 [def-aval]                          
69/125                                     
70/125 [run-test]                          
71/125                                     
72/125 [text-expdesign]                    
73/125                                     
74/125 [get-fig-height]                    
75/125                                     
76/125 [expdesign-plot]                    
77/125                                     
78/125 [test-messages]                     
79/125                                     
80/125 [text-sa]                           
81/125                                     
82/125 [plot-sa]                           
83/125                                     
84/125 [volcano-plot]                      
85/125                                     
86/125 [interactive-volcanos-text]         
87/125                                     
88/125 [interactive-volcanos]              
89/125                                     
90/125 [export-testres]                    
91/125                                     
92/125 [merge-tests]                       
93/125                                     
94/125 [upset-tests]                       
95/125                                     
96/125 [top-feature-sets]                  
97/125                                     
98/125 [linktable]                         
99/125                                     
100/125 [make-sce]                          
101/125                                     
102/125 [get-pca-features]                  
103/125                                     
104/125 [run-pca]                           
105/125                                     
106/125 [interactive-pcas]                  
107/125                                     
108/125 [heatmap]                           
109/125                                     
110/125 [save-heatmap]                      
111/125                                     
112/125 [corrplot]                          
113/125                                     
114/125 [save-sce]                          
115/125                                     
116/125 [save-rowdata]                      
117/125                                     
118/125 [isee-script-path]                  
119/125                                     
120/125 [create-markdown-chunks-dynamically]
121/125                                     
1/2              
2/2 [source-isee]
122/125 [make-isee-script]                  
123/125                                     
124/125 [session-info]                      
125/125                                     
output file: DIANN_LFQ_basic.knit.md

/usr/local/bin/pandoc +RTS -K512m -RTS DIANN_LFQ_basic.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /private/var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T/RtmphxMupi/DIANN_LFQ_basic.html --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=united --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T//RtmpNyPo7y/rmarkdown-str648b1f3a34cc.html --variable code_folding=hide --variable code_menu=1 --citeproc 

Output created: /private/var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T/RtmphxMupi/DIANN_LFQ_basic.html

I would also suggest that the inst/extdata/diann_example/README.txt should explain how you reduced the main report (looks like you selected a subset of rows from the original report). Maybe the file name should also indicate this.

csoneson commented 7 months ago

Great! Yes, I'll add documentation also for starting from the main report. And also mention how the report was reduced (this was only done to save space while still providing a minimal file to use for testing). Thanks!

tobiasko commented 7 months ago

Perfect. A question about missing values and imputation. How do you treat missing values in a DIA-NN matrix file/main report by default? Do you convert to NA or to 0 for the assay matrix?

tobiasko commented 7 months ago

I remember that DIA-NN behaves according to

"Sometimes DIA-NN will report a zero as the best estimate for a precursor or protein quantity. Such zero quantities are omitted from protein/gene matrices."

Have you seen indications for omitted zeros in the matrix files? Are those empty cells?

tobiasko commented 7 months ago

Hmmm... looking at inst/extdata/diann_example/PXD028735.pg_matrix.tsv I get the impression that you saved the matrix file to disk after importing it with read.delim() or similar. Is that a good choice? Or would it be better to use the original file as written by DIA-NN for the package example? I don't see big problems when reading it, but sometimes strange things happen when you import text files into R, and a user would also start from the file as written by DIA-NN.

> pg.M <- read_tsv("diann-output.pg_matrix.tsv")
Rows: 12261 Columns: 40                                                                                            
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (4): Protein.Group, Protein.Ids, Protein.Names, Genes
dbl (35): /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_...
lgl  (1): First.Protein.Description

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> pg.M
# A tibble: 12,261 × 40
   Protein.Group     Protein.Ids      Protein.Names Genes First.Protein.Descri…¹
   <chr>             <chr>            <chr>         <chr> <lgl>                 
 1 A0A024R1R8;Q9Y2S6 Q9Y2S6;A0A024R1… TMA7B_HUMAN;… TMA7… NA                    
 2 A0A024RBG1        A0A024RBG1       NUD4B_HUMAN   NUDT… NA                    
 3 A0A024RBG1;Q9NZJ9 Q9NZJ9;A0A024RB… NUD4B_HUMAN;… NUDT… NA                    
 4 A0A087WVZ6        A0A087WVZ6;H7C4… A0A087WVZ6_H… ZMYN… NA                    
 5 A0A087WWU8        J3KN67;A0A087WW… A0A087WWU8_H… TPM3  NA                    
 6 A0A087WWU8;Q5VU61 A0A087WWU8;D6R9… A0A087WWU8_H… TPM3  NA                    
 7 A0A087WY03;E5RFV3 A0A087WY03;E5RF… A0A087WY03_H… SREK1 NA                    
 8 A0A087WYV9        Q9P0S2;A0A087WY… A0A087WYV9_H… SYNJ… NA                    
 9 A0A087WZ13        A0A087WZ13;K7EQ… A0A087WZ13_H… RAVE… NA                    
10 A0A087X0J2        A0A087X0J2       A0A087X0J2_H… UBXN8 NA                    
# ℹ 12,251 more rows
# ℹ abbreviated name: ¹First.Protein.Description
# ℹ 35 more variables:
#   `/scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Beta_01.mzML` <dbl>,
#   `/scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Gamma_02.mzML` <dbl>,
#   `/scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.mzML` <dbl>,
#   `/scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Gamma_01.mzML` <dbl>, …
# ℹ Use `print(n = ...)` to see more rows
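
For comparison, here is a minimal base R sketch of the same kind of import. It is not how einprot reads the file; the two-row table and the `/data/run_0x.mzML` column names are made up for illustration. It only shows that `check.names = FALSE` keeps the file-path column names unchanged and that empty cells in numeric columns come back as NA.

```r
# Tiny stand-in for a DIA-NN protein group matrix (hypothetical content).
tsv <- paste0(
  "Protein.Group\tGenes\t/data/run_01.mzML\t/data/run_02.mzML\n",
  "P1\tG1\t8.37561e+06\t\n",
  "P2\tG2\t819497\t1.04065e+06\n"
)

# check.names = FALSE stops R from rewriting the path-like column names.
pg <- read.delim(text = tsv, check.names = FALSE, stringsAsFactors = FALSE)

# Empty cells in numeric columns are read as NA.
print(colnames(pg))
print(is.na(pg[["/data/run_02.mzML"]]))
```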

csoneson commented 7 months ago

Yes, all files were saved to disk from R (as they were subset/some samples were excluded, for space reasons). In what sense do you mean you see that the file was saved from R? (the dots in the column name, for example, are there also in the original tsv file from DIA-NN)

csoneson commented 7 months ago

I remember that DIA-NN behaves according to:

"Sometimes DIA-NN will report a zero as the best estimate for a precursor or protein quantity. Such zero quantities are omitted from protein/gene matrices."

Have you seen indications for omitted zeros in the matrix files? Are those empty cells?

There are lots of explicit NAs in the matrix files. In addition, einprot will convert zeros in the assay matrices to NAs before the imputation (as the imputation tools assume NAs).
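
A minimal sketch of the zero-to-NA step described here, on a toy matrix. This is an illustration only, not the einprot implementation; the matrix values and the row and column names are invented.

```r
# Toy assay matrix with both zeros and explicit NAs (hypothetical values).
assay_mat <- matrix(
  c(8375610, 0, NA, 819497, 1040650, 0),
  nrow = 3,
  dimnames = list(paste0("protein", 1:3), c("sample1", "sample2"))
)

# Treat zeros as missing so that imputation tools expecting NAs see them.
is_zero <- !is.na(assay_mat) & assay_mat == 0
assay_mat[is_zero] <- NA

print(assay_mat)
```

After this step, zeros and original NAs can no longer be told apart, which is why the choice depends on how the upstream tool encodes missing values.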

tobiasko commented 7 months ago

In what sense do you mean you see that the file was saved from R?

The original files use a different notation for floating point numbers. ;-)

tobiasko@fgcz-c-072:/scratch/cpanse/PXD028735/dia/out-2023-07-17$ head diann-output.pg_matrix.tsv
Protein.Group   Protein.Ids Protein.Names   Genes   First.Protein.Description   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Beta_01.mzML  /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Gamma_02.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_01.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Gamma_01.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Beta_02.mzML  /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_03.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Beta_03.mzML  /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Beta_01.mzML  /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_03.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Beta_03.mzML  /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Gamma_01.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_02.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Beta_02.mzML  /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Alpha_01.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Alpha_02.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_A_Sample_Gamma_03.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Gamma_02.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Condition_B_Sample_Gamma_03.mzML /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Ecoli_01.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Ecoli_02.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Ecoli_03.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Human_01.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Human_02.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Human_03.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_02.mzML   
/scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_03.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_04.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_06.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_05.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_07.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_08.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_QC_09.mzML   /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Yeast_01.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Yeast_02.mzML    /scratch/cpanse/PXD028735/dia/LFQ_Orbitrap_AIF_Yeast_03.mzML
A0A024R1R8;Q9Y2S6   Q9Y2S6;A0A024R1R8   TMA7B_HUMAN;TMA7_HUMAN  TMA7;TMA7B      8.37561e+06 7.22609e+06 7.87578e+06 6.86327e+06 7.27397e+06 5.94051e+06 6.25283e+06 9.26681e+06  5.87949e+06 5.7918e+06  7.09856e+06 7.05454e+06 7.19183e+06 7.62009e+06 7.52275e+06 6.36872e+06 7.38344e+06 6.82802e+06             7.34794e+06 7.78391e+06 7.54777e+06 8.11878e+06 7.27397e+06 6.89277e+06 7.27544e+06 6.98606e+06 6.36723e+06 5.64614e+06 7.02494e+06         
A0A024RBG1  A0A024RBG1  NUD4B_HUMAN NUDT4B      2.15687e+06 1.84034e+06 1.82459e+06 1.95327e+06 1.8607e+06  1.83822e+06 2.02575e+06 1.94479e+06 1.94466e+06 1.66355e+06  2.06833e+06 2.02022e+06 2.00955e+06 2.07019e+06 2.12453e+06 2.0595e+06  1.91283e+06 2.02393e+06             1.96106e+06 1.85486e+06 1.86048e+06 1.80086e+06 2.15493e+06 2.04208e+06 1.89149e+06 1.95269e+06 1.84243e+06 2.18323e+06 1.95282e+06         
A0A024RBG1;Q9NZJ9   Q9NZJ9;A0A024RBG1   NUD4B_HUMAN;NUDT4_HUMAN NUDT4;NUDT4B        4.87413e+06 5.10983e+06 4.63534e+06 5.09239e+06 4.96421e+06 5.09042e+06 5.05195e+06 4.16419e+06  5.40388e+06 5.01998e+06 5.03463e+06 4.89566e+06 4.93958e+06 4.65591e+06 4.24324e+06 5.29988e+06 4.76155e+06 4.77928e+06             4.99819e+06 4.90757e+06 4.91601e+06 4.92414e+06 5.16884e+06 4.72441e+06 4.99854e+06 4.77784e+06 5.09337e+06 5.02925e+06 4.64295e+06         
A0A087WVZ6  A0A087WVZ6;H7C4X9   A0A087WVZ6_HUMAN    ZMYND8      819497  921285  923714  1.04065e+06 816877  1.03036e+06 834711  659658  1.0598e+06  805837  933500  898121  752210  779903  965446  572278  788353  606827              896753  874478  897252  914926  954803  802444  884826  730389  1.05695e+06 1.00234e+06 964796          
A0A087WWU8  J3KN67;A0A087WWU8   A0A087WWU8_HUMAN    TPM3        5.93902e+07 6.77154e+07 6.65228e+07 7.1698e+07  5.44017e+07 5.13506e+07 5.70215e+07 5.72441e+07 5.95156e+07  5.7574e+07  6.66274e+07 7.18365e+07 6.7485e+07  6.02976e+07 6.28058e+07 6.48653e+07 5.86286e+07 6.03974e+07             5.97501e+07 5.91458e+07 5.36738e+07 6.2817e+07  6.61292e+07 6.75777e+07 6.63206e+07 7.15349e+07 6.32826e+07 5.73252e+07 6.28957e+07         
A0A087WWU8;Q5VU61   A0A087WWU8;D6R904;Q5VU61    A0A087WWU8_HUMAN;Q5VU61_HUMAN   TPM3        7.01292e+07 4.3179e+07  6.23118e+07 7.76636e+07 4.87999e+07 5.60225e+07 7.29224e+07 5.67732e+07 5.84106e+07 7.30778e+07 8.63609e+07 9.44699e+07 7.07944e+07 5.89117e+07 1.01801e+08 6.01809e+07 6.73196e+07 6.85591e+07         2.69655e+06 7.91462e+07 6.99861e+07 4.84764e+07 7.17575e+07 8.10464e+07 9.84767e+07 5.50087e+07 7.84295e+07 6.76122e+07 6.5322e+07  8.09146e+07         
A0A087WY03;E5RFV3   A0A087WY03;E5RFV3   A0A087WY03_HUMAN;E5RFV3_HUMAN   SREK1       5.97267e+06 5.70849e+06 5.50558e+06 5.71626e+06 5.75542e+06 6.75438e+06 5.96709e+06 6.10118e+06  6.13781e+06 6.00699e+06 6.40692e+06 5.49213e+06 5.58039e+06 5.55605e+06 5.89062e+06 5.97818e+06 5.63914e+06 6.20536e+06             5.70714e+06 5.60514e+06 5.78004e+06 5.67721e+06 5.81995e+06 5.92453e+06 5.6493e+06  5.45996e+06 6.16663e+06 6.17911e+06 5.76912e+06         
A0A087WYV9  Q9P0S2;A0A087WYV9;A0A087WX56;A0A087X1F5 A0A087WYV9_HUMAN    SYNJ2BP-COX16           301604  549473  524980  339271      452459      286707  420098      190654  448335      360070  245350  339994  510396              399494  367440  512738  197561      395113  385855  370178      339586  479131          
A0A087WZ13  A0A087WZ13;K7EQG2   A0A087WZ13_HUMAN    RAVER1      4.35222e+06 4.35849e+06 4.65029e+06 4.68265e+06 4.28878e+06 4.35573e+06 4.2199e+06  4.34144e+06 4.30976e+06  4.57008e+06 4.44751e+06 4.85062e+06 4.61482e+06 4.43328e+06 4.50448e+06 4.75549e+06 4.32511e+06 4.36587e+06             4.36976e+06 4.39702e+06 4.50478e+06 3.89641e+06 4.37329e+06 4.82144e+06 4.54084e+06 4.67103e+06 4.80585e+06 4.33529e+06 4.65282e+06
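
To make the notation difference concrete, here is a small R sketch. The numbers in the head output look like C-style `%g` formatting (six significant digits, exponent notation from 1e+06 upwards); that DIA-NN actually formats its output this way is an assumption based only on the values shown above. The three input values are illustrative and chosen to match the printed numbers.

```r
# Illustrative values matching entries in the head output above.
x <- c(8375610, 819497, 1040650)

# C-style %g: at most six significant digits, exponent notation once
# the decimal exponent reaches six.
diann_like <- sprintf("%g", x)

# R's default conversion to text keeps these values in fixed notation,
# which is roughly what a file re-saved from R would contain.
r_default <- as.character(x)

print(data.frame(diann_like, r_default))
```

Both forms parse back to the same numbers at the printed precision, so this mainly matters when the example file is supposed to look like untouched DIA-NN output.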

tobiasko commented 7 months ago

But why would you like to impute a 0? It's not missing, it's just a zero.

csoneson commented 7 months ago

But why would you like to impute a 0? It's not missing, it's just a zero.

It depends on the tool. In the MaxQuant files, for example, missing values are encoded as zeros rather than NAs.

tobiasko commented 7 months ago

I tested on the complete data in fgcz-c-072:/scratch/cpanse/PXD028735/dia/out-2023-07-17. Works fine. 😄🥳👍 The only thing that is maybe not super convenient (or would need some explanation in the documentation) is that diannFile and diannLogFile are expected as absolute paths (I copied the data to my local working directory, and using just the file name did not work).

> out <- runDIANNAnalysis(
+   outputDir = tempdir(),
+   outputBaseName = "DIANN_LFQ_basic",
+   species = "human",
+   #diannFile = "diann-output.pg_matrix.tsv",
+   diannFile = "/Users/tobiasko/Documents/RStudio/einprot/diann-output.pg_matrix.tsv",
+   diannFileType = "pg_matrix",
+   outLevel = "pg",
+   #diannLogFile = "diann-output.log.txt",
+   diannLogFile ="/Users/tobiasko/Documents/RStudio/einprot/diann-output.log.txt",
+   sampleAnnot = fullSampleAnnot,
+   includeFeatureCollections = "complexes",
+   stringIdCol = NULL,
+   aName = "MaxLFQ",
+   idCol = function(df) combineIds(df, combineCols = c("Genes", "Protein.Ids")),
+   labelCol = function(df) getFirstId(df, colName = "Protein.Names"),
+   geneIdCol = function(df) getFirstId(df, colName = "Genes"),
+   proteinIdCol = "Protein.Ids",
+   forceOverwrite = TRUE
+ )
/var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T//RtmphxMupi/DIANN_LFQ_basic.Rmd already exists but forceOverwrite = TRUE, overwriting.

processing file: DIANN_LFQ_basic.Rmd
1/125                                     
2/125 [config]                            
3/125                                     
4/125 [setup]                             
5/125                                     
6/125 [load-pkg]                          
7/125                                     
8/125 [get-basic-info]                    
9/125                                     
10/125 [exp-table]                         
11/125                                     
12/125 [diann-table]                       
13/125                                     
14/125 [diann-cmd]                         
15/125                                     
16/125 [settings-table]                    
17/125                                     
18/125 [read-data]                         
19/125                                     
20/125 [col-list]                          
21/125                                     
22/125 [add-sampleinfo]                    
23/125                                     
24/125 [define-assaynames]                 
25/125                                     
26/125 [overview-graph]                    
27/125                                     
28/125 [intensity-distribution]            
29/125                                     
30/125 [filter-features]                   
31/125                                     
32/125 [fix-ids]                           
33/125                                     
34/125 [feature-id-overview-1]             
35/125                                     
36/125 [prepare-feature-collections]       
37/125                                     
38/125 [log-transform]                     
39/125                                     
40/125 [missing-values]                    
41/125                                     
42/125 [missing-values-2]                  
43/125                                     
44/125 [missing-values-overall]            
45/125                                     
46/125 [text-norm]                         
47/125                                     
48/125 [normalize]                         
49/125                                     
50/125 [plot-imputation]                   
51/125                                     
52/125 [intensity-distribution-imputed]    
53/125                                     
54/125 [remove-batch-effect]               
55/125                                     
56/125 [text-test]                         
57/125                                     
58/125 [set-test-assay]                    
59/125                                     
60/125 [initialize-tests]                  
61/125                                     
62/125 [list-of-comparisons]               
63/125                                     
64/125 [list-of-group-compositions]        
65/125                                     
66/125 [no-test]                           
67/125                                     
68/125 [def-aval]                          
69/125                                     
70/125 [run-test]                          
71/125                                     
72/125 [text-expdesign]                    
73/125                                     
74/125 [get-fig-height]                    
75/125                                     
76/125 [expdesign-plot]                    
77/125                                     
78/125 [test-messages]                     
79/125                                     
80/125 [text-sa]                           
81/125                                     
82/125 [plot-sa]                           
83/125                                     
84/125 [volcano-plot]                      
85/125                                     
86/125 [interactive-volcanos-text]         
87/125                                     
88/125 [interactive-volcanos]              
89/125                                     
90/125 [export-testres]                    
91/125                                     
92/125 [merge-tests]                       
93/125                                     
94/125 [upset-tests]                       
95/125                                     
96/125 [top-feature-sets]                  
97/125                                     
98/125 [linktable]                         
99/125                                     
100/125 [make-sce]                          
101/125                                     
102/125 [get-pca-features]                  
103/125                                     
104/125 [run-pca]                           
105/125                                     
106/125 [interactive-pcas]                  
107/125                                     
108/125 [heatmap]                           
109/125                                     
110/125 [save-heatmap]                      
111/125                                     
112/125 [corrplot]                          
113/125                                     
114/125 [save-sce]                          
115/125                                     
116/125 [save-rowdata]                      
117/125                                     
118/125 [isee-script-path]                  
119/125                                     
120/125 [create-markdown-chunks-dynamically]
121/125                                     
1/2              
2/2 [source-isee]
122/125 [make-isee-script]                  
123/125                                     
124/125 [session-info]                      
125/125                                     
output file: DIANN_LFQ_basic.knit.md

/usr/local/bin/pandoc +RTS -K512m -RTS DIANN_LFQ_basic.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /private/var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T/RtmphxMupi/DIANN_LFQ_basic.html --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=united --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T//RtmpuSC3FK/rmarkdown-str8dfb18028cad.html --variable code_folding=hide --variable code_menu=1 --citeproc 

Output created: /private/var/folders/j_/4fgphvp14tlf9ms5jgs503qh0000gn/T/RtmphxMupi/DIANN_LFQ_basic.html

tobiasko commented 7 months ago

Should I dare to run from the 1.2 GB main report? 😄 But maybe I should then switch to our Linux cluster node and run it in tmux 😂

csoneson commented 7 months ago

Great, thank you for testing and all the feedback!

The only thing that is maybe not super convenient (or would need some explanation in the documentation) is that diannFile and diannLogFile are expected as absolute paths

I will look into this, I agree that relative paths should work as well.
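
Until relative paths are handled inside the package, one possible workaround on the user side is to expand the paths before passing them on. The sketch below uses only base R; the temporary directory and the empty placeholder file exist just to make the example self-contained, and the resulting `abs_path` is what one would hand to the `diannFile` argument.

```r
# Set up a throwaway working directory with an empty placeholder file.
tmp_dir <- tempfile("diann_demo_")
dir.create(tmp_dir)
old_wd <- setwd(tmp_dir)
file.create("diann-output.pg_matrix.tsv")

# Expand the relative file name to an absolute path; mustWork = TRUE
# raises an error right away if the file does not exist.
abs_path <- normalizePath("diann-output.pg_matrix.tsv", mustWork = TRUE)

# Restore the previous working directory.
setwd(old_wd)
print(abs_path)
```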

Should I dare to run from the 1.2 GB main report? 😄 But maybe I should then switch to our Linux cluster node and run it in tmux 😂

I did not try this yet on my side :) but yeah, it's getting close to the end of the day ;)

tobiasko commented 7 months ago

Hi @csoneson

I failed to install einprot on our cluster node running Debian:

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
ERROR: dependencies ‘ggalt’, ‘motifStack’ are not available for package ‘einprot’
* removing ‘/usr/local/lib/R/site-library/einprot’
Warning messages:
1: In i.p(...) :
  installation of package ‘DirichletMultinomial’ had non-zero exit status
2: In i.p(...) : installation of package ‘proj4’ had non-zero exit status
3: In i.p(...) : installation of package ‘ggalt’ had non-zero exit status
4: In i.p(...) :
  installation of package ‘TFBSTools’ had non-zero exit status
5: In i.p(...) :
  installation of package ‘motifStack’ had non-zero exit status
6: In i.p(...) :
  installation of package ‘/tmp/Rtmppr30GK/file19918c72ad2202/einprot_0.8.3.tar.gz’ had non-zero exit status

It looks like there are no precompiled versions of some packages available, and installation from source fails due to missing dependencies:

> install.packages("ggalt")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependency ‘proj4’

trying URL 'https://cloud.r-project.org/src/contrib/proj4_1.0-14.tar.gz'
Content type 'application/x-gzip' length 43038 bytes (42 KB)
==================================================
downloaded 42 KB

trying URL 'https://cloud.r-project.org/src/contrib/ggalt_0.4.0.tar.gz'
Content type 'application/x-gzip' length 2155519 bytes (2.1 MB)
==================================================
downloaded 2.1 MB

* installing *source* package ‘proj4’ ...
** package ‘proj4’ successfully unpacked and MD5 sums checked
** using staged installation
checking whether pkg-config knows about proj... no
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C++... yes
checking whether g++ -std=gnu++17 accepts -g... yes
checking for g++ -std=gnu++17 option to enable C++11 features... none needed
checking for stdio.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for strings.h... yes
checking for sys/stat.h... yes
checking for sys/types.h... yes
checking for unistd.h... yes
checking for proj_api.h... no
checking with ACCEPT_USE_OF_DEPRECATED_PROJ_API_H... 
checking for proj_api.h... no
checking for proj.h... no
checking for proj_get_source_crs in -lproj... no
configure: Retrying with pkg-config --static
Package proj was not found in the pkg-config search path.
Perhaps you should add the directory containing `proj.pc'
to the PKG_CONFIG_PATH environment variable
No package 'proj' found
checking for proj_get_source_crs in -lproj... no
configure: PROJ4 API available: no
configure: PROJ6 API available: no
checking whether to require PROJ6 API... no
configure: error: Cannot find working proj.h headers and library.
*** You may need to install libproj-dev or similar! ***

ERROR: configuration failed for package ‘proj4’
* removing ‘/usr/local/lib/R/site-library/proj4’
ERROR: dependency ‘proj4’ is not available for package ‘ggalt’
* removing ‘/usr/local/lib/R/site-library/ggalt’

The downloaded source packages are in
        ‘/tmp/Rtmppr30GK/downloaded_packages’
Warning messages:
1: In install.packages("ggalt") :
  installation of package ‘proj4’ had non-zero exit status
2: In install.packages("ggalt") :
  installation of package ‘ggalt’ had non-zero exit status
> install.packages("proj4")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/proj4_1.0-14.tar.gz'
Content type 'application/x-gzip' length 43038 bytes (42 KB)
==================================================
downloaded 42 KB

* installing *source* package ‘proj4’ ...
** package ‘proj4’ successfully unpacked and MD5 sums checked
** using staged installation
checking whether pkg-config knows about proj... no
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C++... yes
checking whether g++ -std=gnu++17 accepts -g... yes
checking for g++ -std=gnu++17 option to enable C++11 features... none needed
checking for stdio.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for strings.h... yes
checking for sys/stat.h... yes
checking for sys/types.h... yes
checking for unistd.h... yes
checking for proj_api.h... no
checking with ACCEPT_USE_OF_DEPRECATED_PROJ_API_H... 
checking for proj_api.h... no
checking for proj.h... no
checking for proj_get_source_crs in -lproj... no
configure: Retrying with pkg-config --static
Package proj was not found in the pkg-config search path.
Perhaps you should add the directory containing `proj.pc'
to the PKG_CONFIG_PATH environment variable
No package 'proj' found
checking for proj_get_source_crs in -lproj... no
configure: PROJ4 API available: no
configure: PROJ6 API available: no
checking whether to require PROJ6 API... no
configure: error: Cannot find working proj.h headers and library.
*** You may need to install libproj-dev or similar! ***

ERROR: configuration failed for package ‘proj4’
* removing ‘/usr/local/lib/R/site-library/proj4’

The downloaded source packages are in
        ‘/tmp/Rtmppr30GK/downloaded_packages’
Warning message:
In install.packages("proj4") :
  installation of package ‘proj4’ had non-zero exit status
> install.packages("libproj-dev")
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning message:
package ‘libproj-dev’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

but I am not an admin on those cluster nodes.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 11 (bullseye)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] processx_3.8.1    compiler_4.3.1    R6_2.5.1          rprojroot_2.0.3  
 [5] cli_3.6.1         prettyunits_1.1.1 tools_4.3.1       withr_2.5.0      
 [9] curl_5.0.0        crayon_1.5.2      remotes_2.4.2.1   desc_1.4.2       
[13] callr_3.7.3       pkgbuild_1.4.0    ps_1.7.5

The list of einprot dependencies is rather long. Is all of this really needed?

csoneson commented 7 months ago

Looks like you're missing a system dependency (https://packages.debian.org/sid/libproj-dev). I'm certainly aware that the list of dependencies is long, and I'm trying to keep it to a minimum, but in order to cover the entire workflow (which is really the purpose of the package) I don't think I can reduce it by much unfortunately...
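
Since `libproj-dev` is a Debian system package rather than an R package, it has to be installed through the system package manager (typically by an administrator) and cannot be obtained with `install.packages()`. As a rough sketch, one can check from R whether `pkg-config` knows about PROJ before attempting to build `proj4`; this mirrors the first check in the configure log above. The helper name `has_proj` is made up for this example.

```r
# Hypothetical helper: TRUE if pkg-config is available and reports that
# the PROJ development files are installed, FALSE otherwise.
has_proj <- function() {
  pkg_config <- Sys.which("pkg-config")
  if (!nzchar(pkg_config)) {
    return(FALSE)
  }
  # `pkg-config --exists proj` exits with status 0 when PROJ is found.
  status <- suppressWarnings(
    system2(pkg_config, c("--exists", "proj"), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (!has_proj()) {
  message("PROJ development files not found; ask an admin to install libproj-dev.")
}
```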

csoneson commented 4 months ago

Hi, sorry for the long silence here. I just merged a PR (https://github.com/fmicompbio/einprot/pull/19) that brings DIA-NN support (as well as support for Spectronaut) into the main branch (einprot v0.9.4). At this point, I still consider it experimental, since we need to gather more experience with these outputs, but it's there for people to test. I also addressed the issues brought up in this thread:

The DIA-NN workflow works fine also with the gene matrix (no changes needed)
I added a small example of how to run the DIA-NN workflow with the main (long-form) report
The DIA-NN example data was updated to use the same scientific notation as the raw DIA-NN data
It is now possible to provide relative paths for the input files (they will be normalized to absolute paths in the analysis function)

I will close this issue since I think the original issue has been addressed - feel free to open a new issue if there are additional questions. Thanks for your input!