fgcz / prolfqua

Differential Expression Analysis tool box R lang package for omics data
https://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.2c00441
MIT License

Error at normalisation step #55

Closed clawless-inoviv closed 1 year ago

clawless-inoviv commented 1 year ago

Hi,

I'm running through the vignette (here: https://fgcz.github.io/prolfqua/articles/Comparing2Groups.html)

I get to here:

lfqdata$to_wide()$data[1:3,1:7]

and I get the error:

Error in `tidyr::spread()`:
! Each row of output must be identified by a unique combination of keys.
ℹ Keys are shared for 6767 rows

This is the same error I get when I try to run the normalisation (robscale):

lt <- lfqdata$get_Transformer()
transformed <- lt$log2()$robscale()$lfq

I then tried to normalise using the approach described here: https://fgcz.github.io/prolfqua/reference/LFQDataTransformer.html#examples

lfqTrans <- lfqdata$get_Transformer()

x <- lfqTrans$intensity_array(log2)

x$lfq$config$table$is_response_transformed

x <- x$intensity_matrix(robust_scale)

plotter <- x$lfq$get_Plotter()
plotter$intensity_distribution_density()

However, I get this warning:

Warning message:
In x$intensity_matrix(robust_scale) :
  data already transformed. If you still want to log2 tranform, set force = TRUE

I checked that both steps ran, but the robust_scale step made no changes to the data: the log2-transformed column is unchanged and no additional robust-scale column was added to the tibble.

The warning suggests running with force=T, but says this is for applying another log2 transformation, which I don't need because the intensities have already been log-transformed.

Running with force=T does add another column with robust_scale appended to the column name. But has this been log-transformed again?
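One way to check this question empirically is sketched below on made-up numbers. It assumes that robust scaling is a per-sample linear rescaling (centre on the median, divide by the MAD); that is an assumption for illustration, not a statement about how prolfqua implements robust_scale. Under that assumption, a column that was only rescaled correlates perfectly with the log2 column within a sample, while a column that was log2-transformed a second time does not.

```r
# Toy log2 intensities for a single sample (made-up values, not real data).
set.seed(1)
log2_int <- log2(rlnorm(50, meanlog = 10, sdlog = 1))

# Assumed form of robust scaling: centre on the median, scale by the MAD.
robust <- (log2_int - median(log2_int)) / mad(log2_int)

# What an accidental second log2 followed by the same scaling would look like.
log2_twice <- log2(log2_int)
robust_twice <- (log2_twice - median(log2_twice)) / mad(log2_twice)

# A linear rescaling correlates perfectly with its input; a second log2 does not.
cor_once <- cor(log2_int, robust)
cor_twice <- cor(log2_int, robust_twice)

print(c(once = cor_once, twice = cor_twice))
```

On real data the same comparison would be done per sample between the log2 column and the new robust_scale column, checking the correlation with all.equal() rather than by eye, since the two cases differ only slightly.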

Any help greatly appreciated.

clawless-inoviv commented 1 year ago

Here is my R environment if that helps:

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.3.0    tibble_3.2.1   ggplot2_3.4.1  prolfqua_1.1.1 reshape2_1.4.4

loaded via a namespace (and not attached):
 [1] ggrepel_0.9.3      Rcpp_1.0.10        conflicted_1.2.0   digest_0.6.31      utf8_1.2.3         R6_2.5.1           cellranger_1.1.0   plyr_1.8.8        
 [9] backports_1.4.1    httr_1.4.5         pillar_1.8.1       rlang_1.1.0        lazyeval_0.2.2     readxl_1.4.2       rstudioapi_0.14    data.table_1.14.8 
[17] car_3.1-1          labeling_0.4.2     readr_2.1.4        stringr_1.5.0      htmlwidgets_1.6.2  bit_4.0.5          pheatmap_1.0.12    munsell_0.5.0     
[25] broom_1.0.4        compiler_4.2.2     pkgconfig_2.0.3    htmltools_0.5.4    tidyselect_1.2.0   gridExtra_2.3      fansi_1.0.4        viridisLite_0.4.1 
[33] crayon_1.5.2       dplyr_1.1.0        tzdb_0.3.0         withr_2.5.0        ggpubr_0.6.0       MASS_7.3-58.1      grid_4.2.2         jsonlite_1.8.4    
[41] gtable_0.3.2       lifecycle_1.0.3    magrittr_2.0.3     scales_1.2.1       vroom_1.6.1        cli_3.6.0          stringi_1.7.12     cachem_1.0.7      
[49] carData_3.0-5      farver_2.1.1       ggsignif_0.6.4     remotes_2.4.2      ellipsis_0.3.2     generics_0.1.3     vctrs_0.6.0        RColorBrewer_1.1-3
[57] tools_4.2.2        forcats_1.0.0      bit64_4.0.5        glue_1.6.2         purrr_1.0.1        hms_1.1.2          parallel_4.2.2     abind_1.4-5       
[65] fastmap_1.1.1      yaml_2.3.7         colorspace_2.1-0   rstatix_0.7.2      memoise_2.0.1      plotly_4.10.1

jjGG commented 1 year ago

Hello @clawless-inoviv,

Getting an error here:

lfqdata$to_wide()

suggests that the mapping of the sampleNames in the annotation file is NOT unique, or that there is some other issue with the SampleNames.

1) Are you running the complete code from the vignette (https://fgcz.github.io/prolfqua/articles/Comparing2Groups.html) and still getting this error?

2) Are you trying to run this vignette with your own data?

3) Can you check how many unique fileName (or raw.filenames) values you have in lfqdata and compare that with the SampleNames?
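The check in point 3 could be sketched roughly as follows, on a toy long-format table. The column names (sampleName, protein_Id, peptide_Id, intensity) and the values are made up for illustration and only use dplyr; this is not how prolfqua stores the data internally.

```r
library(dplyr)
library(tibble)

# Toy long-format table; column names and values are illustrative only.
long <- tribble(
  ~sampleName, ~protein_Id, ~peptide_Id, ~intensity,
  "A_1",       "P1",        "PEPTIDEK",  1000,
  "A_1",       "P1",        "PEPTIDEK",  1200,  # same keys as the row above
  "A_2",       "P1",        "PEPTIDEK",   900,
  "A_2",       "P2",        "SEQVENCER",  500
)

# Count rows per key combination; any combination with n > 1
# cannot be spread into a single cell of a wide table.
dups <- long |>
  count(sampleName, protein_Id, peptide_Id, name = "n") |>
  filter(n > 1)

print(dups)

# Number of distinct samples, to compare against the annotation.
n_samples <- n_distinct(long$sampleName)
print(n_samples)
```

If dups has any rows, the key columns do not identify each measurement uniquely, which is the situation the tidyr::spread() error describes.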

Best regards jonas

clawless-inoviv commented 1 year ago

Hi @jjGG ,

I have just realised (on checking again) that I didn't copy across (facepalm moment):

atable$hierarchy[["protein_Id"]] <- c("proteinID")

The vignette runs fine now. With some testing, I have also figured out the issue with the normalisation (which traces back to the tidy_to_wide function). I am using data from DiaNN, and I need to set the peptide level of the hierarchy to Precursor.Id rather than Stripped.Sequence.
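A small sketch of why the choice of column matters, using a made-up DiaNN-style table in which one stripped sequence is observed at two charge states. The values are invented, n_dup_keys is a hypothetical helper written for this example, and the code uses only dplyr and tidyr rather than prolfqua itself.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy DiaNN-style report: the same stripped sequence at charge 2 and charge 3.
report <- tribble(
  ~Run, ~Protein.Group, ~Stripped.Sequence, ~Precursor.Id, ~Precursor.Quantity,
  "s1", "P1",           "PEPTIDEK",         "PEPTIDEK2",   1000,
  "s1", "P1",           "PEPTIDEK",         "PEPTIDEK3",    400,
  "s2", "P1",           "PEPTIDEK",         "PEPTIDEK2",   1100,
  "s2", "P1",           "PEPTIDEK",         "PEPTIDEK3",    450
)

# Hypothetical helper: number of key combinations that occur more than once.
n_dup_keys <- function(data, ...) {
  data |> count(...) |> filter(n > 1) |> nrow()
}

# Keyed by stripped sequence, each run has two rows with identical keys.
dup_stripped <- n_dup_keys(report, Run, Protein.Group, Stripped.Sequence)

# Keyed by precursor, every row is unique.
dup_precursor <- n_dup_keys(report, Run, Protein.Group, Precursor.Id)

# With unique keys the long-to-wide step works: one row per precursor,
# one column per run.
wide <- report |>
  select(Run, Protein.Group, Precursor.Id, Precursor.Quantity) |>
  pivot_wider(names_from = Run, values_from = Precursor.Quantity)

print(wide)
```

Aggregating precursors to one value per stripped sequence before building the table would be another way to get unique keys, if peptide-level rather than precursor-level quantification is wanted.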

Thank you for your patience.

jjGG commented 1 year ago

Hei @clawless-inoviv,

Very nice! You solved it. Have fun with prolfqua ;)

Best regards jonas

wolski commented 1 year ago

Thanks @jjGG