UCLouvain-CBIO / scp

Single cell proteomics data processing
https://uclouvain-cbio.github.io/scp/index.html

computeSCR not working #25

Closed edemmott closed 3 years ago

edemmott commented 3 years ago

computeSCR is not working on my dataset with the development version (installed with BiocManager::install("UCLouvain-CBIO/scp")). The behaviour seems to deviate from how it is described in the vignette. When called as:

# Note: it will error on the Carrier Only samples as nothing matches the
# string for samples, so these are omitted
scp <- computeSCR(scp,
                  i = 1:96,
                  colDataCol = "SampleType",
                  samplePattern = "HEK|U937",
                  carrierPattern = "Carrier",
                  rowDataName = "MeanSCR")

Error message is:

Error in computeSCR(scp, i = 1:96, colDataCol = "SampleType", samplePattern = "HEK|U937", :
unused argument (rowDataName = "MeanSCR")

Seems to go through the experiments, but doesn't save the data anywhere?

edemmott commented 3 years ago

If I remove the rowDataName argument, it does run (puts out warnings because this dataset has a two-carrier design), but doesn't appear to save.

# Note: it will error on the Carrier Only samples as nothing matches the
# string for samples
scp <- computeSCR(scp,
                  i = 1:96,
                  colDataCol = "SampleType",
                  samplePattern = "HEK|U937",
                  carrierPattern = "Carrier")

Output: There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: In computeSCR(scp, i = 1:96, colDataCol = "SampleType", ... :
  Multiple carriers found in assay 'Ed20210701_SCoPE2_HEKU_1'. Only the first match will be used
2: In computeSCR(scp, i = 1:96, colDataCol = "SampleType", ... :
  Multiple carriers found in assay 'Ed20210701_SCoPE2_HEKU_10'. Only the first match will be used
3: In computeSCR(scp, i = 1:96, colDataCol = "SampleType", ... :
  Multiple carriers found in assay 'Ed20210701_SCoPE2_HEKU_11'. Only the first match will be used
4: In computeSCR(scp, i = 1:96, colDataCol = "SampleType", ... : etc...

edemmott commented 3 years ago

session_info()
─ Session info ─────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.4 (2021-02-15)
os macOS Catalina 10.15.6
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/London
date 2021-07-18

─ Packages ─────────────────────────────────────────────────────────────────────────────────────
! package version date lib source
AnnotationFilter 1.14.0 2020-10-27 [1] Bioconductor
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
Biobase 2.50.0 2020-10-27 [1] Bioconductor
BiocGenerics 0.36.1 2021-04-16 [1] Bioconductor
V BiocManager 1.30.10 2021-06-15 [1] CRAN (R 4.0.2)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.0.2)
cachem 1.0.5 2021-05-15 [1] CRAN (R 4.0.2)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.2)
cli 3.0.0 2021-06-30 [1] CRAN (R 4.0.2)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
curl 4.3.2 2021-06-23 [1] CRAN (R 4.0.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
DelayedArray 0.16.3 2021-03-24 [1] Bioconductor
desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.2)
devtools 2.4.2 2021-06-07 [1] CRAN (R 4.0.2)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
V dplyr 1.0.5 2021-06-18 [1] CRAN (R 4.0.2)
V ellipsis 0.3.1 2021-04-29 [1] CRAN (R 4.0.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
V fansi 0.4.2 2021-05-25 [1] CRAN (R 4.0.2)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
GenomeInfoDb 1.26.7 2021-04-08 [1] Bioconductor
GenomeInfoDbData 1.2.4 2021-03-13 [1] Bioconductor
GenomicRanges 1.42.0 2020-10-27 [1] Bioconductor
ggplot2 3.3.5 2021-06-25 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
IRanges 2.24.1 2020-12-12 [1] Bioconductor
V knitr 1.31 2021-04-24 [1] CRAN (R 4.0.2)
lattice 0.20-44 2021-05-02 [1] CRAN (R 4.0.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
MASS 7.3-54 2021-05-03 [1] CRAN (R 4.0.2)
Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.0.2)
MatrixGenerics 1.2.1 2021-01-30 [1] Bioconductor
matrixStats 0.59.0 2021-06-01 [1] CRAN (R 4.0.2)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.2)
MsCoreUtils 1.2.0 2020-10-27 [1] Bioconductor
MultiAssayExperiment 1.16.0 2020-10-27 [1] Bioconductor
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
V pillar 1.5.1 2021-05-16 [1] CRAN (R 4.0.2)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.2)
ProtGenerics 1.22.0 2020-10-27 [1] Bioconductor
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
V QFeatures 1.0.0 2021-07-17 [1] Github (rformassspectrometry/QFeatures@be2640e)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2)
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.0.2)
V rlang 0.4.10 2021-04-30 [1] CRAN (R 4.0.2)
V rmarkdown 2.7 2021-06-15 [1] CRAN (R 4.0.2)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
S4Vectors 0.28.1 2020-12-09 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
V scp 1.0.0 2021-07-17 [1] Github (UCLouvain-CBIO/scp@3127425)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
SingleCellExperiment 1.12.0 2020-10-27 [1] Bioconductor
SummarizedExperiment 1.20.0 2020-10-27 [1] Bioconductor
testthat 3.0.4 2021-07-01 [1] CRAN (R 4.0.2)
V tibble 3.1.0 2021-05-16 [1] CRAN (R 4.0.2)
V tidyselect 1.1.0 2021-04-30 [1] CRAN (R 4.0.2)
V tinytex 0.30 2021-05-29 [1] CRAN (R 4.0.2)
usethis * 2.0.1 2021-02-10 [1] CRAN (R 4.0.2)
V utf8 1.1.4 2021-03-12 [1] CRAN (R 4.0.2)
V vctrs 0.3.6 2021-04-29 [1] CRAN (R 4.0.2)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.2)
V xfun 0.22 2021-06-15 [1] CRAN (R 4.0.2)
XVector 0.30.0 2020-10-28 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
zlibbioc 1.36.0 2020-10-28 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

V ── Loaded and on-disk version mismatch.

edemmott commented 3 years ago

OK, this seems to be a mismatch in the docs: the rowDataName parameter has been removed. The function automatically saves the result to the single-cell experiment assays, but as '.meanSCR' rather than MeanSCR. It probably just needs a change to the documentation.

cvanderaa commented 3 years ago

Hello @edemmott !

Thanks a lot for the report!

1. First issue

Error in computeSCR(scp, i = 1:96, colDataCol = "SampleType", samplePattern = "HEK|U937", :
unused argument (rowDataName = "MeanSCR")

is clearly linked to the use of an older version of scp. In fact, from the session_info I can see you are using scp v1.0.0 (and also QFeatures v1.0.0) while we are currently a few versions ahead. This leads to the error you mentioned here and also to your other issue (https://github.com/UCLouvain-CBIO/scp/issues/26), where pep2qvalue did not exist yet in v1.0.0.

It is, however, very surprising to me that you got those older versions installed while using BiocManager::install("UCLouvain-CBIO/scp") as we recommend. My wild guess is that you are using an older version of Bioconductor, which automatically installs the version from GitHub that is compatible with your current installation. Could you run BiocManager::version() and make sure it returns "3.14"? If not, you can install it using BiocManager::install(version = "devel").
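The check described above can be sketched in a few lines of R. This is only an illustration: `installed_version` is a hypothetical helper written for this example, not part of scp or BiocManager, and it reports the on-disk version of each package (which, per the "Loaded and on-disk version mismatch" note in the session info, may differ from what is loaded in a running session).

```r
# Report the Bioconductor release in use, if BiocManager is installed.
if (requireNamespace("BiocManager", quietly = TRUE)) {
    message("Bioconductor version: ", as.character(BiocManager::version()))
}

# Hypothetical helper: on-disk version of a package, or NA if not installed.
installed_version <- function(pkg) {
    if (nzchar(system.file(package = pkg))) {
        as.character(utils::packageVersion(pkg))
    } else {
        NA_character_
    }
}

# The two packages discussed in this thread.
versions <- vapply(c("scp", "QFeatures"), installed_version, character(1))
print(versions)
```

If the reported versions are still the old ones after updating, restarting the R session before re-checking rules out a stale loaded copy.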

2. Second issue

Indeed, it is strange that the output is not stored in the rowData... It should normally have created a new column in the rowData of assays 1 to 96 called something like .MeanSCR. I however suspect it is also linked to the package version issue.

I think you raise an interesting point that the function currently throws a warning when multiple carriers are present. I would be interested to have your thoughts about how we could best combine the information available from different carriers. I would say it is better to take the mean between carriers rather than simply taking the first of the two carriers.

cvanderaa commented 3 years ago

Note: running BiocManager::install(version = "devel") might update a lot of packages, as I can see that most of your Bioc packages are a few versions behind. Let me know whether you manage to update the Bioconductor version. Once I better understand what is happening, I will make this more explicit in the README.

edemmott commented 3 years ago

  1. BiocManager::version() gave 3.12. Updated R and Bioconductor and reinstalled the new QFeatures and scp packages. Seems to have done the trick and also resolved issues with pep2qvalue.

  2. Re multiple carriers: I think there are a few ways. I have both down as 'Carrier' in the annotation file, but they could be specified as carrier1 and carrier2. With the annotation as I have it, you could maybe offer an explicit option. Sensible choices would be:

    • Take first (current approach) or second in the assay
    • Take the carrier with the highest intensity
    • Take the mean of the carriers.

cvanderaa commented 3 years ago

Thanks for the input !!

The README has been updated with additional info on how to get the devel version of scp installed.

I also improved computeSCR to allow users to provide a function that will combine the carriers, giving them the freedom to choose whatever function they consider most appropriate (mean, max, median, or any custom function).
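To make the idea concrete, here is a base-R sketch of how a user-supplied function could combine several carrier channels before the sample-to-carrier ratio is taken. It is not the scp implementation: `sample_to_carrier_ratio`, its arguments, and the toy intensities are all assumptions made for this illustration.

```r
# Toy intensities for one assay: rows are features, columns are channels.
# All values and channel names are made up for illustration.
intensities <- matrix(
    c(1000,  800, 10, 12,
      2000, 2400, 22, 18,
       500,  700,  4,  8),
    nrow = 3, byrow = TRUE,
    dimnames = list(paste0("psm", 1:3),
                    c("Carrier_1", "Carrier_2", "HEK_1", "U937_1"))
)
sample_type <- c("Carrier", "Carrier", "HEK", "U937")

# Hypothetical helper: collapse sample and carrier channels per feature with
# user-supplied functions, then return the sample/carrier ratio.
sample_to_carrier_ratio <- function(x, type, samplePattern, carrierPattern,
                                    sampleFUN = mean, carrierFUN = mean) {
    is_sample <- grepl(samplePattern, type)
    is_carrier <- grepl(carrierPattern, type)
    sample_val <- apply(x[, is_sample, drop = FALSE], 1, sampleFUN, na.rm = TRUE)
    carrier_val <- apply(x[, is_carrier, drop = FALSE], 1, carrierFUN, na.rm = TRUE)
    sample_val / carrier_val
}

# The three options suggested earlier in the thread.
scr_mean <- sample_to_carrier_ratio(intensities, sample_type,
                                    "HEK|U937", "Carrier", carrierFUN = mean)
scr_max <- sample_to_carrier_ratio(intensities, sample_type,
                                   "HEK|U937", "Carrier", carrierFUN = max)
scr_first <- sample_to_carrier_ratio(intensities, sample_type,
                                     "HEK|U937", "Carrier",
                                     carrierFUN = function(v, ...) v[1])
```

Passing the combining function as an argument keeps the "take first", "take highest" and "take mean" behaviours in one code path, and a custom function only needs to accept a numeric vector (plus `...` to absorb `na.rm`).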

Feel free to close the issue if you think I tackled the problem or feel free to ask for more if you think there is still something missing.