(inactive) protlocassign

mooredf22 commented 2 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Repository: https://github.com/mooredf22/protlocassign

Confirm the following by editing each check box to '[x]'

[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.

I am familiar with the essential aspects of Bioconductor software management, including:

[x] The 'devel' branch for new packages and features.
[x] The stable 'release' branch, made available every six months, for bug fixes.
[x] Bioconductor version control using Git (optionally via GitHub).

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

Comments: The "protlocassign" package presents R functions that take use mass spectrometry data from serial fractionation experiments to proportionally assign proteins to possibly multiple residences in subcellular organelles. The package includes seven vignettes that describe the methods in detail. There is also an accompanying manuscript describing the methods; a revision is under review.

Here are my comments on BiocCheck notes:

1 "Consider adding these automatically suggested biocViews: RNASeq"
This is not relevant to this proteomics-related package

2 "Recommended function length <= 50 lines."
Many of the functions that are longer than 50 lines create complex plots. There is no natural way to break them up into smaller components.

3 "Consider adding runnable examples to the following man pages which document exported objects:"
Most of the functions that are not service functions include runnable examples. The small number that are not are explained and illustrated in detail in the accompanying vignettes; setting up runnable examples for these functions withing the help files would be cumbersome.

4 "Consider shorter lines; 235 lines (3%) are > 80 characters long."
All lines in the functions are shorter than 80 characters. Much of the explanatory text in the vignettes are longer than 80 characters, and we believe that that text is readable and clear in its current form.

5 "Consider multiples of 4 spaces for line indents, 1133 lines(15%) are not."
We have chosen to use descriptive variable names in our code in order to clearly link them to the to the concepts explained in the vignettes and in the accompanying manuscript. As a consequence, some of these variable names are rather long, which makes it difficult to adhere strictly to the "multiples of 4 spaces" suggestion. We have chosen indentation amounts to be as readable as possible and also conform to the limit of 80 characters per line.

6 "Cannot determine whether maintainer is subscribed to the bioc-devel mailing list (requires admin credentials).\nSubscribe here: https://stat.ethz.ch/mailman/listinfo/bioc-devel" I am listed as the maintainer and I have subscribed to this mailing list.

bioc-issue-bot commented 2 years ago

Hi @mooredf22

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: protlocassign
Title: Compute constrained proportional assignments for 
    fractionated protein abundance 
Version: 0.99.0
Date: 2022-02-26
Authors@R: 
    person(given = "Peter",
 family = "Lobel", 
 role = c("aut"),
 email = "lobel@cabm.rutgers.edu")
    person(given = "Dirk",
 family = "Moore",
 role = c("cre","aut"),
 email = "dirkfmoore@gmail.com")
Description: This package fits a constrained proportional assignment (cpa)
   model to mass spectrometry data 
    to proportionally assign proteins to subcellular locations.
    The package includes transformations to 
    improve the accuracy of the assignments as well as
    facilities for simulating mixtures and evaluating 
    the perfomance of cpa on those mixtures.
    Also included are facilities for summarizing spectral-level 
    data using random effects models to calculate average protein profiles.
License: GPL-3
Encoding: UTF-8
LazyData: false
Depends: 
    R (>= 4.1.0),
    lme4
RoxygenNote: 7.1.2
Imports:
    BB,
    pracma,
    outliers,
    knitr,
    stats,
    graphics,
    viridisLite,
    viridis,
    plot.matrix,
    grid,
    gridExtra,
    rmarkdown,
    utils,
    methods,
    BiocParallel
biocViews: Proteomics, MassSpectrometry
VignetteBuilder: knitr
URL: http//github.com/mooredf22/protlocassign
Suggests: 
    testthat (>= 3.0.0)
Config/testthat/edition: 3

vjcitn commented 2 years ago

Hi -- this very comprehensively documented package will have to change in certain respects to be part of Bioconductor. I find that the built package is 13MB in size. I'll look more closely at that in a minute. The first vignette has

Start by installing the devtools package from CRAN, by typing:

install.packages("devtools")
Then install the protlocassign package from the github repository by typing:

devtools::install_github("mooredf22/protlocassign")
This will make the programs and data sets available. Also, several libraries are required which may be downloaded from CRAN by typing:

install.packages(c("BB", "pracma", "lme4", "outliers"))

and this is not appropriate. BiocManager::install will be used. See the developer guidelines.

We have some large html files in the built package:

-rw-r--r--  0 vincentcarey staff 5828512 Feb 26 20:53 protlocassign/inst/doc/vignette_4_more_on_mixtures.html
-rw-r--r--  0 vincentcarey staff   15016 Feb 26 20:55 protlocassign/inst/doc/vignette_5_Using_classical_fractionation_with_five_compartments_on_CPA.R
-rw-r--r--  0 vincentcarey staff   20066 Feb 26 20:48 protlocassign/inst/doc/vignette_5_Using_classical_fractionation_with_five_compartments_on_CPA.Rmd
-rw-r--r--  0 vincentcarey staff 6915101 Feb 26 20:55 protlocassign/inst/doc/vignette_5_Using_classical_fractionation_wi

It isn't completely clear to me why this is -- perhaps your graphics format is not an efficient one.

Finally, I think the computations for the vignettes are going to take too long for the build system. Have you timed the build/ check process? You probably want to distinguish between software documentation in the vignette and an analysis workflow that uses your package, and create a workflow package that can have a longer build time.

mooredf22 commented 2 years ago

I have uploaded a revised version of protlocassign. Here are the changes:

We have updated all discussions in the vignettes of installing packages to using the Bioconductor version of install.
We have created small subsets of our datasets for use in the vignettes. This has reduced the build time a great deal. On my Windows PC the build time (including the vignettes) was 333 seconds, and the check time (including vignettes) was 716 seconds.
We have deleted all plots from the appendices of Vignettes 3 and 4, reducing their size. Below is a screenshot of the sizes of the vignettes:

All but one of them are below 1 megabyte, and vignette 5 is only slightly above 1 megabyte.

If these are still too large, we can consider deleting some of them and link to (perhaps) a new github site containing the full set of vignettes. We believe, however, that this would be an inconvenience for users of the package.

vjcitn commented 2 years ago

Package builds to 8MB which is too big. But it can be reviewed.

bioc-issue-bot commented 2 years ago

A reviewer has been assigned to your package. Learn what to expect during the review process.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, TIMEOUT, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lshep commented 2 years ago

@mooredf22 Please correct ERROR and TIMEOUT before a more indepth review will continue.

bioc-issue-bot commented 2 years ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for your interest in Bioconductor.

lshep commented 2 years ago

There has been no response and no effort to correct the current build ERROR's. If you plan on proceeding with the submission please request that this issue be reopened but only when you are ready to actively address package issues.

mooredf22 commented 2 years ago

I request that this submission, https://github.com/Bioconductor/Contributions/issues/2541, be re-opened. We have re-worked this package extensively to make it suitable for Bioconductor. In particular, while previously we had seven vignettes, we have now combined them into a single file, and refer to them as tutorials. We have also eliminated many figures. In this way, we have reduced the package size considerably. On my Windows machine, the package binary is 2.34 MB and the vignette (tutorials) is 2.54 MB. We have considerably re-worked the help files and added unit tests. This also has reduced the build time.

The vignette (tutorials) corresponds to those we are using as supplemental material for a manuscript currently under review. We hope that the current size is satisfactory so that it will correspond closely with the version in our manuscript.

mooredf22 commented 2 years ago

I would like to reopen the protlocassign submission. It had taken us some time to re-work this package to make it suitable for Bioconductor. We have endeavored to answer all of the issues with this submission, particularly reducing its size and build time.

This is my first submission to Bioconductor, and I am still learning how to follow the policy on how to reopen this case. I have uploaded the revised package to github and increased the version number. I have also entered a new comment on the (currently inactive) submission site.

Thank you for your consideration

Sent from my iPhone

On Apr 5, 2022, at 8:05 AM, lshep @.***> wrote:

There has been no response and no effort to correct the current build ERROR's. If you plan on proceeding with the submission please request that this issue be reopened but only when you are ready to actively address package issues.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

bioc-issue-bot commented 2 years ago

Dear @mooredf22 ,

We have reopened the issue to continue the review process. Please remember to push a version bump to git.bioconductor.org to trigger a new build.

lgatto commented 2 years ago

@mooredf22 - just checking weather you have pushed your latest changes to the Bioconductor git server, as the last commit I see is

Author: Moore <mooredf@umdnj.edu>
Date:   Wed Mar 2 12:27:20 2022 -0500

    Use small test data sets for vignettes to reduce build times.

and you mentioned recent changes in your previous message.

mooredf22 commented 2 years ago

Before the submission was closed, the version was 0.99.1 Now it is 0.99.2

If I need to increment it again, let me know. Thanks!

Sent from my iPhone

On Apr 14, 2022, at 8:20 AM, Laurent Gatto @.***> wrote:

@mooredf22 - just checking weather you have pushed your latest changes to the Bioconductor git server, as the last commit I see is

Author: Moore @.***> Date: Wed Mar 2 12:27:20 2022 -0500
Use small test data sets for vignettes to reduce build times.
and you mentioned recent changes in your previous message.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.

lshep commented 2 years ago

You need to push to git.bioconductor.org. Please activate your git credentials account if you did not do so already and you can manage your ssh keys for access there. Please then see the new package workflow for how to set up remotes and push to git.bioconductor.org

lgatto commented 2 years ago

@mooredf22 do you need any help to push your latest changes to the Bioconductor git server?

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 8e7534f8f8816ff9fd3a8b917600b988749b12c1

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, WARNINGS, TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

mooredf22 commented 2 years ago

I am revising the vignette code to decrease the build time and size.

lgatto commented 2 years ago

Thank you - I have started the review anyway but didn't mange to finish it yet.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1503e07dc14898a79fb4f20ce001c96606266d3d

mooredf22 commented 2 years ago

I have added two new data frames to the package that contain pre-computed values for figures with panels of tables in Tutorials 4 and 5. I also modified the "mixtureHeatMap" function to optionally use these pre-computed values in place of computing them during the knitting of the vignette (tutorials) during the package build. I also used "\dontrun{}" in the help file for the "mixtureHeatMap" function. These two changes have significantly reduced the build time for the package without impacting the contents of the vignette (i.e. tutorials).

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

mooredf22 commented 2 years ago

The main concern with the build report appears to be the size of the vignette (which is a compendium of seven tutorials). These tutorials are being published as supplementary material for a companion manuscript that has recently been accepted for publication in the Journal of Proteome Research. I understand that there are resource limitations, but I (and my co-authors) would like to find a way to make this vignette acceptable for Bioconductor while deviating as little as possible from the published version.

lshep commented 2 years ago

Would it make sense than to keep a smaller, proof of principle vignette inside the package with the R code and publish an accompanied workflow package(s) that are more encompassing workflow packages with the full tutorials

mooredf22 commented 2 years ago

Yes, thanks, we will do this. We will submit both the main package (protlocassign) and the "workflow" (documentation) package (protlocassigndoc) at the same time when they are ready.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 1cf6af3d7dde0844c546fca444b33804b406df2c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

mooredf22 commented 2 years ago

The companion paper to the protlocassign package has just appeared in the Journal of Proteome Research:

A Method to Estimate the Distribution of Proteins across Multiple Compartments Using Data from Quantitative Proteomics Subcellular Fractionation Experiments Dirk F. Moore, David E. Sleat, and Peter Lobel

https://pubs.acs.org/doi/abs/10.1021/acs.jproteome.1c00781

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 3ad17c2b4cf1a1609e4475941c7ac3979a9be63c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ade20fdfa40557d225f80ba2bffd9005568bb5b9

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

mooredf22 commented 2 years ago

To process the associated package "protlocassigndoc" I added the following line to the DESCRIPTION file:

AdditionalPackage: https://github.com/mooredf22/protlocassigndoc

Since it built without errors, have I correctly followed the linked packages protocol?

bioc-issue-bot commented 2 years ago

Dear @mooredf22 ,

You (or someone) has already posted that repository to our tracker.

See https://github.com/Bioconductor/Contributions/issues/2656

You cannot post the same repository more than once.

If you would like this repository to be linked to issue number: 2541, Please contact a Bioconductor Core Member.

lgatto commented 2 years ago

Hi @mooredf22 - apologies for the delay in getting back to you.

General package development

File names: please choose one of .R or .r as R file extension and stick to it. The former is the one most widely used in Bioconductor.
In the README file and vignette, please use BiocManager::install(), also to install from Github. The package dependencies will be installed automatically, so add explicitly installation instructions for these.

The DESCRIPTION file

BiocCheck suggest to update the R dependency to >= 4.2.
The Date field isn't needed. The risk is that you forget to update it and it suddenly doesn't match the latest version. And if needed, the date of the most recent version can be extracted from the latest commit.
It is encouraged to use your Github issues for bug reports (see BugReports field in the DESCRIPTION file)

The NAMESPACE file

Some of the exported symbols start with a capital letter. Typically, in Bioconductor, upper case symbols are used for classes (and their constructors).

The CITATION file

If there's one or multiple specific publications you would like you users to cite when using your packages, you can add it as a inst/CITATION file. Otherwise, citation("protlocassign") will automatically create a citation for the package.

Documentation

Please provide references to the methods used (if appropriate) as well as to other similar or related projects and packages. There are several packages in Bioconductor that deal with protein sub-cellular localisation: hpar, HPAanalyze provide access to IF data; pRoloc, SubCellBarCode, pRolocdata and pRolocGUI provide methods, data and visualisation infrastructure of MS-based spatial proteomics.
Could you please use the BiocStyle package to format your vignette. This allows to have a coherent Bioconductor style for all packages' vignettes.
Could you please package man page present so that users can get a quick overview of the package and suggestions for further reading when running ?protlocassign.
In your vignette, you have a section where you use setwd() to change the working directory to a temporary directory. You show two chunks using Windows and Unix syntax, neither being executed. I would advise to not change working directory and use R's tempdir() function to define a default temporary directory and then write your csv files there. This would even allow you to execute the code chunks.

Unit tests

Your unit tests cover only 11.6% or you code base.

> covr::package_coverage("protlocassign/")
protlocassign Coverage: 11.64%
R/fCPAsubsets.r: 0.00%
R/markerProfilePlot.r: 0.00%
R/mixtureAreaError.r: 0.00%
R/mixtureHeatMap.r: 0.00%
R/mixturePlotPanel.r: 0.00%
R/mixturePlot.r: 0.00%
R/OutlierFind.R: 0.00%
R/profileSummarize.r: 0.00%
R/protDistanceCompute.R: 0.00%
R/proteinDataPrep.R: 0.00%
R/proteinMix.r: 0.00%
R/protPepProfilePlot.R: 0.00%
R/protPepProfile.r: 0.00%
R/protProfilePlot.R: 0.00%
R/RSAtransform.r: 52.08%
R/cpaProgs.R: 52.12%

I would also suggest to annotate your unit test with comments and sort them in different files (once you have more) and in a set of coherent test/functionalities using test_that().

R code

Please spell out TRUE and FALSE rather then using T and F. I have spotted this undesirable shortcut in the vignette and in commented R code; it isn't as serious as in executed R files, but better avoided altogether.
In some of you examples (see for example Qfun4()) in the roxygen documentation, you intersperse the example code (prefixed with #') with plain comments (starting with #), that do not appear in the final manual page. Is this intended? Shouldn't the comment line start with the roxygen prefix?
There are a lot of commented left-overs in your code such as unformatted/unindented code, calls to browser() and even close curly brackets. I would suggest to clean these up and either delete these comments if they aren't needed, or format/indent them properly to make them readable and useful.
You still have calls to system.time in your functions that aren't needed.
In profileSummarize() (there might be other cases), you have a cpu argument that controls if a the code is to be parallelised using bplapply() where you hard-code to BPPARAM argument to SnowParam(cpus). You don't need to handle such cases manually and shouldn't hard-code the parallelisation parameters. Always use bplapply() with the default BPPARAM = bpparam(). Users can this parametrise the parallelisation backend or disable it by setting BPPARAM = BiocParallel::SerialParam().
Several of your functions are excessively long. I would suggest to check if these can be split into smaller sub-functions, which in turn would be more appropriate for smaller and focused unit tests.
It is recommended to use lines < 80 characters, both for R code, man pages and Rmd files. You might want to consider applying this to your vignette too if you want. One advantage is that changes, once committed to Github, will be more precisely located to shorted lines.
Some of indentation doesn't match multiples of 4 spaces. This is generally handled by your editor, so simply check if it can auto-indent the lines properly. This is relatively minor.
Bioconductor has a long experience and tradition of defining and re-using core classes across different packages. For quantitative data as yours, these would typically be either SummarizedExperiments or the MSnSet (which is also used in pRoloc). These enable to coherently store quantitative data as well as sample and feature metadata - in your case, protein markers and the localisation assignments. What is your rational for not making use of these core classes? By using such classes, you get built-in validity checks, processing and visualisation for free.

Third-party code

I see that in projSimplex(), you refer to some Matlab code. Is this code under an appropriate open licence allowing this, and is it compatible with you GPL-3 licence?

Misc

You have a docx file in your inst directory that described the primary and processed data deposited in MassIVE. I suggest to create a manual page (for example MSV000083842.Rd or protlocassignData.Rd) to make it accessible to your users. Using a text file instead of a binary file makes it also possible to track changes in Github.

Feel free to get in touch to discuss any of the points above if needed.

mooredf22 commented 2 years ago

I am progressing well in addressing these many issues and expect to post a revision soon.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 9df8643e65cf49c384920782090e5ba0022ed569

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 2 years ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 117bb81fa8f724a2f3ce38d3d56418b89da10a9c

bioc-issue-bot commented 2 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details. This link will be active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/protlocassign to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

mooredf22 commented 2 years ago

Following are my responses to the reviewer's detailed comments:

response BioconductorReviewComments 24 Jul 2022.docx

lgatto commented 2 years ago

Thanks - could you please add your responses using the markdown syntax directly in the issue, rather than in an external doc file.

mooredf22 commented 1 year ago

Response to comments on “protlocassign”

Thank you for these extensive comments. Be aware that our “protlocassign” package has evolved over many years and was not originally developed with Bioconductor in mind. That said, I have endeavored to address the Bioconductor requirements without fundamentally changing the user experience.

General package development

File names: please choose one of .R or .r as R file extension and
stick to it. The former is the one most widely used in Bioconductor.

All functions are now “.R”.
In the README file and vignette, please use
BiocManager::install(), also to install from Github. The package
dependencies will be installed automatically, so add explicitly
installation instructions for these.

I have updated the README file as described.

The DESCRIPTION file

BiocCheck suggest to update the R dependency to >= 4.2.

Done.
The Date field isn't needed. The risk is that you forget to update
it and it suddenly doesn't match the latest version. And if needed,
the date of the most recent version can be extracted from the latest
commit.

I have removed the Date field.
It is encouraged to use your Github issues for bug reports (see
BugReports field in the DESCRIPTION file)

I have added a Github BugReports link in the DESCRIPTION file

The NAMESPACE file

Some of the exported symbols start with a capital letter. Typically,
in Bioconductor, upper case symbols are used for classes (and their
constructors).

This one is hard to fix since we chose some names, e.g. “Acup”, for readability to closely match notation in the main paper. Furthermore, the package is already publicly available and changing the name would be disruptive to current users.

The CITATION file

If there's one or multiple specific publications you would like you
users to cite when using your packages, you can add it as a
inst/CITATION file. Otherwise, citation("protlocassign") will
automatically create a citation for the package.

I have added a CITATION file, and ‘citation(“protlocassign”)’ returns two references as text. Optionally, ‘print(citation("protlocassign"), bibtex=TRUE)’ returns the references in BiBtex format.

Documentation

Please provide references to the methods used (if appropriate) as
well as to other similar or related projects and packages. There are
several packages in Bioconductor that deal with protein sub-cellular
localisation: hpar, HPAanalyze provide access to IF data;
pRoloc, pRolocdata and pRolocGUI provide methods, data and
visualisation infrastructure of MS-based spatial proteomics.

Other Bioconductor packages are available for working with proteomics data.

I added this at the end of the first section:

‘The "HPAanalyze" package (Tran et al., 2019) provides tools

for accessing the Human Protein Atlas and for visualizing the data.

The "pRoloc" package (Breckels et al.) provides tools for studying

localisation of proteins inside cells and for manipulating

and visualizing relevant data

from quantitative fractionation experiments. In this vignette and

in the tutorials in the accompanying "protlocassigndoc" package we

provide detailed instructions on how to format data in ".csv" files

for import in this package. Data stored in

"pRoloc" or "hpa" R formats may be converted into the formats required

by "protlocassign" in a similar fashion.’

And this at the end of the last section:

‘Details of how data are pre-processed for input into the protlocassign

package may be found in the `protlocassignDataDescription.md` file

in the `\inst` subdirectory of the R package. We note that many of

these pre-processing steps could, as an alternative, be carried out

within R using the `MSnSet` facilities also used by the `pRoloc`

package. An advantage of this approach is that these packages provide

a natural way to store metadata such as protein markers and their

localization as well including built-in validity checks and visualization

tools. See the references by L Gatto, LM Breckels, KS Lilley and colleagues

for details.’

I also added appropriate references.

Could you please use the BiocStyle package to format your
vignette. This allows to have a coherent Bioconductor style for all
packages' vignettes.

Yes, I have used this style format now.
Could you please package man page present so that users can get a
quick overview of the package and suggestions for further reading
when running ?protlocassign.

Yes, ‘?protlocassign’ now returns a description of the package.
In your vignette, you have a section where you use setwd() to
change the working directory to a temporary directory. You show two
chunks using Windows and Unix syntax, neither being executed. I
would advise to not change working directory and use R's tempdir()
function to define a default temporary directory and then write your
csv files there. This would even allow you to execute the code
chunks.

This is a great suggestion. I have used ‘tempdir()’ except in a case where extensive plots are output. In that case I use unexecuted code.

Unit tests

Your unit tests cover only 11.6% or you code base.

> covr::package_coverage("protlocassign/")

protlocassign Coverage: 11.64%

R/fCPAsubsets.r: 0.00%

R/markerProfilePlot.r: 0.00%

R/mixtureAreaError.r: 0.00%

R/mixtureHeatMap.r: 0.00%

R/mixturePlotPanel.r: 0.00%

R/mixturePlot.r: 0.00%

R/OutlierFind.R: 0.00%

R/profileSummarize.r: 0.00%

R/protDistanceCompute.R: 0.00%

R/proteinDataPrep.R: 0.00%

R/proteinMix.r: 0.00%

R/protPepProfilePlot.R: 0.00%

R/protPepProfile.r: 0.00%

R/protProfilePlot.R: 0.00%

R/RSAtransform.r: 52.08%

R/cpaProgs.R: 52.12%

I would also suggest to annotate your unit test with comments and sort
them in different files (once you have more) and in a set of coherent
test/functionalities using test_that().

I have greatly expanded the number of unit tests to cover the core package functions, including all transformations, cpa-related routines, and simulations.

R code

Please spell out TRUE and FALSE rather then using T and F. I
have spotted this undesirable shortcut in the vignette and in
commented R code; it isn't as serious as in executed R files, but
better avoided altogether.

I have searched my code for these and I believe I have corrected all of them.
In some of you examples (see for example Qfun4()) in the roxygen
documentation, you intersperse the example code (prefixed with #')
with plain comments (starting with #), that do not appear in the
final manual page. Is this intended? Shouldn't the comment line
start with the roxygen prefix?

Yes, I have fixed these.
There are a lot of commented left-overs in your code such as
unformatted/unindented code, calls to browser() and even close
curly brackets. I would suggest to clean these up and either delete
these comments if they aren't needed, or format/indent them properly
to make them readable and useful.

I have cleaned these up.
You still have calls to system.time in your functions that aren't
needed.

I have removed these.
In profileSummarize() (there might be other cases), you have a
cpu argument that controls if a the code is to be parallelised
using bplapply() where you hard-code to BPPARAM argument to
SnowParam(cpus). You don't need to handle such cases manually and
shouldn't hard-code the parallelisation parameters. Always use
bplapply() with the default BPPARAM = bpparam(). Users can this
parametrise the parallelisation backend or disable it by setting
BPPARAM = BiocParallel::SerialParam().

I have changed the parallelization code as discussed in ‘profileSummarize’ and ‘outlierFind’, the two places where it is used.
Several of your functions are excessively long. I would suggest to
check if these can be split into smaller sub-functions, which in turn
would be more appropriate for smaller and focused unit tests.

The long functions are graphical routines, which are lengthy because the plots are complex. In principle I could shorten them a bit by putting the code in loops in a separate function. However, I have been maintaining the code in its current form for some time and I don’t think that doing this would be particularly helpful. As for unit tests, it is difficult to carry these out with functions that produce plots and no other output.
It is recommended to use lines \< 80 characters, both for R code, man
pages and Rmd files. You might want to consider applying this to
your vignette too if you want. One advantage is that changes, once
committed to Github, will be more precisely located to shorted lines.

All code, including in the man pages and vignettes, are \< 80 characters, with rare exceptions (e.g. in a title) where cutting a line would cause output problems.
Some of indentation doesn't match multiples of 4 spaces. This is
generally handled by your editor, so simply check if it can
auto-indent the lines properly. This is relatively minor.

My indentation procedures are partly a consequence of our choice of long variable names, which we did to make a clear connection to the language in the main paper. I have endeavored to make the indentation as readable as possible.
Bioconductor has a long experience and tradition of defining and
re-using core classes across different packages. For quantitative
data as yours, these would typically be either
SummarizedExperiments or the MSnSet (which is also used in
pRoloc). These enable to coherently store quantitative data as
well as sample and feature metadata - in your case, protein markers
and the localisation assignments. What is your rational for not
making use of these core classes? By using such classes, you get
built-in validity checks, processing and visualisation for free.

Please see me comments on other packages and data formats in the “Documentation” section above. We developed our package over a long period of time without Bioconductor as a design aim. Had we thought of Bioconductor from the beginning we may have incorporated these data structures into the package. Despite this, users who have data in, say “SummarizedExperients” or “MsnSet” R data structures (rather than as external Excel files) should be able to follow our instructions to extract the data and protein/peptide information in the proper format for the “protlocassign” package. If users request facilities for automating this process, I will try to add them in a future release.

Third-party code

I see that in projSimplex(), you refer to some Matlab code. Is
this code under an appropriate open licence allowing this, and is it
compatible with you GPL-3 licence?

I contacted the authors of the original “projSimplex” code and they agreed to distribute it under the “MIT” open source license. I have included the appropriate text in “projSimplex” as required by this license. I also changed the license for “protlocassign” to the “Artistic-2.0” license to be compatible with the MIT license and to be consistent with the common use of “Artistic-2.0” in the Bioconductor repository.

Misc

You have a docx file in your inst directory that described the
primary and processed data deposited in MassIVE. I suggest to create
a manual page (for example MSV000083842.Rd or
protlocassignData.Rd) to make it accessible to your users. Using a
text file instead of a binary file makes it also possible to track
changes in Github.

I have converted the file to “.Rd” format.

Feel free to get in touch to discuss any of the points above if needed.

—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you were mentioned.

mooredf22 commented 1 year ago

The detailed comments on this package have led to considerable quality improvements. That said, it has been some time since my above August 18 response, and I am wondering if further changes are required to make the package suitable for inclusion in Bioconductor.

lshep commented 1 year ago

I will check with @lgatto about the re-review and reassign if necessary. We apologize for the delay.

lgatto commented 1 year ago

I'll do it by the end of the week.

lgatto commented 1 year ago

Apologies for the delay. I'll make sure this doesn't slip off my radar anymore.

Are you sure you have pushed all the changes you are mentioning? Your first reply mentions "All functions are now .R", but there's still a mix or .R and .r here and on your GitHub repository.
Regarding the installation instructions, you can use 'BiocManager::install()' for GitHub packages too. It might be worth simplifying your user's experience and suggest to use it for 'BiocManager::install("mooredf22/protlocassign")' too.
You are mistaken in referring your users to look into the package's inst directory. Upon installation, the content of inst is moved to the root of the package directory. As you can see below, it doesn't exist anymore, and this is what your users will have, not the package source directory.

> dir(system.file(package = "protlocassign"))
 [1] "CITATION"                          "data"
 [3] "DESCRIPTION"                       "doc"
 [5] "help"                              "html"
 [7] "INDEX"                             "Meta"
 [9] "NAMESPACE"                         "NEWS.md"
[11] "protlocassignDataDescription.html" "protlocassignDataDescription.md"
[13] "R"

They need to look into doc to find your documentation files. I also suggest you show how to use dir and system.file, ideally with full.names = TRUE as shown below, so they know immediately where to find the files.

dir(system.file("doc", package = "protlocassign"), full.names = TRUE)

I'm curious as to why you don't make protlocassignDataDescription a vignette. It's OK to make it an Rmd file even if it doesn't have any code chunks, so that you keep all documentation files/vignettes together.
Apologies if I wasn't clear, but the expected practice in terms of parallelisation is to have BPPARAM = bpparam() as an argument in your function signature. Your implementation gets it wrong. If I execute the following code from your manual, I use 6 parallel processes, even if I specified 2:

> set.seed(17356)
> eps <- 0.029885209
> data(spectraNSA_test)
> library(BiocParallel)
> flagSpectraBox <- outlierFind(protClass=spectraNSA_test,
                                   outlierLevel='peptide', numRefCols=5,
                                   numDataCols=9,
                                   outlierMeth='boxplot', range=3, eps=eps,
                                   randomError=TRUE, cpus=2)

This is because my default backend uses 6 workers.

> bpparam()
class: MulticoreParam
  bpisup: FALSE; bpnworkers: 6; bptasks: 0; bpjobname: BPJOB
  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
  bpRNGseed: ; bptimeout: NA; bpprogressbar: FALSE
  bpexportglobals: TRUE; bpexportvariables: FALSE; bpforceGC: TRUE; bpfallback: TRUE
  bplogdir: NA
  bpresultdir: NA
  cluster type: FORK

You expect the user to create and register the parallelisation backend, and then define whether to use it by setting a value of cpus > 1. If you replace cpus = 1 with the recommended BPPARAM = bpparam(), it will all work all by itself, either by passing the parallel param object directly to the function argument, or by registering it globally.

I'll leave it to you to refactor your functions to make them shorter, but keep in mind that all good developers agree on the fact that shorter functions are better.
The reuse of Bioconductor infrastructure and the interoperability with Bioconductor objects comes up regularly during review. I have a couple of comments and suggestions about the use of Bioconductor infrastructure, quoting some fellow reviewers/developers:

"Packages that just work with data.frame can be distributed via CRAN" - I feel that despite not using the dedicated classes, protlocassign has a place in Bioconductor, as it addresses the same use cases as for example pRoloc does, and presenting several such packages as part of Bioconductor makes sense... writing this, I realise that this is also a strong argument for both to work on the same data structures... but see below.

"The user shouldn't have to manually do that kind of [...] conversion", referring to conversion to/from data.frames when dedicated structures exist. It would however be considered acceptable if (1) the data.frame structure is described in the manual page and (2) you provide functions that convert your data.frame to/from one of the expected class.

I would thus like to suggest to add "facilities for automating this process", as you mention in your reply.

mooredf22 commented 1 year ago

AdditionalPackage: https://github.com/mooredf22/protlocassigndoc

bioc-issue-bot commented 1 year ago

Hi @mooredf22, Thanks for submitting your additional package: https://github.com/mooredf22/protlocassigndoc. We are taking a quick look at it and you will hear back from us soon.

Bioconductor / Contributions