Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

mspms #3460

Closed baynec2 closed 1 week ago

baynec2 commented 5 months ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 5 months ago

Hi @baynec2

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: mspms
Type: Package
Title: Analysis tools for use with MSP-MS
Version: 0.99.0
Authors@R: person("Charlie", "Bayne", email = "baynec2@gmail.com",
    role = c("aut", "cre"))
Description: This package provides functions for the analysis of data generated 
    by the multiplex substrate profiling by mass spectrometry for proteases 
    (MSP-MS) method.Data exported from upstream proteomics software is accepted 
    as input and subsequently normalized, outlier removed, and imputed. 
    Tools for statistical analysis, visualization, and interpretation of 
    the data are provided.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.1
Depends: 
    R (>= 4.1.0)
biocViews: Proteomics, MassSpectrometry
Imports: 
    dplyr (>= 1.1.4),
    magrittr (>= 2.0.3),
    MASS (>= 7.3.60.0.1),
    NormalyzerDE (>= 1.19.7),
    outliers (>= 0.15),
    purrr (>= 1.0.2),
    stats (>= 4.1.1),
    tidyr (>= 1.3.1),
    truncnorm (>= 1.0.9),
    utils (>= 4.1.1),
    ggplot2 (>= 3.5.0),
    ggseqlogo (>= 0.2),
    heatmaply (>= 1.5.0),
    readr (>= 1.1.5),
    readxl (>= 1.4.3),
    rstatix (>= 0.7.2),
    tibble (>= 3.2.1),
    SummarizedExperiment (>= 1.24.0),
    rlang (>= 1.1.3),
    forcats (>= 1.0.0),
    ggpubr (>= 0.6.0),
    rmarkdown (>= 2.26),
    DT (>= 0.33)
Suggests: 
    knitr,
    testthat (>= 3.0.0)
Config/testthat/edition: 3
URL: https://github.com/baynec2/mspms
BugReports: https://github.com/baynec2/mspms/issues
VignetteBuilder: knitr
lshep commented 5 months ago

Could you please provide an abstract/intro section in your vignette that provides motivation for inclusion in Bioconductor and when appropriate a review and comparison to existing Bioconductor packages with similar functionality or scope.

Please do not hard code paths in vignettes/examples/tests. For example

"../tests/testdata/protein-peptides-lfq.csv"

Data should be in a inst/extdata directory and data should be well documented on source, licensing, and how it was generated either with a file in inst/script that can be text, pseudo code or code but detail how a user could procure or created the files in inst/extadata. Then retrieve the path with system.file().

baynec2 commented 5 months ago

Just edited as you suggested and pushed changes!

bioc-issue-bot commented 5 months ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 5 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 22.04.3 LTS): mspms_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

baynec2 commented 4 months ago

Hi, I am not quite sure what to do about the build error I am getting on macOS 12.7.1 Monterey/x86_64 that seems to be originating from the readxl package (Symbol not found: (_iconv). Do you have any advice?

bioc-issue-bot commented 4 months ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

jwokaty commented 4 months ago

I was able to build the package on the intel mac without any modification. I will kick off the build to see if the report will change as well.

bioc-issue-bot commented 4 months ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: mspms_0.99.0.tar.gz Linux (Ubuntu 22.04.3 LTS): mspms_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

baynec2 commented 3 months ago

Hey, I wanted to check in on the status of this. Is there anything else I can do at the moment?

DarioS commented 3 months ago

Evidently, Laurent Gatto is not on holidays, so it shouldn't be too much longer until you receive a peer review from him.

image

lgatto commented 3 months ago

Sorry, for the delay, just wanted to let you know that the review is in process, and you'll hear from me by the end of the week or weekend.

baynec2 commented 3 months ago

Great- thank you for the note!

lgatto commented 3 months ago

Hi @baynec2

Thank you for your patience. I will share my review in two posts. This first part is really central, and will influence if/how to address the smaller code-related aspects. My concerns address what the nature of Bioconductor packages, as opposed to other bioinformatics/omics analysis packages.

Re-use of classes and functionality

  1. You make use of tibbles to store the quantitative data and the sample information. Bioconductor has a long tradition of using centrally defined and maintained S4 data classes, that are dedicated for omics data. In particular, we have the SummarizedExperiment of quantitative data from proteomics (and transcriptomics, metabolomics, ...). You even use it, internally, when you run the normalisation step!

    We do emphasise the systematic re-use of these classes, to promote interoperability: packages developed by different developers/groups and operating on these same objects become inter-operable, and users can use any of them in a consistent way. As an example, there are many normalisation, imputation, aggregation, ... methods that are readily available. And if there was one you needed, new ones could be added for SummarizedExperiment objects for general usage in the community.

    This is an important request we have for new packages, and is paramount for Bioconductor packages. We generally feel that packages that prefer not to adapt these classes are probably a better fit for CRAN.

    I am of course available for further discussions to help you work with SummarizedExperiment (or even QFeatures objects, if you want to consider data at the precursor, peptide and protein level).

  2. I see that you implemented your own statistical approaches (t-test and anova). Statistical analyses is one of the strengths of the Bioconductor project. There are several packages that provide linear modelling that are way more flexible than t-tests, including empirical Bayesian approaches (see limma) and mixed modelling (see msqrob2), that are suited or specifically developed for proteomics data. I think it would be misleading to direct users away from the well established Bioconductor packages.

  3. The cleaver Bioconductor package offers the functionality that you implemented in your various cleavage functions. cleaver works with widely used Biostrings classes to handle (potentially very large) DNA, RNA and AA sequences.

lgatto commented 3 months ago

The DESCRIPTION file

The NEWS file

Syntax should be

# mspms 0.99

## mspms 0.99.0

* Initial Bioconductor submission.

Documentation

Unit tests

R code

baynec2 commented 2 months ago

@lgatto Thank you for the thoughtful review, I really appreciate your comments and suggestions.   This is my first attempt at making a Bioconductor R package, so I am sure I made some less-than-ideal design choices in the process.  I am however very interested in learning and improving for the future.   I have a big-picture question for you:   Is it worthwhile endeavor to try to convert this to be a Bioconductor package? I am not entirely sure how hard this would be, but based on your comments I suspect this will require a significant restructuring of the code base.   I have some other comments and questions regarding your review below.

baynec2 commented 2 months ago

Hi @baynec2

Thank you for your patience. I will share my review in two posts. This first part is really central, and will influence if/how to address the smaller code-related aspects. My concerns address what the nature of Bioconductor packages, as opposed to other bioinformatics/omics analysis packages.

Re-use of classes and functionality

  1. You make use of tibbles to store the quantitative data and the sample information. Bioconductor has a long tradition of using centrally defined and maintained S4 data classes, that are dedicated for omics data. In particular, we have the SummarizedExperiment of quantitative data from proteomics (and transcriptomics, metabolomics, ...). You even use it, internally, when you run the normalisation step! We do emphasise the systematic re-use of these classes, to promote interoperability: packages developed by different developers/groups and operating on these same objects become inter-operable, and users can use any of them in a consistent way. As an example, there are many normalisation, imputation, aggregation, ... methods that are readily available. And if there was one you needed, new ones could be added for SummarizedExperiment objects for general usage in the community. This is an important request we have for new packages, and is paramount for Bioconductor packages. We generally feel that packages that prefer not to adapt these classes are probably a better fit for CRAN. I am of course available for further discussions to help you work with SummarizedExperiment (or even QFeatures objects, if you want to consider data at the precursor, peptide and protein level).

This makes sense to me.   I am interested in modifying to be compatible with the broader Bioconductor ecosystem. I am still very much learning and don’t have any experience developing these things (hence why I opted to store things as tibbles in my package)   Do you have any suggestions or how I could go about implementing such changes in this package?

  1. I see that you implemented your own statistical approaches (t-test and anova). Statistical analyses is one of the strengths of the Bioconductor project. There are several packages that provide linear modelling that are way more flexible than t-tests, including empirical Bayesian approaches (see limma) and mixed modelling (see msqrob2), that are suited or specifically developed for proteomics data. I think it would be misleading to direct users away from the well established Bioconductor packages.

This package is intended to be applied specifically to analyze MSP-MS assay data.   I agree that those packages have more robust/ flexible statistical approaches, but the t-tests and anova results have produced biologically relevant conclusions with MSP-MS  data, and are suggested by the authors of the MSP-MS assay. See here: https://pubmed.ncbi.nlm.nih.gov/36948708/#&gid=article-figures&pid=fig-2-uid-1   This is why I made the decision to point the user to these implementations of T-tests and anova.   Perhaps it would be better to use limma and msqrob2, but I think a significant problem with doing so is that most of the users of this assay/package are likely unfamiliar with the statistics that are performed by limma or msqrob2 (myself included if I am being honest, unfortunately I am not a statistician).   I suppose the point is if the package was integrated into the Bioconductor framework, then the user could easily opt to use limma or msqrob2. I agree that this would be highly beneficial.

  1. The cleaver Bioconductor package offers the functionality that you implemented in your various cleavage functions. cleaver works with widely used Biostrings classes to handle (potentially very large) DNA, RNA and AA sequences.

I would happily swap out my functions for established ones if they provided the functionality I needed, but from my understanding, the cleaver package does not. Perhaps I am missing something, but that was my interpretation after reading the cleaver package docs. https://www.bioconductor.org/packages/release/bioc/manuals/cleaver/man/cleaver.pdf   It seems to me that the cleaver package was designed to cleave peptide sequences according to a designated enzyme specificity.   In my package, I have a different use case. I need to determine where the detected peptide was cleaved from (and the corresponding motif around the cleavage site) with respect to the original peptide

baynec2 commented 2 months ago

The DESCRIPTION file

  • Why dependency on R 4.1.0? The R version dependency should be updated from 4.1.0 to 4.4.0.

This is the version I had installed when building the package. Obviously not ideal, I will update this.

  • You define specific version dependencies. There aren't needed. And if they are, they should match current development versions.

Why is it bad to specify >= versions of dependencies? I specified the version of the dependencies that I used when building the package. My thought process was that I cannot be sure that previous versions of the dependency will be compatible. Likewise, if the dependencies get updated in the future, they should still be compatible - but could conceivably not work as I don’t know how the authors will change their code. By leaving this the way it is, potential problems induced by prior packages are eliminated, and if there was a problem with a future version of a dependency, at least there is a record of the dependency version it did work for.

  • Consider adding your ORCID in 'Authors@R' with comment=c(ORCID="...")

Will do

'LazyData:' in the 'DESCRIPTION' should be set to false or removed

For my own edification could you explain why? I tried this a while ago after reading the Bioconductor docs and when I did so, I was not able to access the package data, so I changed it back.

The NEWS file

Syntax should be

# mspms 0.99

## mspms 0.99.0

* Initial Bioconductor submission.

I can add this

Documentation

  • The vignette should have Introduction and Installation sections.
  • References to the methods used as well as to other similar or related projects and packages is also expected. This includes infrastructure for data manipulation (imputation, normalisation, ...) and statistical analyses. But see my points above.

I can add this

  • The vignette uses BiocStyle package for formatting and have a table of contents.

I can add this

Unit tests

  • You package coverage is 39.80%, with many functions having no coverage at all. I would recommend trying to increase this.

I can also work on this. I find writing good, informative tests to be challenging. Do you have any recommendations for how to go about this, or resources describing it? For example, some of my functions create plots. What would a good test be for such a function?

R code

  • Is there a reason you use magrittr::%>% instead of |> - maybe using the operator in R will prove safer in the long run.

I have been using magrittr %>% since before the base pipe was implemented. I never felt the need to transition. I think %>% is a pretty safe bet, but I could change.

  • What is the pipe-utils.R file. Do you need to define the pipe.Rd man page? Added by usethis::use_pipe(). See https://usethis.rlib.org/reference/use_pipe.html
  • There are undefined global functions or variables: condition in plot_cleavages_per_pos and plot_volcano. These are likely false positives, but it might be worth checking.

Will look into this

  • Avoid using '=' for assignment and use '<-' instead. This has been spotted in the following files:

I can change this. For my own understanding- is there any benefit to using <- instead of =? I like to use = instead because I find it easier to type, equally as easy to read, and more similar to other languages. I tried to change this using styler, but I guess it got missed in some files.

  • R/calc_AA_fc.R (line 22, column 15)

  • R/calc_AA_fc.R (line 54, column 22)

  • R/calc_AA_fc.R (line 62, column 22)

  • R/count_cleavages_per_pos.R (line 24, column 9)

  • R/count_cleavages_per_pos.R (line 26, column 7)

  • R/log2fct_time.R (line 27, column 12)

  • R/log2fct_time.R (line 28, column 19)

  • R/plot_cleavages_per_pos.R (line 22, column 24)

  • R/plot_cleavages_per_pos.R (line 24, column 6)

  • R/plot_cleavages_per_pos.R (line 32, column 8)

  • R/plot_cleavages_per_pos.R (line 35, column 8)

  • R/plot_cleavages_per_pos.R (line 39, column 8)

  • R/plot_heatmap.R (line 29, column 12)

  • R/plot_heatmap.R (line 33, column 12)

  • R/plot_icelogo.R (line 24, column 5)

  • R/plot_icelogo.R (line 27, column 15)

  • R/plot_volcano.R (line 44, column 8)

  • R/plot_volcano.R (line 47, column 8)

    • The recommended function length is 50 lines or less. There are 10 functions greater than 50 lines. Having shorter functions simplifies the code and facilitates debugging, maintenance and unit testing. The longest 5 functions are:
  • prepare_peaks() (R/prepare_peaks.R): 95 lines

  • cterm_cleavage() (R/cterm_cleavage.R): 76 lines

  • nterm_cleavage() (R/nterm_cleavage.R): 65 lines

  • impute() (R/impute.R): 64 lines

  • prepare_pd() (R/prepare_pd.R): 58 lines

I could try to work on this. My code tends to be longer in length because I prefer to use piping. Generally, I prefer prioritizing readability over less lines. For example, cterm_cleavage() (and others) could easily be reduced to 48 lines if I deleted some spacing and comments (76 – 28) However, I agree some stuff may benefit from being broken down into smaller logical units.

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: ab65ff23a349f26691d69efc66886a1dacf9872e

baynec2 commented 1 month ago

@lgatto I never heard back from you, but I went ahead and addressed the points that you brought up in your review the best I could without additional information. Please see below for an account of the changes I incorporated in response to your feedback.   Could you please take a look when you get the chance and let me know what the next steps are?

baynec2 commented 1 month ago

Hi @baynec2

Thank you for your patience. I will share my review in two posts. This first part is really central, and will influence if/how to address the smaller code-related aspects. My concerns address what the nature of Bioconductor packages, as opposed to other bioinformatics/omics analysis packages.

Re-use of classes and functionality

  1. You make use of tibbles to store the quantitative data and the sample information. Bioconductor has a long tradition of using centrally defined and maintained S4 data classes, that are dedicated for omics data. In particular, we have the SummarizedExperiment of quantitative data from proteomics (and transcriptomics, metabolomics, ...). You even use it, internally, when you run the normalisation step! We do emphasise the systematic re-use of these classes, to promote interoperability: packages developed by different developers/groups and operating on these same objects become inter-operable, and users can use any of them in a consistent way. As an example, there are many normalisation, imputation, aggregation, ... methods that are readily available. And if there was one you needed, new ones could be added for SummarizedExperiment objects for general usage in the community. This is an important request we have for new packages, and is paramount for Bioconductor packages. We generally feel that packages that prefer not to adapt these classes are probably a better fit for CRAN. I am of course available for further discussions to help you work with SummarizedExperiment (or even QFeatures objects, if you want to consider data at the precursor, peptide and protein level).

I changed the structure of the package to load data exported from upstream proteomics software as a QFeatures object. MSP-MS specific processing of peptide information is added as rowData. Sample metadata is added as colData. Subsequent data processing is then performed with MsCoreUtils standardized methods (log2 transformation, imputation, and median centered normalization).

  1. I see that you implemented your own statistical approaches (t-test and anova). Statistical analyses is one of the strengths of the Bioconductor project. There are several packages that provide linear modelling that are way more flexible than t-tests, including empirical Bayesian approaches (see limma) and mixed modelling (see msqrob2), that are suited or specifically developed for proteomics data. I think it would be misleading to direct users away from the well established Bioconductor packages.

I eliminated the ANOVA function but left the t-test implementation because it is requested by the current users of the assay. I believe that it benefits from being highly interpretable to a wide audience relative to other methods.    Now that data is stored in a QFeatures object, subsequent analysis can easily be performed by limma or msqrob2 if desired by the user.

  1. The cleaver Bioconductor package offers the functionality that you implemented in your various cleavage functions. cleaver works with widely used Biostrings classes to handle (potentially very large) DNA, RNA and AA sequences.

After looking at the cleaver docs, I thought that the functionality provided by my functions was unique, so I left it as is.

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

baynec2 commented 1 month ago

The DESCRIPTION file

  • Why dependency on R 4.1.0? The R version dependency should be updated from 4.1.0 to 4.4.0.

I updated to 4.4.0

  • You define specific version dependencies. There aren't needed. And if they are, they should match current development versions.

I removed the specific version dependencies as suggested.

This is still confusing to me and seems to contradict with some other resources I have read (https://r-pkgs.org/description.html#sec-description-imports-suggests-minium-version)

  • Consider adding your ORCID in 'Authors@R' with comment=c(ORCID="...")

Added my ORCID

  • 'LazyData:' in the 'DESCRIPTION' should be set to false or removed

Would you be able to explain why? I have not been able to find an explanation for this.

I would prefer to leave it as is if possible (https://r-pkgs.org/data.html), as the package relies on data stored in this manner (a default MSP-MS peptide library is used unless otherwise specified)

The NEWS file

Syntax should be

# mspms 0.99

## mspms 0.99.0

* Initial Bioconductor submission.

Changed this.

Documentation

  • The vignette should have Introduction and Installation sections.
  • References to the methods used as well as to other similar or related projects and packages is also expected. This includes infrastructure for data manipulation (imputation, normalisation, ...) and statistical analyses. But see my points above.

Added this information.

  • The vignette uses BiocStyle package for formatting and have a table of contents.

Added this.

Unit tests

  • You package coverage is 39.80%, with many functions having no coverage at all. I would recommend trying to increase this.

I increased the testing coverage significantly.

R code

  • Is there a reason you use magrittr::%>% instead of |> - maybe using the operator in R will prove safer in the long run.

I decided to leave this as is since I had written some of the code using dot notation that is only compatible with the magrittr::%>% pipe.

  • What is the pipe-utils.R file. Do you need to define the pipe.Rd man page?

Added by usethis::use_pipe(). See https://usethis.rlib.org/reference/use_pipe.html

  • There are undefined global functions or variables: condition in plot_cleavages_per_pos and plot_volcano. These are likely false positives, but it might be worth checking.

Fixed this.

  • Avoid using '=' for assignment and use '<-' instead. This has been spotted in the following files:

    • R/calc_AA_fc.R (line 22, column 15)
    • R/calc_AA_fc.R (line 54, column 22)
    • R/calc_AA_fc.R (line 62, column 22)
    • R/count_cleavages_per_pos.R (line 24, column 9)
    • R/count_cleavages_per_pos.R (line 26, column 7)
    • R/log2fct_time.R (line 27, column 12)
    • R/log2fct_time.R (line 28, column 19)
    • R/plot_cleavages_per_pos.R (line 22, column 24)
    • R/plot_cleavages_per_pos.R (line 24, column 6)
    • R/plot_cleavages_per_pos.R (line 32, column 8)
    • R/plot_cleavages_per_pos.R (line 35, column 8)
    • R/plot_cleavages_per_pos.R (line 39, column 8)
    • R/plot_heatmap.R (line 29, column 12)
    • R/plot_heatmap.R (line 33, column 12)
    • R/plot_icelogo.R (line 24, column 5)
    • R/plot_icelogo.R (line 27, column 15)
    • R/plot_volcano.R (line 44, column 8)
    • R/plot_volcano.R (line 47, column 8)

Reran through styler. This appears to have fixed the styling to be consistent with <- notation.

  • The recommended function length is 50 lines or less. There are 10 functions greater than 50 lines. Having shorter functions simplifies the code and facilitates debugging, maintenance and unit testing. The longest 5 functions are:

    • prepare_peaks() (R/prepare_peaks.R): 95 lines
    • cterm_cleavage() (R/cterm_cleavage.R): 76 lines
    • nterm_cleavage() (R/nterm_cleavage.R): 65 lines
    • impute() (R/impute.R): 64 lines
    • prepare_pd() (R/prepare_pd.R): 58 lines

All functions are now less than 50 lines.

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: f238e0a4327df0d101742ae0b2ad335e674cd75a

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): mspms_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

lgatto commented 1 month ago

Hi @baynec2 - but regarding the following

This is my first attempt at making a Bioconductor R package, so I am sure I made some less-than-ideal design choices in the process. I am however very interested in learning and improving for the future.

No problem at all, every developer starts with a first package.

Is it worthwhile endeavor to try to convert this to be a Bioconductor package? I am not entirely sure how hard this would be, but based on your comments I suspect this will require a significant restructuring of the code base.

This is indeed a good question. I wouldn't say it's hard, but of course, it can look that way in the beginning, and there will be some restructuring involved. The main question is whether you have an interest in re-using existing Bioconductor infrastructure to allow for interoperability between different packages, or if having a package that works for you, the way it is, is good enough. I considered early on, many years ago, that the former was the best option for me, but the second option is also valid, of course.

I'll look at your updates next week. Sorry for the delay.

baynec2 commented 1 month ago

hey @lgatto - wanted to check in to see if you had any time to look at the updates yet.

baynec2 commented 1 month ago

Hi @lgatto. Just wanted to send a quick follow-up message to see if there is any update here.

lshep commented 1 month ago

@lgatto can look into the code revisions but it is also important to note that the package as receiving a timeout when run on the Bioconductor builders. @baynec2 was there an attempt to reduce the time the package takes to run. For reference this was the last build report

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 03e4b6555e0f6fa2ac871a9871d6cf7a0302090e

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "TIMEOUT". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): mspms_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 18d88b19c19a272a4688be2e16e26b0d27095e30

baynec2 commented 1 month ago

@lgatto can look into the code revisions but it is also important to note that the package as receiving a timeout when run on the Bioconductor builders. @baynec2 was there an attempt to reduce the time the package takes to run. For reference this was the last build report

@lshep - thanks for the note. I just pushed changes last night reducing the amount of data stored in the package- I think this should fix the timeout issue. How can I find the build report? I'm not seeing an update from the bioc-issue-bot

lshep commented 1 month ago

looks like the builds were interrupted. I'll kick off a manual build now and it should appear shortly.

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): mspms_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

baynec2 commented 1 month ago

@lshep - do you know if the below warning is a problem?

Files over the limit:
 /tmp/RtmpA3bKGa/file3391aee75ce0/mspms//inst/doc/mspms_vignette.html
 /tmp/RtmpA3bKGa/file3391aee75ce0/mspms/inst/doc/mspms_vignette.html
 /tmp/RtmpA3bKGa/file3391aee75ce0/mspms/inst/doc/mspms_vignette.html

I am puzzled by the warning because inst/doc/ doesn't seem to exist.

bioc-issue-bot commented 3 weeks ago

Received a valid push on git.bioconductor.org; starting a build for commit id: a9941e5859a0a4ab942ae0708ee9e8bf5bd51771

bioc-issue-bot commented 3 weeks ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): mspms_0.99.5.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/mspms to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

baynec2 commented 3 weeks ago

hey @lshep - I took a look at the build report above- despite the message it doesn't look like there are any warnings. Is this accurate?

baynec2 commented 3 weeks ago

@lgatto - I haven't heard from you in a month, wanted to check and see if there is anything I can do to move this along.

lshep commented 3 weeks ago

hey @lshep - I took a look at the build report above- despite the message it doesn't look like there are any warnings. Is this accurate?

There seems to be a bug in the warning reporting. we will investigate on our end but the warning is safe to ignore right now.

baynec2 commented 1 week ago

Hey @lshep -Is there anything I can help with here to move things forward? I haven't heard from @lgatto in 5 weeks, so just checking in to see if there’s anything I can assist with on my end.

lgatto commented 1 week ago

Thank you for the updates!

I'm accepting your package to avoid more delays, but please implement the small changes in the first and third comments. I'll leave it to you whether to highlight/link to package landing pages.

Thank you for your contribution and apologies for the delay.

bioc-issue-bot commented 1 week ago

Your package has been accepted. It will be added to the Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.

lshep commented 1 week ago

The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/baynec2.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("mspms"). The package 'landing page' will be created at

https://bioconductor.org/packages/mspms

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

baynec2 commented 1 week ago

@lgatto Thank you for all of the feedback! I went ahead and made those final changes. Responses below.

Thank you for the updates!

  • I think you can remove imputeLCMD from the Imports in the DESCRIPTION file, as it isn't imported in the namespace. Given that it is loaded indirectly in the vignette, you could add it to Suggests, to document its use.

removed

  • Tiny suggestion: you could highlight the name of your package in the vignette as code mspms or in italic mspms so that it stands out. I find it easier to read. You could even use BiocStyle::Biocpkg("mspms") in an in-line code chunk to automatically generate a link to your package's Bioconductor landing page. You could use the same for other packages that you name in your vignette, to provide users with direct links to further details.

Added these links for the Bioconductor packages!

  • In your vignette, you refer to helper functions in _helper_functions.R files. You could make it a link to the R directory in your GitHub repository to tell users where to find these functions. But why don't export them directly? If you export them, users also get manual pages and the full R experience, rather than having to use non-exported functions (which isn't recommended) or copy/paste from your GitHub repository.

Added the suggested links! I decided to not expose these functions directly to the user in an effort to keep the package API as clean and intuitive as possible.

I'm accepting your package to avoid more delays, but please implement the small changes in the first and third comments. I'll leave it to you whether to highlight/link to package landing pages.

Thank you for your contribution and apologies for the delay.

Thanks again!

baynec2 commented 3 days ago

@lshep - when do the landing pages get built? Is it when the packages get bumped to 1.0.0?

lgatto commented 3 days ago

Lori will be able to provide more details but you won't need to wait until version 1.0, as development versions of packages also have landing pages. Packages are build a couple of times a week (it used to be every 24 hours, not sure if it has changes), and landing pages are updated accordingly if there was a version bump and there are no issues with build/check/install.

lshep commented 3 days ago

The package will be bumped to 1.0.0 at the next Bioconductor release. Until then it stays 0.99.x in the devel branch of Bioconductor. Both release and devel versions have package landing pages. As Laurent pointed out, they get updated daily after the daily builds finish. The daily build schedule can be found at https://bioconductor.org/checkResults/ . That being said, I do see the landing page was not generated; there appears to be an issue on our system. I'm having our build system maintainer investigate this morning and hope to have it remedied shortly. when available you can use a package short url that will direct to devel if release is not available https://bioconductor.org/packages/mspms

baynec2 commented 2 days ago

great thank you!