Bioconductor / Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
https://bioconductor.org/packages/Rsamtools
Other
27 stars 27 forks source link

Add rworkflows #48

Closed bschilder closed 11 months ago

bschilder commented 1 year ago

I just added rworkflows to Rsamtools to have some immediate checks on your code (across 3 platforms) via GitHub Actions

There's still a couple of remaining Warnings and Notes from the previous version that i wasn't sure how to resolve. But since these were occurring in Rsamtools before I made my changes, I think it's safe to merge the PR and then sort out the source of these issues.

@vobencha @hpages @mtmorgan


❯ checking compiled code ... WARNING
  File ‘Rsamtools/libs/Rsamtools.so’:
    Found ‘___stderrp’, possibly from ‘stderr’ (C)
      Objects: ‘bam_sort.o’, ‘bamfile.o’, ‘bedidx.o’, ‘sam_opts.o’,
        ‘sam_utils.o’, ‘samtools_patch.o’
    Found ‘___stdoutp’, possibly from ‘stdout’ (C)
      Objects: ‘bam_sort.o’, ‘sam_utils.o’
    Found ‘_abort’, possibly from ‘abort’ (C)
      Object: ‘bam_sort.o’
  File ‘Rsamtools/libs/Rsamtools.so’:
    Found no call to: ‘R_useDynamicSymbols’

  Compiled code should not call entry points which might terminate R nor
  write to stdout/stderr instead of to the console, nor use Fortran I/O
  nor system RNGs.
  It is good practice to register native routines and to disable symbol
  search.

  See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

❯ checking installed package size ... NOTE
    installed size is  8.2Mb
    sub-directories of 1Mb or more:
      R         3.0Mb
      extdata   2.7Mb
      libs      1.3Mb

❯ checking dependencies in R code ... NOTE
  Unexported objects imported by ':::' calls:
    ‘S4Vectors:::explodeIntBits’ ‘S4Vectors:::implodeIntBits’
    ‘S4Vectors:::makePowersOfTwo’ ‘S4Vectors:::quick_unlist’
    ‘S4Vectors:::selectSome’
    See the note in ?`:::` about the use of this operator.

❯ checking foreign function calls ... NOTE
  Registration problems:
    symbol ‘func’ not in namespace:
     .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, 
         mapqFilter, ...)
    symbol ‘func’ not in namespace:
     .Call(func, .extptr(file), regions, tmpl)
    symbol ‘func’ not in namespace:
     .Call(func, file, dest)
  See chapter ‘System and foreign language interfaces’ in the ‘Writing R
  Extensions’ manual.

❯ checking for GNU extensions in Makefiles ... NOTE
  GNU make is a SystemRequirements.

Thanks, and let me know if you have any questions!

Best, Brian

mtmorgan commented 1 year ago

Hi @bschilder thanks for your work on this.

I don't think Rproj files belong in git -- these are user preferences rather than a part of the package; after all I don't include .emacs ... Also for instance if I read this correctly something like NumSpacesForTab: 2 is not consistent with the 4-space indentation in the package (and more generally bioconductor core?) programming style, the package doesn't use devtools paradigms, etc. So I'm not sure that this is adding value.

I'm concerned about the support costs of rworkflows, which I am not familiar with and which would fall on maintainer shoulders if your interests were to migrate elsewhere (the first commit to Rsamtools in 2009...).

I'm also not confident that the workflow is 'doing the right thing' and will continue to do the right thing, with respect to versioning. For instance the Linux distribution appears to use R-devel and bioconductor/bioconductor_docker:devel, which is correct for the current and next Bioconductor release cycle, but will be incorrect for the cycle after that (bioconductor has a twice yearly release cycle whereas R has an annual cycle; bioconductor devel anticipates the user environment when it becomes release, so bioc-devel only builds on R-devel 1/2 the time). Likewise macOS and Windows seem to be using R 'latest' but bioc-devel, which is not how Bioconductor distributes packages (or perhaps 'latest' is a synonym for R-devel, but then this just highlights the implicit support costs...).

I see also 'as_cran' but Bioconductor does not check packages that way, using instead the R environment defined at http://bioconductor.org/checkResults/release/bioc-LATEST/Renviron.bioc.

The issue

❯ checking foreign function calls ... NOTE
  Registration problems:
    symbol ‘func’ not in namespace:
     .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, 
         mapqFilter, ...)

seems to be incorrect -- the pattern is foo <- function(func, ...) ... .Call(func, ...). The other issues were already known and / or rationalized from the Bioconductor build report.

bschilder commented 1 year ago

Thanks for all the thoughtful comments @mtmorgan !

Hi @bschilder thanks for your work on this.

I don't think Rproj files belong in git -- these are user preferences rather than a part of the package; after all I don't include .emacs ... Also for instance if I read this correctly something like NumSpacesForTab: 2 is not consistent with the 4-space indentation in the package (and more generally bioconductor core?) programming style, the package doesn't use devtools paradigms, etc. So I'm not sure that this is adding value.

I agree, this was added by mistake since I assumed you already had those file types added to your .gitignore. I can fix this.

I'm concerned about the support costs of rworkflows, which I am not familiar with and which would fall on maintainer shoulders if your interests were to migrate elsewhere (the first commit to Rsamtools in 2009...).

True, but that's kind of the point of rworkflows. Rather than having a static workflow that each user must maintain, it calls a centralized action that our Neurogenomics Lab maintains. To me, this is even less risky than having a Suggest dependency (with a lot of benefits), as it does not affect whether your package gets distributed to Bioc, it is simply an intermediate check for developers to get immediate feedback on whether they're code is working, rather than waiting several days to see whether things have worked on Bioc (and at that point, any bugs are now being distributed to Bioc users).

I do understand the fear, but the same is true for any package or software that is maintained on a volunteer basis, including Rsamtools itself!

If it's any consolation, I currently maintain >25 R packages that all depend on rworkflows. So I have pretty good incentive to keep maintaining it ;) Plus, the goal is to grow the popularity of rworkflows such that it benefits from a lot of community involvement.

I'm also not confident that the workflow is 'doing the right thing' and will continue to do the right thing, with respect to versioning. For instance the Linux distribution appears to use R-devel and bioconductor/bioconductor_docker:devel, which is correct for the current and next Bioconductor release cycle, but will be incorrect for the cycle after that (bioconductor has a twice yearly release cycle whereas R has an annual cycle; bioconductor devel anticipates the user environment when it becomes release, so bioc-devel only builds on R-devel 1/2 the time). Likewise macOS and Windows seem to be using R 'latest' but bioc-devel, which is not how Bioconductor distributes packages.

I'm open to suggestions regarding the 'right thing' to do!

As a developer of a handful of Bioc packages myself, I'm very much interested in making rworkflows fully compatible with the Bioc schedule.

My logic with having Linux use the devel version was that it would allow developers to be ahead of the curve, whereas Mac/Windows reflects the latest release.

These are of course all things you have full control over within your yaml file. I've done this step for you, but the R function rworkflows::use_workflow() has a bunch of arguments you can adjust to generate the workflow script with whatever your preferences are. That said, I'd like to programmatically extract the 'correct' versions for R and Bioc such that they match the Bioc release schedule, so that users never have to worry about that. I've added it as an Issue here: https://github.com/neurogenomics/rworkflows/issues/33

I see also 'as_cran' but Bioconductor does not check packages that way, using instead the R environment defined at http://bioconductor.org/checkResults/release/bioc-LATEST/Renviron.bioc.

You're welcome to change any of the rworkflows parameters as you see fit. To change thing, simply set as_cran: {{ false }} in the yaml file. I'd also be happy to change this for you.

The issue

❯ checking foreign function calls ... NOTE
  Registration problems:
    symbol ‘func’ not in namespace:
     .Call(func, .extptr(file), regions, flag, simpleCigar, tagFilter, 
         mapqFilter, ...)

seems to be incorrect -- the pattern is foo <- function(func, ...) ... .Call(func, ...). The other issues were already known and / or rationalized from the Bioconductor build report.

I see, perhaps there's a setting in rcmdcheck I can adjust. Still, it's just a Note so it shouldn't affect the ability to rworkflows to pass.

My larger concern were the Warning messages. I'm not familiar with C code myself, but perhaps you'll have some insights into this.

❯ checking compiled code ... WARNING
  File ‘Rsamtools/libs/Rsamtools.so’:
    Found ‘___stderrp’, possibly from ‘stderr’ (C)
      Objects: ‘bam_sort.o’, ‘bamfile.o’, ‘bedidx.o’, ‘sam_opts.o’,
        ‘sam_utils.o’, ‘samtools_patch.o’
    Found ‘___stdoutp’, possibly from ‘stdout’ (C)
      Objects: ‘bam_sort.o’, ‘sam_utils.o’
    Found ‘_abort’, possibly from ‘abort’ (C)
      Object: ‘bam_sort.o’
  File ‘Rsamtools/libs/Rsamtools.so’:
    Found no call to: ‘R_useDynamicSymbols’

Thanks again, Brian

bschilder commented 1 year ago

Hi @mtmorgan, I've gone through and implemented many of your helpful suggestions.

  1. Added ".Rproj" to the .gitignore* (note the wildcard).
  2. Synchronise Bioc / R versions: https://github.com/neurogenomics/rworkflows/issues/33
  3. Set as_cran to false in the rworkflows workflow file within GenomicRanges.

I hope this helps alleviate at least some of the main concerns! Let me know if there's anything else I can do to help.

bschilder commented 1 year ago

rcmdcheck

Regarding these warnings, trying to find which parameters i need to change so that these are ignored (if you can confirm they are spurious warnings):

❯ checking compiled code ... WARNING
  File ‘Rsamtools/libs/Rsamtools.so’:
    Found ‘___stderrp’, possibly from ‘stderr’ (C)
      Objects: ‘bam_sort.o’, ‘bamfile.o’, ‘bedidx.o’, ‘sam_opts.o’,
        ‘sam_utils.o’, ‘samtools_patch.o’
    Found ‘___stdoutp’, possibly from ‘stdout’ (C)
      Objects: ‘bam_sort.o’, ‘sam_utils.o’
    Found ‘_abort’, possibly from ‘abort’ (C)
      Object: ‘bam_sort.o’
  File ‘Rsamtools/libs/Rsamtools.so’:
    Found no call to: ‘R_useDynamicSymbols’

   Compiled code should not call entry points which might terminate R nor
   write to stdout/stderr instead of to the console, nor use Fortran I/O
   nor system RNGs.
   It is good practice to register native routines and to disable symbol
   search.

   See ‘Writing portable packages’ in the ‘Writing R Extensions’ manual.

Potentially related Issues: https://github.com/airoldilab/sgd/issues/44 https://github.com/ropensci/git2r/issues/83 https://github.com/microsoft/LightGBM/issues/1909

I tried these suggested fixes but tools::package_native_routine_registration_skeleton(".") simply yielded a message "no native symbols were extracted". And it looks like the NAMESPACE file already has useDynLib(Rsamtools, .registration=TRUE) as suggested in the post. https://stackoverflow.com/questions/42313373/r-cmd-check-note-found-no-calls-to-r-registerroutines-r-usedynamicsymbols

One way to get around this is simply skipping the rcmdcheck step, and instead rely only on BiocCheck to catch problems.

BiocCheck

BiocCheck issues several errors and warnings .

> BiocCheck::BiocCheck()

─ BiocCheckVersion: 1.34.2
─ BiocVersion: 3.16
─ Package: Rsamtools
─ PackageVersion: 2.16.1
─ sourceDir: /Users/schilder/Desktop/forks/Rsamtools
─ installDir: /var/folders/zq/h7mtybc533b1qzkys_ttgpth0000gn/T//RtmplVBUQa/file1794b3d48fbd8
─ BiocCheckDir: /Users/schilder/Desktop/forks/Rsamtools.BiocCheck
─ platform: unix
─ isTarBall: FALSE

* Installing package...
* Checking for deprecated package usage...
* Checking for remote package usage...
* Checking for 'LazyData: true' usage...
* Checking version number...
* Checking version number validity...
* Checking R version dependency...
    * NOTE: Update R version dependency from 3.5.0 to 4.2.0.
* Checking package size...
      Skipped... only checked on source tarball
* Checking individual file sizes...
    * WARNING: The following files are over 5MB in size:
      '.git/objects/pack/pack-6e2cbf0bd241b8744672adb07f33834ac8daa212.pack'
* Checking biocViews...
* Checking that biocViews are present...
* Checking package type based on biocViews...
    Software
* Checking for non-trivial biocViews...
* Checking that biocViews come from the same category...
* Checking biocViews validity...
* Checking for recommended biocViews...
    * NOTE: Consider adding these automatically suggested biocViews: ATACSeq,
      SequenceMatching, VariantAnnotation, MultipleSequenceAlignment
Search 'biocViews' at https://contributions.bioconductor.org
* Checking build system compatibility...
* Checking for blank lines in DESCRIPTION...
* Checking if DESCRIPTION is well formatted...
* Checking for proper Description: field...
    * NOTE: The Description field in the DESCRIPTION is made up by less than 3
      sentences. Please consider expanding this field, and structure it as a
      full paragraph
* Checking for whitespace in DESCRIPTION field names...
* Checking that Package field matches directory/tarball name...
* Checking for Version field...
* Checking for valid maintainer...
    * ERROR: Do not use Author/Maintainer fields. Use Authors@R.
    * ERROR: Remove Maintainer field. Use Authors@R [cre] designation.
* Checking License: for restrictive use...
* Checking for pinned package versions...
* Checking DESCRIPTION/NAMESPACE consistency...
* Checking .Rbuildignore...
* Checking for stray BiocCheck output folders...
* Checking vignette directory...
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
Warning in remind_sweave(if (in.file) input, sweave_lines) :
  It seems you are using the Sweave-specific syntax in line(s) 9, 105, 355, 447, 504; you may need Sweave2knitr("/Users/schilder/Desktop/forks/Rsamtools/vignettes//Rsamtools-Overview.Rnw") to convert it to knitr
* Checking package installation calls in R code...
* Checking for library/require of Rsamtools...
* Checking coding practice...
    * NOTE: Avoid sapply(); use vapply()
    * NOTE: Avoid the use of 'paste' in condition signals
* Checking parsed R code in R directory, examples, vignettes...
    * NOTE: Avoid '<<-' if possible (found 2 times)
* Checking function lengths...
    * NOTE: The recommended function length is 50 lines or less. There are 411
      functions greater than 50 lines.
* Checking man page documentation...
    * WARNING: Add non-empty \value sections to the following man pages:
    * ERROR: At least 80% of man pages documenting exported objects must have
      runnable examples.
    * NOTE: Usage of dontrun{} / donttest{} tags found in man page examples. 11%
      of man pages use at least one of these tags.
    * NOTE: Use donttest{} instead of dontrun{}.
* Checking package NEWS...
* Checking unit tests...
* Checking skip_on_bioc() in tests...
* Checking formatting of DESCRIPTION, NAMESPACE, man pages, R source, and
  vignette source...
    * NOTE: Consider shorter lines; 21 lines (0%) are > 80 characters long.
    * NOTE: Consider 4 spaces instead of tabs; 70 lines (1%) contain tabs.
    * NOTE: Consider multiples of 4 spaces for line indents; 1633 lines (18%)
      are not.
    See https://contributions.bioconductor.org/r-code.html
    See styler package: https://cran.r-project.org/package=styler as described
      in the BiocCheck vignette.
* Checking if package already exists in CRAN...
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for
details

replacement repositories:
    CRAN: https://cran.rstudio.com/

* Checking for bioc-devel mailing list subscription...
    * NOTE: Cannot determine whether maintainer is subscribed to the Bioc-Devel
      mailing list (requires admin credentials). Subscribe here:
      https://stat.ethz.ch/mailman/listinfo/bioc-devel
* Checking for support site registration...
    Maintainer email is ok.

─ BiocCheck results ──
1 ERRORS | 2 WARNINGS | 14 NOTES

See the Rsamtools.BiocCheck folder and run
    browseVignettes(package = 'BiocCheck')
for details.

Author and Maintainer fields

I've gone ahead and reformatted these so that they're compliant with the latest release of BiocCheck, by collapsing them into one "Authors@R" field.

"At least 80% of man pages documenting exported objects must have runnable examples."

I'll leave this one to you @mtmorgan so you can decide on what the examples should be, and for which functions they're included in.

"Add non-empty \value sections to the following man pages:"

man/ApplyPileupsParam-class.Rd, man/BamFile-class.Rd, man/BamViews-class.Rd, man/BcfFile-class.Rd, man/FaFile-class.Rd, man/PileupFiles-class.Rd, man/RsamtoolsFile-class.Rd, man/RsamtoolsFileList-class.Rd, man/ScanBamParam-class.Rd, man/ScanBcfParam-class.Rd, man/TabixFile-class.Rd

I'll also leave this to you as you know these functions best.

Hope this helps!, Brian

bschilder commented 1 year ago

Hi @mtmorgan, just wanted to check in and see whether you might want to merge this PR.

To recap:

Hope that helps clear things up, happy to answer any questions!

mtmorgan commented 1 year ago

I'll leave this for @hpages or @vjcitn to resolve.

vjcitn commented 1 year ago

We will look at it.

hpages commented 1 year ago

Hi @bschilder,

A few minor things that caught my attention:

Thanks!

hpages commented 1 year ago

Also how is the addition of a .gitignore file related to the "rworkflows" functionality? This is distracting. If you want to make the case for adding this file please open an issue and/or submit a separate PR. We already discussed this here :wink:

In the same spirit, the new .Rbuildignore file should only have the ^.github$ line. AFAIK the other lines are not related to the "rworkflows" functionality either.

Again, PRs should stick to the matter and not try to squeeze random/unrelated changes. Thanks!

hpages commented 1 year ago

Finally I'm not sure I understand the following lines:

run_bioccheck: ${{ true }}
run_rcmdcheck: ${{ false }}

Does it mean that the workflow will run BiocCheck but not R CMD check?

Same with the following lines:

has_testthat: ${{ true }}
has_runit: ${{ false }}

Shouldn't this be the otherway around?

bschilder commented 1 year ago
  • It looks like the PR re-introduces the Sweave vignette (Rsamtools-Overview.Rnw) which got removed from the package in April 2023.

Likely due to the lag between when I first made the PR (Dec 23, 2022) and now. I merged the latest version of Rsamtools into my fork but I guess some files were kept by accident. Easy fix, just removed it.

  • The new entry in NEWS says CHANGES IN VERSION 2.16.1 but I don't see a version bump,

and the version of the package is 2.17.0 in devel.

Again due to the lag, and apparently the fact that the NEWS file isn't being regularly updated? In any case, just updated the version in both files.

  • Please don't add https://github.com/Bioconductor/Rsamtools to the URL field in the DESCRIPTION file. The BugReports field already points people to the issue tracker on GitHub. Bioconductor prefers to make the package landing page (https://bioconductor.org/packages/Rsamtools) the primary URL, and not advertize the URL of the Github repo so much because this encourages people to install the package from there (which is bad) or to use the issue tracker to ask questions about how to use the software (which is also bad).

Removed.

Also how is the addition of a .gitignore file related to the "rworkflows" functionality? This is distracting. If you want to make the case for adding this file please open an issue and/or submit a separate PR. We already discussed this here 😉

Simply wanted to make sure I wasn't pushing any extra files that I didn't intend to (e.g macs are constantly generating .DS_Store files). But happy to remove this and remove any other residual files manually.

In the same spirit, the new .Rbuildignore file should only have the ^.github$ line. AFAIK the other lines are not related to the "rworkflows" functionality either.

Not quite. These are directly related to rworkflows, as the files I listed there can be generated in the course of GHA checks on certain platforms (namely Windows). I figured I'd add them to save you from scratching your head when things fail later.

bschilder commented 1 year ago

Finally I'm not sure I understand the following lines:

run_bioccheck: ${{ true }}
run_rcmdcheck: ${{ false }}

Does it mean that the workflow will run BiocCheck but not R CMD check?

Yup, but you can change it to whatever configuration your prefer! Just let me know which you would like and I'd be happy to do it for you.

Same with the following lines:

has_testthat: ${{ true }}
has_runit: ${{ false }}

Shouldn't this be the otherway around?

In your case yes, thanks for the catch! These are simply the defaults. Just fixed this.

That said, outside of this PR I'd highly encourage you to consider adding some more targeted unit tests, rather than relying solely on the generic BiocGenerics:::testPackage function which doesn't cover most scenarios from my understanding. But perhaps something to tackle another day.

If you're unsure about any of the parameters, you can always read about each of them in more detail here, or use: ?rworkflows::use_workflow

hpages commented 1 year ago

I'm not convinced that these lines in .Rbuildignore are related to the "rworkflows" functionality:

^.*\.Rproj$
^\.Rproj\.user$

^migration_notes.md$

so maybe at least those can be removed?

Not sure about the other ones:

node_modules$
package-lock\.json$
package\.json$

but I don't see them here so I'm confused!

I would suggest that you set run_rcmdcheck: to ${{ true }} in the PR. Isn't running R CMD check the main purpose of "rworkflows"? Even better, the factory settings for "rworkflows" should focus on trying to reproduce the build/check results produced by the official daily builds, to avoid confusion for developers and users of the package. For this purpose I would also suggest that you set run_bioccheck: to ${{ false }} in the PR.

The potential for confusion is real and should not be underestimated. Just earlier this morning @vjcitn merged your other "rworkflows" PR for VariantAnnotation (https://github.com/Bioconductor/VariantAnnotation/pull/73), but then a couple hours later Vince had to revert the merge because check was failing, probably because the yaml file also had run_bioccheck: set to ${{ true }}.

As mentioned already here, BiocCheck has a tendency to produce some false positives which is why we don't run it as part of our daily builds yet.

About the NEWS file. Some Bioconductor packages adopt the "one big section per Bioconductor release" approach. That means that all the changes that happened in the package between BioC 3.17 and BioC 3.18 are documented in a big section that starts with:

CHANGES IN VERSION X.Y
----------------------

where X.Y is the major version that the package will have at the time of the release (2.18 in the case of Rsamtools). That big section is itself organized in subsections, typically NEW FEATURES, SIGNIFICANT USER-VISIBLE CHANGE, DEPRECATED AND DEFUNCT, and BUG FIXES.

That said, outside of this PR I'd highly encourage you to consider adding some more targeted unit tests, rather than relying solely on the generic BiocGenerics:::testPackage function which doesn't cover most scenarios from my understanding.

Not sure what you mean exactly. Calling BiocGenerics:::testPackage("MyPackage") is how we run the unit tests in the RUnit world. So this would be like saying that test_check("MyPackage") doesn't cover most scenarios in the testthat world. :thinking:

Why add an empty line far down in the NEWS file? (just above the CHANGES IN VERSION 2.10 section). Again let's please avoid polluting the PR with random/unrelated changes. It's easy to spot this kind of pollution by checking the diff here: https://github.com/Bioconductor/Rsamtools/pull/48/files

Thanks

bschilder commented 1 year ago

I'm not convinced that these lines in .Rbuildignore are related to the "rworkflows" functionality:

^.*\.Rproj$
^\.Rproj\.user$

^migration_notes.md$

so maybe at least those can be removed?

Sure, removed.

Not sure about the other ones:

node_modules$
package-lock\.json$
package\.json$

but I don't see them here so I'm confused!

The issue is intermittent, haven't figured out the logic to it yet (partly due to how GHA sets up their VMs, I suspect). Can remove if you like, I'll leave it to you to fix if and and when the errors return.

I would suggest that you set run_rcmdcheck: to ${{ true }} in the PR. Isn't running R CMD check the main purpose of "rworkflows"? Even better, the factory settings for "rworkflows" should focus on trying to reproduce the build/check results produced by the official daily builds, to avoid confusion for developers and users of the package. For this purpose I would also suggest that you set run_bioccheck: to ${{ false }} in the PR.

Done.

The potential for confusion is real and should not be underestimated. Just earlier this morning @vjcitn merged your other "rworkflows" PR for VariantAnnotation (Bioconductor/VariantAnnotation#73), but then a couple hours later Vince had to revert the merge because check was failing, probably because the yaml file also had run_bioccheck: set to ${{ true }}. As mentioned already here, BiocCheck has a tendency to produce some false positives which is why we don't run it as part of our daily builds yet.

I don't have any issues with BiocCheck atm. But if you do, I recommend take it up with the BiocCheck team: https://github.com/Bioconductor/BiocCheck

About the NEWS file. Some Bioconductor packages adopt the "one big section per Bioconductor release" approach. That means that all the changes that happened in the package between BioC 3.17 and BioC 3.18 are documented in a big section that starts with:

CHANGES IN VERSION X.Y
----------------------

where X.Y is the major version that the package will have at the time of the release (2.18 in the case of Rsamtools). That big section is itself organized in subsections, typically NEW FEATURES, SIGNIFICANT USER-VISIBLE CHANGE, DEPRECATED AND DEFUNCT, and BUG FIXES.

Up to you how you want to document things, just pointing out the reason why it occurred.

That said, outside of this PR I'd highly encourage you to consider adding some more targeted unit tests, rather than relying solely on the generic BiocGenerics:::testPackage function which doesn't cover most scenarios from my understanding.

Not sure what you mean exactly. Calling BiocGenerics:::testPackage("MyPackage") is how we run the unit tests in the RUnit world. So this would be like saying that test_check("MyPackage") doesn't cover most scenarios in the testthat world. 🤔

We can discuss this in more detail elsewhere.

Why add an empty line far down in the NEWS file? (just above the CHANGES IN VERSION 2.10 section). Again let's please avoid polluting the PR with random/unrelated changes. It's easy to spot this kind of pollution by checking the diff here: https://github.com/Bioconductor/Rsamtools/pull/48/files

Done.

bschilder commented 1 year ago

@hpages if you're interested in learning about the purpose of rworkflows, I'd recommend you take a look at the documentation: https://github.com/neurogenomics/rworkflows/issues

As well as the associated preprint: https://www.researchsquare.com/article/rs-2399015/v1

hpages commented 1 year ago

Thanks for all the changes and the links you provided above.

Why remove the .Rbuildignore file? Don't we need to prevent the .github folder to end up in the source tarball?

Up to you how you want to document things etc...

I don't maintain Rsamtools and didn't choose the style/format of its NEWS file (even though I adopted that style in the packages I maintain). But it's usually on the authors of a PR to do their best effort to follow the style used in the piece of software they are contributing to. Hence the reason I took the time to give you feedback on how NEWS should be formatted.

hpages commented 1 year ago

BTW Bioconductor has moved to using devel instead of master but I just spotted that the yaml file sill uses the latter. The other PRs you've submitted for other Bioconductor packages seem to have the same problem. Thx

bschilder commented 1 year ago

Thanks for all the changes and the links you provided above.

Why remove the .Rbuildignore file? Don't we need to prevent the .github folder to end up in the source tarball?

Sorry, I interpreted your previous messages as wanting to remove the .Rbuildignore file. I'll add it back.

BTW Bioconductor has moved to using devel instead of master but I just spotted that the yaml file sill uses the latter. The other PRs you've submitted for other Bioconductor packages seem to have the same problem. Thx

Thanks for noticing this, thought I had added "devel" but i guess not. I've adjusted this one, and will do so for the others.

PS - It looks like when I forked the repo GitHub automatically must have renamed the branch "master". I'll keep an eye out for this in the future: https://github.com/neurogenomics/Rsamtools/tree/master

hpages commented 1 year ago

It looks like when I forked the repo GitHub automatically must have renamed the branch "master"

Or maybe you forked before we renamed the branch? The master -> devel renaming happened this year (see the various announcements on the bioc-devel mailing lists about that). However it seems that you forked before that.

Any chance you can amend your addition to the NEWS file? All you need to do is replace

CHANGES IN VERSION 2.17.1

with

CHANGES IN VERSION 2.18

Finally can you please explain how the yaml file makes sure that the right version of Bioconductor is used for building and checking the package? I see that the use of the bioconductor/bioconductor_docker:devel container is hardcoded in the yaml file. Note that this would only work for the devel branch of the package, but not for the RELEASE_* branches. What am I missing?

Thanks again, H.

LiNk-NY commented 1 year ago

@hpages Can you elaborate on what false positives you are seeing in BiocCheck? (ideally with reproducible examples) Thanks!

hpages commented 1 year ago

@LiNk-NY Not really the place here to discuss this but it's actually a mix of:

LiNk-NY commented 1 year ago

Thanks, I've moved the discussion to https://github.com/Bioconductor/BiocCheck/issues/197

vjcitn commented 1 year ago

@bschilder I wanted to mention in connection with this PR (and others) that our team has been discussing the proposal to adopt rworkflows for this and other packages. The current thinking is that an intensive multiparty discussion about specific GHA-oriented solutions for Bioconductor packages is needed. We will be in touch as this discussion is scheduled.

bschilder commented 1 year ago

@bschilder I wanted to mention in connection with this PR (and others) that our team has been discussing the proposal to adopt rworkflows for this and other packages. The current thinking is that an intensive multiparty discussion about specific GHA-oriented solutions for Bioconductor packages is needed. We will be in touch as this discussion is scheduled.

@vjcitn I've actually been working on adding some new features to rworkflows to make it even more Bioc friendly. will share v soon! and thanks, looking forward to the discussion

bschilder commented 1 year ago

It looks like when I forked the repo GitHub automatically must have renamed the branch "master"

Or maybe you forked before we renamed the branch? The master -> devel renaming happened this year (see the various announcements on the bioc-devel mailing lists about that). However it seems that you forked before that.

Ah you're right, that makes more sense actually.

Any chance you can amend your addition to the NEWS file? All you need to do is replace

CHANGES IN VERSION 2.17.1

with

CHANGES IN VERSION 2.18

Sure, I'm assuming you want me to bump the version in DESCRIPTION as well?

Finally can you please explain how the yaml file makes sure that the right version of Bioconductor is used for building and checking the package? I see that the use of the bioconductor/bioconductor_docker:devel container is hardcoded in the yaml file. Note that this would only work for the devel branch of the package, but not for the RELEASE_* branches. What am I missing?

I'm not sure I would call thebioconductor/bioconductor_docker:devel parameter "hard coded", in the same way I wouldn't call any of the other parameters in the workflow file hard coded (e.g. run_biocheck: {{ false }}). These are all parameters which the user can change to fit their needs and preferences. So it's up the user which is the "right" version of bioconductor they wish to run CI on for each VM.

Coordinating within VMs

Let's take a closer look:

matrix:
        config:
        - os: ubuntu-latest
          r: devel
          bioc: devel
          cont: bioconductor/bioconductor_docker:devel
          rspm: https://packagemanager.rstudio.com/cran/__linux__/focal/release
        - os: macOS-latest
          r: latest
          bioc: release
        - os: windows-latest
          r: latest
          bioc: release

You'll notice that on the ubuntu-latest VM the versions of r / bioc / cont are all coordinated (ie. they use devel). Similarly, on the VMs macOS-latest and windows-latest the versions of r / bioc / cont are all coordinated (ie. they use latest/release).

Users can indeed change these parameters to suit their needs, but I provide a bit of extra help within the rworkflows::use_workflow function via the rworkflows::construct_runners helper subfunction. The assists users when selecting the VM runner options that make sense together, which is reflected in the example above.

Here is the relevant documentation: https://neurogenomics.github.io/rworkflows/reference/use_workflow.html https://neurogenomics.github.io/rworkflows/reference/construct_runners.html

Specifying runners in the workflows vs. the action

Now, you might ask "Why not just pass a simplified parameter to specify the runners, instead of including the matrix:config explicitly within the workflow?". Well there's a couple reason for this, the main one being that GHA only lets you specify the runner within workflow files, not action scripts. For this reason, I can't simply pass in a parameter named something like: use_runner: ubuntu-latest.

Coordinating across branches

If I understand correctly, there is a second point that you've made regarding the coordination between branches and VM runners.

So for example, if you make pushes to the RELEASE_3.17 branch, you may only want to run VMs using the release version of Bioc. And when making pushes to the 'devel' branch to may only want to run VMs using the devel version of Bioc. In this case, you would simply generate two separate workflow files like so:

(note i'm using the dev version of rworkflows here bc i just edited some of the args to make this even easier.)

remotes::install_github("neurogenomics/rworkflows@dev")

#### Write devel branch workflow ####
v <- "devel"
f1 <- use_workflow(name = paste("rworkflows",v,sep="."), 
                   branches = v, 
                   runners = construct_runners(bioc = v), 
                   preview = TRUE,
                   force_new = TRUE,
                   save_dir = tempdir() # For demo only, use default in practice
                   ) 

#### Write RELEASE_3_17 branch workflow ####
v <- "RELEASE_3_17"
f2 <- use_workflow(name = paste("rworkflows",v,sep="."), 
                   branches = v, 
                   runners = construct_runners(bioc = v), 
                   preview = TRUE,
                   force_new = TRUE,
                   save_dir = tempdir() # For demo only, use default in practice
                   ) 

The above code will generate two workflow files; one for the devel branch and one for the RELEASE_3_17 branch. They will each use the respectively appropriate version of Bioc.

LiNk-NY commented 1 year ago

Quick question: How do you learn that the RSPM / PPM now uses jammy instead of focal? Is this done manually?

Currently, the Bioconductor devel container uses codename jammy and the config should look like:

cont: bioconductor/bioconductor_docker:devel
rspm: https://packagemanager.rstudio.com/cran/__linux__/jammy/release

verified with:

docker run -it bioconductor/bioconductor_docker:devel bash
root@:/# lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:    22.04
Codename:   jammy
bschilder commented 1 year ago

Quick question: How do you learn that the RSPM / PPM now uses jammy instead of focal? Is this done manually?

Currently, the Bioconductor devel container uses codename jammy and the config should look like:

cont: bioconductor/bioconductor_docker:devel
rspm: https://packagemanager.rstudio.com/cran/__linux__/jammy/release

verified with:

docker run -it bioconductor/bioconductor_docker:devel bash
root@:/# lsb_release -a
Distributor ID:   Ubuntu
Description:  Ubuntu 22.04.3 LTS
Release:  22.04
Codename: jammy

Oh interesting! I didn't realise this was an element that was liable to change. I can add a function to use_workflows that infers the correct codename.

Made a dedicated Issue for it here: https://github.com/neurogenomics/rworkflows/issues/73

hpages commented 1 year ago

Sure, I'm assuming you want me to bump the version in DESCRIPTION as well?

Nope. You've already bumped it to 2.17.1, which is how it should be. (Remember that the core team will bump the package version to 2.18 at release time.)

So for example, if you make pushes to the RELEASE_3.17 branch, you may only want to run VMs using the release version of Bioc. And when making pushes to the 'devel' branch to may only want to run VMs using the devel version of Bioc.

Exactly. Using the right VM for the right branch is very important. It's important to realize that trying to build/check a specific branch (e.g. RELEASE_3_17) of a Bioconductor package on the wrong container (e.g. bioconductor/bioconductor_docker:devel) produces meaningless results. I was hoping that this would be somehow achievable via a single static workflow file, but, based on your explanations, it seems that one would need to edit the workflow file in the new branch after each release. This is a concern that I wish could be addressed.

In the meantime, since your PR is against the devel branch, and the workflow file in this PR uses bioconductor/bioconductor_docker:devel, I still don't understand why RELEASE_** is listed under branches:

  push:
    branches:
    - devel
    - RELEASE_**
  pull_request:
    branches:
    - devel
    - RELEASE_**

What am I missing?

Thanks, H.

bschilder commented 1 year ago

Sure, I'm assuming you want me to bump the version in DESCRIPTION as well?

Nope. You've already bumped it to 2.17.1, which is how it should be. (Remember that the core team will bump the package version to 2.18 at release time.)

Ok, thanks for clarifying .

So for example, if you make pushes to the RELEASE_3.17 branch, you may only want to run VMs using the release version of Bioc. And when making pushes to the 'devel' branch to may only want to run VMs using the devel version of Bioc.

Exactly. Using the right VM for the right branch is very important. It's important to realize that trying to build/check a specific branch (e.g. RELEASE_3_17) of a Bioconductor package on the wrong container (e.g. bioconductor/bioconductor_docker:devel) produces meaningless results. I was hoping that this would be somehow achievable via a single static workflow file, but, based on your explanations, it seems that one would need to edit the workflow file in the new branch after each release. This is a concern that I wish could be addressed.

In the meantime, since your PR is against the devel branch, and the workflow file in this PR uses bioconductor/bioconductor_docker:devel, I still don't understand why RELEASE_** is listed under branches:

I haven't made the changes yet. Was waiting to hear whether you'd approve of the multi-workflow approach. Will modify now.

bschilder commented 1 year ago

I haven't made the changes yet. Was waiting to hear whether you'd approve of the multi-workflow approach. Will modify now.

Done!

bschilder commented 11 months ago

Replying to @hpages comments here as requested:

As @vjcitn mentioned here, adoption of rworkflows for core Bioconductor packages, as well as its official endorsement for Bioconductor packages in general, would still require a multiparty discussion that goes beyond this particular PR.

Sure! Though that was proposed back in September so I think it would be good to figure out a concrete time to have that discussion. Otherwise, I think it's liable to fall off the radar indefinitely (understandably, we're all quite busy).

Perhaps the next TAB meeting would be an appropriate time? @vjcitn I realise there's a lot to discuss in these meetings, so if it's too much to pack into TAB perhaps a separate meeting could be scheduled?

I'd also argue using a tool in a couple of packages doesn't necessarily amount to "official Bioconductor-wide endorsement" of that tool, whether the tool is rworkflows or otherwise. That, to my mind, is a separate issue from this PR.

Personally, I still share the same concern as @mtmorgan about the workflow "doing the right thing", and continuing to do the right thing all year round, with respect to versioning. See Martin's comment here for the details.

Fair enough, but if you'd like some input to the direction of rworkflows, you're always more than welcome to join the monthly Cloud Methods Working Group meetings. Just sent you an invite to the dedicated Slack channel in case you'd like to join the conversation.

You can also always propose any points of improvement as an Issue or a PR on the rworkflows repo.

Conclusions

Regardless, as I've mentioned before I do think there's a strong argument to have some kind of GitHub-based CI vs. none. I think rworkflows via this PR is the easiest avenue to that, and if you change your mind later it's trivial to remove it (just delete the workflow file).

So even if you're not a fan of rworkflows, I highly recommend you implement something sooner than later (e.g. biocthis, custom workflow scripts, etc.). As we've seen here, GH-based CI can identify issues before they're distributed to Bioc.

Best, Brian

hpages commented 11 months ago

Personally, I still share the same concern as @mtmorgan about the workflow "doing the right thing", and continuing to do the right thing all year round, with respect to versioning. See Martin's comment here for the details.

Fair enough, but if you'd like some input to the direction of rworkflows, you're always more than welcome to join the monthly Cloud Methods Working Group meetings. Just sent you an invite to the dedicated Slack channel in case you'd like to join the conversation.

I see that you added me to this channel. I didn't receive the invite.

You can also always propose any points of improvement as an Issue or a PR on the rworkflows repo.

All I want to know is if this is something you're planning to fix. This is a major blocking issue IMO and it's been mentioned and discussed many times.

bschilder commented 11 months ago

All I want to know is if this is something you're planning to fix. This is a major blocking issue IMO and it's been mentioned and discussed many times.

@hpages To my knowledge, all changes you requested were completed a while ago. Is there anything else you'd like me to address?

hpages commented 11 months ago

@hpages To my knowledge, all changes you requested were completed a while ago. Is there anything else you'd like me to address?

Now I'm confused. How did you address the versioning concern? You still have:

          r: devel
          bioc: devel
          cont: bioconductor/bioconductor_docker:devel

in the config file, which means that R devel is going to be installed and used on top of the bioconductor_docker:devel container. As explained on several occasions in many different places, this will only be good half of the year, but not all year round. Also why would you do that if the container already has the right version of R?

What's even more confusing is that, one week ago, when I reminded you about this (see above), you seemed to aknowledge that this was still an issue (your answer was "Fair enough etc.."), and you even invited me to discuss it in the monthly Cloud Methods Working Group meetings. But now you say that all the changes I requested were completed... :thinking: ?!!

Maybe we should just close this for now and wait until things have been discussed and sorted out by the Cloud Methods Working Group. It was probably premature for you to submit a PR for something like this in the first place given that there are several alternatives around that should be put on the table by the working group. Once the working group makes a decision, they can always make an announcement on the bioc-devel mailing list and/or the community-bioc Slack.