feat: Unit Testing and Automation

ahl27 commented 4 weeks ago

Summary

Adds 1,322 unit tests
- can be run locally with testthat::test_dir("tests/testthat")
- total coverage is just over 60%
Adds new GitHub workflow to auto-run checks on unit tests
Minor changes throughout some files

New Workflow

This PR adds a GitHub Action workflow to automatically run unit tests and report overall coverage. The workflow starts a Linux VM, then evaluates test coverage on the default Biostrings branch as well as on the package with proposed changes in the PR. On completion, the action adds a status to the "Summary" pane of the check with the following statistics:

results of unit tests evaluated on the proposed code
a list of test suites (if any) that failed or threw warnings
(if no tests failed): A list of files (if any) that have lost test coverage relative to the base repository
(if no tests failed): An expandable list with coverage and change in coverage for each file

This result is also uploaded as a GitHub-flavored Markdown file. Total runtime of the workflow is 9-12min. An example workflow can be seen here.

For reasons I don't entirely understand, the coverage reported here is sometimes different than what's reported if you upload the exact same source data to Codecov.io. Maybe there's some difference in how Codecov calculates percent coverage. Codecov.io isn't used because licensing makes things tricky (and potentially costly).

Note: it is possible to also add the results as a comment to the PR that triggered the workflow, but this requires a GitHub token with pull-request: write permissions, which I'm hesitant to include on a public repository.

Note 2: this workflow will not run until a version of it is present on Biostrings/devel, so the results of the workflow won't be visible until this PR is merged. This workflow can be activated by merging an empty file .github/workflows/test-coverage.yaml to Biostrings/devel.

The following files are currently ignored by the checker:

R/AMINO_ACID_CODE.R
R/GENETIC_CODE.R
R/zzz.R
R/IUPAC_CODE_MAP.R
R/getSeq.R

Minor changes to files

warning in translate.R:16 has been elevated to an error because AA_ALPHABET is now enforced.
error messages for letter.R are standardized, some additional error checking is added.
mask was scheduled to be deprecated in Biostrings 2.9. This function now throws a warning when it is used and will be removed in a subsequent release.
man/XStringSetList-class.Rd previously said that "DNAStringSetList and AAStringSetList are the only constructors", but the current version of Biostrings includes BStringSetList and RNAStringSetList constructors. This has been updated.
Some tests were included inline in replaceAt.R wrapped in if(FALSE){...} brackets. These have been moved to the unit tests, and performance/timing tests have been removed.
replaceAmbiguities now checks to ensure input is one of DNAString, RNAString, or the XStringSet/XStringSetList analogs. This previously caused some odd behavior with input of type AAString.
an erroneous extra (useless) argument to make_AA_COLORED_LETTERS has been removed.
a warning in XStringViews-class.R that was supposed to be suppressed in Bioc 2.12 has been suppressed.
tests in inst/unitTests have been ported to testthat when possible. Accordingly, run_unitTests.R has been removed.
Some files have had trailing whitespace removed.

Outstanding changes prior to merging this PR

potentially remove tests in inst/unitTests
push dummy workflow to devel to ensure this workflow runs as expected
Add more tests to other files:
- matchPWM.R (more comprehensive than current)

Planned Future Changes

mask warning can be elevated to an error in Bioc 3.20, and then removed in 3.21.
113
Add BiocCheck / R CMD CHECK workflow, either as part of this one or (more likely) as its own workflow.
Add tests to the files I'm less familiar with:
- matchProbePair.R
- dinucleotideFrequencyTest.R (maybe better tests than the simple ones I included)

ahl27 commented 4 weeks ago

Screenshot of GitHub Action:

Expanded table:

Table from the above action updated with most recent results (click me to expand!)

```diff @@ Total Coverage @@ =============================================== + Total Coverage 60.21% +32.7% =============================================== @@ R/... Files @@ =============================================== + chartr.R 77.78% +77.8% + coloring.R 94.87% +94.9% + dinucleotideFrequencyTest.R 39.81% +39.8% + findPalindromes.R 52.50% +12.5% + injectHardMask.R 80.00% +80.0% + letter.R 95.00% +95.0% + letterFrequency.R 44.94% +28.0% + lowlevel-matching.R 27.14% +15.2% + MaskedXString-class.R 73.24% +50.7% + maskMotif.R 52.63% +2.6% + match-utils.R 39.29% +14.3% + matchLRPatterns.R 100.00% +100.0% + matchPattern.R 97.22% +86.1% + matchPDict.R 93.71% +87.4% matchProbePair.R 0.00% +0.0% + matchprobes.R 100.00% +100.0% + matchPWM.R 39.13% +39.1% + MIndex-class.R 61.43% +45.7% + misc.R 88.89% +88.9% moved_to_pwalign.R 0.00% +0.0% - MultipleAlignment.R 0.00% -59.0% needwunsQS.R 0.00% +0.0% + padAndClip.R 87.10% +87.1% + PDict-class.R 76.92% +44.1% pmatchPattern.R 0.00% +0.0% + QualityScaledXStringSet.R 80.49% +34.1% + replaceAt.R 90.62% +90.6% + replaceLetterAt.R 66.67% +66.7% + reverseComplement.R 92.86% +92.9% + seqtype.R 94.74% +22.8% SparseList-class.R 0.00% +0.0% + strsplit-methods.R 100.00% +90.0% + toComplex.R 92.31% +92.3% + translate.R 100.00% +100.0% + trimLRPatterns.R 100.00% +100.0% + utils.R 100.00% +78.6% + xscat.R 88.24% +88.2% + XString-class.R 68.28% +22.8% + XStringCodec-class.R 86.00% +60.0% XStringPartialMatches-class.R 0.00% +0.0% + XStringQuality-class.R 67.07% +40.2% + XStringSet-class.R 65.84% +18.0% + XStringSet-comparison.R 100.00% +100.0% + XStringSet-io.R 77.98% +49.5% + XStringSetList-class.R 87.50% +46.9% + XStringViews-class.R 36.00% +26.1% =============================================== @@ src/... Files @@ =============================================== BAB_class.c 97.06% +0.0% BitMatrix.c 15.48% +0.0% find_palindromes.c 96.77% +0.0% gtestsim.c 0.00% +0.0% + inject_code.c 90.00% +90.0% + letter_frequency.c 54.85% +30.6% + lowlevel_matching.c 62.86% +53.5% + match_pattern_boyermoore.c 63.82% +63.8% + match_pattern_indels.c 76.60% +76.6% + match_pattern_shiftor.c 95.16% +10.2% + match_pattern.c 95.35% +71.8% + match_pdict_ACtree2.c 49.59% +0.9% + match_pdict_Twobit.c 65.79% +65.8% + match_pdict_utils.c 48.87% +8.4% + match_pdict.c 62.00% +48.0% match_PWM.c 0.00% +0.0% + match_reporting.c 71.96% +37.9% + matchprobes.c 67.57% +67.6% + MIndex_class.c 13.71% +13.7% pmatchPattern.c 0.00% +0.0% + PreprocessedTB_class.c 85.71% +8.6% R_init_Biostrings.c 100.00% +0.0% + read_fasta_files.c 80.53% +4.5% + read_fastq_files.c 84.08% +2.2% + replace_letter_at.c 60.61% +60.6% + replaceAt.c 81.51% +81.5% RoSeqs_utils.c 100.00% +0.0% SparseList_utils.c 0.00% +0.0% + strutils.c 70.97% +71.0% + translate.c 88.07% +88.1% + unstrsplit_methods.c 97.56% +97.6% + utils.c 80.61% +59.2% + xscat.c 92.96% +93.0% XString_class.c 51.58% +0.0% + XStringSet_class.c 93.42% +2.6% + XStringSetList_class.c 100.00% +100.0% =============================================== ```

ahl27 commented 4 weeks ago

When tests fail

Coverage is only calculated when no tests fail. When one or more tests fail or throw warnings, the summary looks like this:

This workflow result can be viewed here. It was created using tests that were artificially designed to fail/throw warnings, which have now been removed.

Note: coverage can be calculated if some tests throw warnings, but none fail. In this case, this table will be displayed along with coverage information.

More information on the tests that failed/warned is available in the workflow itself:

Bioconductor / Biostrings