Closed FreekManders closed 4 years ago
Seems like there is a good opportunity for a formal unit test here.
Here are some test ids and a vector indicating whether they are valid or not
orcid <- c(
"0000-0001-6197-3471",
"0000-0001-6197-347X",
"0000-0001-6197-34XX",
"",
NA_character_
)
valid <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
Here's the current regular expression and test, using the RUnit package (used by BiocCheck, in the inst/unitTests
directory)
re <- "^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$"
checkIdentical(valid, grepl(re, orcid))
The test fails (good!) Here's the proposed patch
re- "^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9X]{4}$"
checkIdentical(valid, grepl(re, orcid))
but it also fails (because it allows the second example). So trying again
re <- "^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$"
checkIdentical(valid, grepl(re, orcid))
Success! (but maybe I haven't tried hard enough to break it?)
And then a more compact variant
re <- "^([0-9]{4}-){3}[0-9]{3}[0-9X]$"
checkIdentical(valid, grepl(re, orcid))
More success (but is it readable?)
Is there a more comprehensive suite of possible ORCID ids that trip up the current successes? Is there something more readable than a regular expression test?
The test should be incorporated into the inst/unitTests directory, but to do so I think it actually helps to define in the R/ directory a little helper function
.check_ORCID <- function(orcid) {
re <- "^([0-9]{4}-){3}[0-9]{3}[0-9X]$"
grepl(re, orcid)
}
and update the code in the main BiocCheck function to call that. Then the test is easy to implement without having to go through some of the contortions in other parts of the tests files, e.g., no need to create a 'mock' package....
Yes, that's better. I didn't think about cases where there are two X's. I do think the more compact version is less readable. The longer version better shows the structure of the ID. Shall I implement these changes or do you want to do this?
Would be great if you wanted to implement the changes, @FreekManders
I added the changes. I had to clarify the namespace in the unit test. I guess that is because I'm testing a non-exported function, so it shouldn't be a problem.
The ORCID ID check didn't accept valid IDs ending with an X. This is now fixed. IDs that are too long are now also rejected. I updated the NEWS file as requested and bumped the version number of the DESCRIPTION. I didn't update the vignette, because I didn't add any new features.
name: Fix ORCID ID check about: Fix ORCID ID check title: "[PR] Fix ORCID ID check" labels: '' assignees: ''
If you are submitting a pull request to
BiocCheck
please follow the instructions outlined in this presentation. This presentation includes steps for forking, creating a branch for you to work on, and useful related information.Prior to sending the pull request, please verify that
R CMD build
andR CMD check
run without warnings or errors on the latest Bioconductor-devel (currently in May 2020 that would be Bioconductor 3.12) and complete the following steps:The easiest way to obtain a working environment with Bioconductor-devel in your computer is to use the Bioconductor devel docker image as described in detail in the Bioconductor website.
For more questions, please get in touch with the Bioconductor core team through the Bioconductor Slack workspace.