Closed hpages closed 1 year ago
Hi @hpages, can I work on this issue?
@hpages, please how do I check if replace
is set to TRUE
?
I tried if (file.exists(pkgdir) && isTRUE(replace))
and also if (file.exists(pkgdir) && replace == TRUE)
but none worked.
Also, I get this error when I do ?forgeBSgenomeDataPkg()
, after I edited BSgenome/man/BSgenomeForge.Rd
Error Retrieving Help
Error in fetch(key) :
lazy-load database 'Library/Frameworks/Rsesouces/library/BSgenome/help/BSgenome.rdb is corrupt
Hi @Priceless-P ,
Note that forgeBSgenomeDataPkg()
is an S4 generic function so we need to add the new replace
argument to the generic function and all its methods (BSgenome/R/BSgenomeForge.R
defines 3 forgeBSgenomeDataPkg()
methods). I suggest that you add the new argument in 4th position i.e. between destdir
and verbose
. The argument's default value (FALSE
) should be the same everywhere.
please how do I check if
replace
is set toTRUE
?,
First make sure to add a check on the replace
's value supplied by the user. Use isTRUEorFALSE()
for the check. The best place to put this check is at the beginning of the forgeBSgenomeDataPkg()
method for BSgenomeDataPkgSeed objects (the 1st method). That's because, the 2 other methods will ultimately call the 1st method. So place something like this at the beginning of the 1st method (for example, right below the first if
statement):
if (!isTRUEorFALSE(replace))
stop("'replace' must be TRUE or FALSE")
Then, because we've already checked that replace
is TRUE
or FALSE
, we can use it directly in an if
statement e.g. we can do things like
if (replace) {
...
}
or
if (file.exists(pkgdir) && replace) {
...
}
We don't need to wrap it in a call to isTRUE()
and we don't need to do replace == TRUE
either. Note that we would use isTRUE(replace)
if there was the possibility that replace
is not TRUE
or FALSE
e.g. if there was the possibility that it's NA
or a logical vector of length != 1 or any other object that would break if (replace) ...
.
I tried
if (file.exists(pkgdir) && isTRUE(replace))
and alsoif (file.exists(pkgdir) && replace == TRUE)
but none worked.
Please provide more details. What error do you get? Note that it's much easier to discuss things like that if you show the code that others can run to reproduce the error.
Also, I get this error when I do
?forgeBSgenomeDataPkg()
, after I editedBSgenome/man/BSgenomeForge.Rd
Make sure that you reinstall the package and that you try to open the man page in a fresh R session. Also pay attention to the output of R CMD INSTALL
when you reinstall the package. It might be able to detect what's wrong with the man page and emit a NOTE or WARNING about the problem.
Let me know how that goes.
I posted a question 5mins ago then I deleted it. It took me going through my question to figure out what my mistake was 😂. Let me correct my errors and try again. Hopefully, it works this time.
And the ?forgeBSgenomeDataPkg()
works fine now.
It took me going through my question to figure out what my mistake was.
This happens quite often. The simple fact of going thru the process of formulating a precise question forces us to re-examine what we did, and that often leads us to see things that we missed before. So yeah, it's not too surprising that you were able to figure out what your mistake was. :smiley:
2 notes about this:
if (file.exists(pkgdir) && replace) {
unlink(pkgdir, recursive=TRUE)
dir.create(pkgdir)
}
if (dir.exists(pkgdir) && !replace) {
stop("directory", pkgdir,
" exists. Use replace=TRUE to replace it")
}
There's no reason to use file.exists()
in one place and dir.exists()
in the other. This will actually cause problems if a regular file named pkgdir
already exists and replace
is FALSE
, because then the if
clauses in the 2 if
statements will both evaluate to FALSE
, and the call to Biobase::createPackage()
will fail with the old uninformative error message. So you really want to use file.exists()
in both cases.
Also I'd suggest that you reorganize the if
statements like this:
if (file.exists(pkgdir)) {
if (replace) {
unlink(pkgdir, recursive=TRUE)
dir.create(pkgdir)
} else {
stop("directory ", pkgdir, " exists. ",
"Use replace=TRUE to replace it.")
}
}
This makes the code easier to understand. In particular, using nested if
statements (an if else
statement inside an if
statement) makes it immediately obvious that, if pkgdir
exists, the 2 things that can happen are exclusive. With the 2 non-nested if
statements, this is not immediately clear: the if
clauses are more complicated to parse with the eyes, and one needs to pay close attention to them to realize that they are exclusive.
Also I'd suggest that you reorganize the if statements like this: if (file.exists(pkgdir)) { if (replace) { unlink(pkgdir, recursive=TRUE) dir.create(pkgdir) } else { stop("directory ", pkgdir, " exists. ", "Use replace=TRUE to replace it.") } }
@hpages. You are absolutely right. This is a better way to do it.
But I am still missing something. This is my code
setGeneric("forgeBSgenomeDataPkg", signature="x",
function(x, seqs_srcdir=".", destdir=".", replace=FALSE, verbose=TRUE)
standardGeneric("forgeBSgenomeDataPkg")
)
setMethod("forgeBSgenomeDataPkg", "BSgenomeDataPkgSeed",
function(x, seqs_srcdir=".", destdir=".", replace=FALSE, verbose=TRUE)
{
if (!isTRUEorFALSE(replace))
stop("'replace' must be TRUE or FALSE")
## The Biobase package is needed for createPackage().
if (!requireNamespace("Biobase", quietly=TRUE))
stop("Couldn't load the Biobase package. Please install ",
"the Biobase\n package and try again.")
template_path <- system.file("pkgtemplates", "BSgenome_datapkg",
package="BSgenome")
BSgenome_version <- installed.packages()['BSgenome','Version']
if (is.na(x@genome)) {
if (is.na(x@provider_version))
stop("'genome' field is missing in seed file")
warning("field 'provider_version' is deprecated ",
"in favor of 'genome'")
x@genome <- x@provider_version
}
if (!is.na(x@release_name))
warning("field 'release_name' is deprecated")
seqnames <- x@seqnames
if (!is.na(seqnames)) {
.seqnames <- eval(parse(text=seqnames))
} else {
if (is.na(x@seqfile_name)) {
.seqnames <- seqlevels(Seqinfo(genome=x@genome))
} else {
.seqnames <- NULL
}
}
seqnames <- deparse1(.seqnames)
circ_seqs <- x@circ_seqs
if (!is.na(circ_seqs)) {
.circ_seqs <- eval(parse(text=circ_seqs))
} else {
si <- Seqinfo(genome=x@genome)
.circ_seqs <- seqlevels(si)[isCircular(si)]
}
circ_seqs <- deparse1(.circ_seqs)
symvals <- list(
PKGTITLE=x@Title,
PKGDESCRIPTION=x@Description,
PKGVERSION=x@Version,
AUTHOR=x@Author,
MAINTAINER=x@Maintainer,
BSGENOMEVERSION=BSgenome_version,
SUGGESTS=x@Suggests,
LICENSE=x@License,
ORGANISM=x@organism,
COMMONNAME=x@common_name,
GENOME=x@genome,
PROVIDER=x@provider,
RELEASEDATE=x@release_date,
SOURCEURL=x@source_url,
ORGANISMBIOCVIEW=x@organism_biocview,
BSGENOMEOBJNAME=x@BSgenomeObjname,
SEQNAMES=seqnames,
CIRCSEQS=circ_seqs,
MSEQNAMES=x@mseqnames,
PKGDETAILS=x@PkgDetails,
SRCDATAFILES=x@SrcDataFiles,
PKGEXAMPLES=x@PkgExamples
)
## Should never happen
if (any(duplicated(names(symvals)))) {
str(symvals)
stop("'symvals' contains duplicated symbols")
}
## All symvals should by single strings (non-NA)
is_OK <- sapply(symvals, isSingleString)
if (!all(is_OK)) {
bad_syms <- paste(names(is_OK)[!is_OK], collapse=", ")
stop("values for symbols ", bad_syms, " are not single strings")
}
pkgdir <- file.path(destdir, x@Package)
if (file.exists(pkgdir)) {
if (replace) {
unlink(pkgdir, recursive=TRUE)
dir.create(pkgdir)
} else {
stop("directory ", pkgdir, " exists. ",
"Use replace=TRUE to replace it.")
}
}
Biobase::createPackage(x@Package, destdir, template_path, symvals)
I still get this error
>forgeBSgenomeDataPkg("BSgenome.Cfamiliaris.UCSC.canFam6-seed")
Error in forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir, :
directory ./BSgenome.Cfamiliaris.UCSC.canFam6 exists. Use replace=TRUE to replace it.
> forgeBSgenomeDataPkg("BSgenome.Cfamiliaris.UCSC.canFam6-seed",replace=TRUE)
Error in forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, destdir = destdir, :
directory ./BSgenome.Cfamiliaris.UCSC.canFam6 exists. Use replace=TRUE to replace it.
So calling the function without replace
works as expected but It doesn't give the expected result if the replace
argument is added.
Why are you creating the pkgdir
folder? The Biobase::createPackage()
function takes care of creating that folder.
BTW I'm surprised that the second time you called forgeBSgenomeDataPkg()
above (i.e. with replace=TRUE
) you got that error. It seems that you should have gotten the old misleading error message instead i.e. the one coming from Biobase::createPackage()
, which doesn't say exactly the same thing as what you're showing above.
I removed dir.create(pkgdir)
built, and installed it again and I still get the same error. Let me create a PR so you see the entire code.
BTW I'm surprised that the second time you called forgeBSgenomeDataPkg() above (i.e. with replace=TRUE) you got that error. It seems that you should have gotten the old misleading error message instead i.e. the one coming from Biobase::createPackage(), which doesn't say exactly the same thing as what you're showing above.
I know right? 🤷🏻♀️
When you call forgeBSgenomeDataPkg("BSgenome.Cfamiliaris.UCSC.canFam6-seed")
or forgeBSgenomeDataPkg("BSgenome.Cfamiliaris.UCSC.canFam6-seed", replace=TRUE)
, you're calling the function on an object of type character, so the method for character objects will be used. Note that this method itself will call forgeBSgenomeDataPkg()
again but this time on a list object (y
). The problem is that this call doesn't pass down the replace
argument, so the forgeBSgenomeDataPkg()
method for list objects will be called with the default value for replace
, which is FALSE
.
When you call forgeBSgenomeDataPkg("BSgenome.Cfamiliaris.UCSC.canFam6-seed") or forgeBSgenomeDataPkg("BSgenome.Cfamiliaris.UCSC.canFam6-seed", replace=TRUE), you're calling the function on an object of type character, so the method for character objects will be used. Note that this method itself will call forgeBSgenomeDataPkg() again but this time on a list object (y). The problem is that this call doesn't pass down the replace argument, so the forgeBSgenomeDataPkg() method for list object will be called with the default value for replace, which is FALSE.
Ohhh. I think I just saw what you mean. Thank you for pointing that out. It's working fine now.
Also, is my documentation for the replace
satisfactory? Do I need to provide a more detailed explanation about it?
Yes, it's satisfactory. What this argument does is very simple so explanation should be short and straight to the point. See #53 for more comments. Thanks!
Okay. Thank you @hpages 😊
Should I take a second look at the PR?
In the meantime please take a look at my long comment here where I try to explain how the code in the fasta_to_sorted_2bit_for_Felis_catus_9.0.R
script works. The code in your fasta_to_sorted_2bit_for_Dog10K_Boxer_Tasha.R
script works the same way so my explanations apply to your script too.
It's a long due explanation, and it comes a little bit late for you now that we've closed issue #39, but it's never too late to take a close look at how the code in that script works. It discusses the notion of parallel objects which is central to a lot of the R code that we write in Bioconductor.
Be sure to let me know if you have questions.
It's a long due explanation, and it comes a little bit late for you now that we've closed issue https://github.com/Bioconductor/BSgenome/issues/39, but it's never too late to take a close look at how the code in that script works. It discusses the notion of parallel objects which is central to a lot of the R code that we write in Bioconductor.
I will certainly go through it and come back to you with questions if anything is unclear.
Ok so no question on how the code in fasta_to_sorted_2bit_for_Dog10K_Boxer_Tasha.R
works? Let's close this then.
Don't forget to record your contribution on Outreachy, thanks!
Ok so no question on how the code in fasta_to_sorted_2bit_for_Dog10K_Boxer_Tasha.R works? Let's close this then.
Not I don't. Your explanation was very clear.
Don't forget to record your contribution on Outreachy, thanks!
Already did. Thank you!
When trying to forge a new BSgenome data package with
forgeBSgenomeDataPkg()
, the user might sometimes perform several attempts before a successfulforgeBSgenomeDataPkg()
run (typically with some adjustments to their seed file between each attempt). However, before tryingforgeBSgenomeDataPkg()
again, the user is expected to manually remove the package folder created by the previous attempt. Otherwise, they'll get an error like this:Note that the error message is misleading and confusing: it suggests that the user can fix the problem by using
unlink=TRUE
but trying to do so does not work:That's because the
forgeBSgenomeDataPkg()
does not have this argument:Proposal
We propose to add the
replace
argument to theforgeBSgenomeDataPkg()
function.replace
should be set toFALSE
by default.If the directory to create already exists, and
replace
is set to TRUE, the function should replace it, that is, remove it and create it again.If the directory to create already exists, and
replace
is set to FALSE (the default), the function should fail with an informative error message. For example, the error message could say something like this:The new argument should be documented in
BSgenome/man/BSgenomeForge.Rd
.