fmicompbio / QuasR

This package provides a framework for the quantification and analysis of Short Reads. It covers a complete workflow starting from raw sequence reads, over creation of alignments and quality control plots, to the quantification of genomic regions of interest.
https://fmicompbio.github.io/QuasR/

Future proofing: on.exit() stacking may cause issues. #41

Open skiaphrene opened 1 year ago

skiaphrene commented 1 year ago

Hi there,

In the (QuasR 1.41.1) createAlignment functions, on.exit(expr, add=TRUE) is called multiple times, to record code that removes temporary files after the function finishes. "expr" typically contains R objects that contain filenames. However, there is a warning in the manual for on.exit():

The expr argument passed to on.exit is recorded without evaluation. If it is not subsequently removed/replaced by another on.exit call in the same function, it is evaluated in the evaluation frame of the function when it exits (including during standard error handling). Thus any functions or variables in the expression will be looked for in the function and its environment at the time of exit: to capture the current value in expr use substitute or similar.

This means that if the contents of the R objects change between the time the expression is first registered with on.exit() and the time the function actually exits, the code executed on exit will operate on the changed contents rather than on the original ones. This behaviour can be seen with this code: test <- function( ) { for(i in 1:4) { on.exit(print(i), add=TRUE) } } ; test() # => prints 4 four times
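To make the risk concrete for temporary files, here is a minimal sketch of how reusing a variable name could leave a file behind. The function `late_binding_demo` and the variable names are hypothetical and are not taken from the QuasR code; the sketch only assumes the documented behaviour that the on.exit() expression is evaluated when the function exits.

```r
# Hypothetical illustration, not QuasR code.
late_binding_demo <- function() {
  tmp <- tempfile("first_")
  file.create(tmp)
  first <- tmp
  on.exit(unlink(tmp), add = TRUE)  # 'tmp' is looked up at exit, not here

  tmp <- tempfile("second_")        # the same variable is reused
  file.create(tmp)
  on.exit(unlink(tmp), add = TRUE)

  c(first = first, second = tmp)
}

paths <- late_binding_demo()
# Both handlers saw the final value of 'tmp', so only the second file was removed.
still_there <- file.exists(paths)
unlink(paths)  # tidy up the file that was left behind
```

In this sketch the first temporary file survives the function call, because both recorded expressions refer to whatever `tmp` holds at exit.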

Now, I did not actually see any places in the createAlignment functions where this would be an issue (e.g. in a loop), so the point is perhaps moot, but I wanted to check with you what you thought, and whether you think "future proofing" the code could be desirable - e.g. having a single on.exit() delete a vector of file names, and then simply adding to that vector as we progress through the function, rather than having one on.exit(...,add=TRUE) for each temporary file.
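The "single on.exit()" idea could look roughly like the sketch below. The names `single_cleanup_demo` and `files_to_remove` are placeholders for illustration and do not appear in the createAlignment functions. Here the late evaluation works in our favour: the handler sees the final contents of the vector.

```r
# Hypothetical sketch of one cleanup handler acting on a growing vector.
single_cleanup_demo <- function(n = 3) {
  files_to_remove <- character(0)
  # Registered once, near the top; evaluated at exit with the final vector.
  on.exit(unlink(files_to_remove), add = TRUE)

  for (i in seq_len(n)) {
    tmp <- tempfile(sprintf("step%d_", i))
    file.create(tmp)
    files_to_remove <- c(files_to_remove, tmp)
  }
  files_to_remove
}

created <- single_cleanup_demo()
leftover <- any(file.exists(created))
```

The trade-off is that every new temporary file must be appended to the vector by hand, which relies on the developer remembering that the handler exists.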

Thanks for your time! Best regards,

-- Alex

mbstadler commented 1 year ago

I think this may currently not be an issue (as things are cleaned up correctly), but it is indeed a possible trap that we might fall into. I am not sure that having a single on.exit(unlink(files_to_remove)) call and then just adding to files_to_remove would be the best solution, because this solution (like the one we have currently implemented) requires you to remember things that happen elsewhere in the code (you need to know that there is an on.exit() call somewhere that acts on files_to_remove).

A possible fix that would not force you to remember all code is given in the quoted paragraph from the on.exit() documentation, based on substitute(). Would that be a better alternative?

skiaphrene commented 1 year ago

Yes I saw that and it looked attractive (which is why I included it in the quote above :-) ), but my own knowledge of R breaks down at substitute() - at the very least, I tried using this proposed solution yesterday, but could not get it to work. I am not sure the "remember that things happen elsewhere in the code" is too much of an issue, the functions are not that big, and assuming a developer is looking through the function, the "on.exit()" call would still be near the top of the function, so it should stay in the developer's mind.

mbstadler commented 1 year ago

What about: test <- function( ) { for(i in 1:4) { eval(substitute(on.exit(print(x), add=TRUE), list(x = i))) } } ; test()

which will give:

[1] 1
[1] 2
[1] 3
[1] 4

skiaphrene commented 1 year ago

Yep, I was definitely not using substitute() correctly. It looks a bit unwieldy, but this does seem like a simple and safer alternative to stacking on.exit() calls that refer to variables.
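Applied to temporary files, the same idea might look like the sketch below. It uses bquote() instead of substitute(), which is a variation on the pattern shown above and not something proposed in this thread; `capture_demo` is a hypothetical name. The current value of `tmp` is spliced into the call before on.exit() records it, so each handler removes its own file even though the variable is overwritten in the loop.

```r
# Hypothetical sketch: capture the current file name in each handler.
capture_demo <- function(n = 3) {
  created <- character(0)
  for (i in seq_len(n)) {
    tmp <- tempfile(sprintf("step%d_", i))
    file.create(tmp)
    created <- c(created, tmp)
    # .(tmp) is replaced by the string held in 'tmp' right now;
    # eval() then runs on.exit() in the frame of capture_demo().
    eval(bquote(on.exit(unlink(.(tmp)), add = TRUE)))
  }
  created
}

created_paths <- capture_demo()
remaining <- file.exists(created_paths)
```

Each cleanup call stays next to the code that creates the file, so nothing elsewhere in the function needs to be kept in mind.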