Closed lsilvest closed 7 years ago
There is a small beginning of Ops()
in the R/
directory. Contributions welcome.
> library(nanotime)
R> nanotime("2017-02-02T17:16:27.633531000+00:00") + 1e9
[1] "2017-02-02T17:16:28.633531000+00:00"
R>
Agree on both points, but someone has to fill it in...
I don't know how to solve the "Incompatible methods" issue. If both integer64
and nanotime
define +
then the dispatch will conflict. This is from ?groupGeneric
:
"If a method is found for just one argument or the same method is found for both, it is used. If different methods are found, there is a warning about ‘incompatible methods’: in that case or if no method is found for either argument the internal method is used."
and applies whether the ops are group- or individually-defined. So it seems to be a general shortcoming of S3 dispatch. The same happens for POSIXct
for example:
Sys.time() + as.integer64(1)
[1] "2017-02-03 10:17:05 EST"
Warning message:
Incompatible methods ("+.POSIXt", "+.integer64") for "+"
Do you know if there's a workaround for the dispatch?
Uggh. I learned one nice trick from bit64::integer64 -- you can temporarily remove a class attribute. So we could make the c(nanotime, integer64)
object just an integer64, and then dispatch on the Ops.
Also, xts has some Ops code to look at left/right operand which is what we need to emulate difftime, ie difference between two nanotime objects becomes integer64.
(In other news, I finally built the R package for ztsdb too and managed to connect to an instance but haven't done anything. Been busy...)
Yes, but the first issue is to even be able to enter a given Ops function. When there is conflict the internal operator is called, bypassing both defined operators... It works well when interacting with built-in types, but it seems there's a fundamental limitation with the way S3 classes can interact together or am I missing something?
I fear you may be correct. But it is an area I do know not that much about.
@joshuaulrich Any idea?
I've had some success defining nanotime
as an S4 class that contains the S3 class integer64
. Wrapped in setMethod
, there is little change to the original code. More changes of course with the operators (and some more needed still). There are now more xts
functions working with nanotime
. Here are the changes: 68b4f5e6a82f5d707480facc34dc6d01bad86370. Also, didn't get round to doing the as.data.table
function, etc. - really just a proof of concept for now.
Uh-oh. I always found S4 to be somewhat more heavy-handed. So I am a little reluctant.
But if we manage to replicate all current features and then get better ops I could be persuaded.
OK, I'll try to complete the ops and will look in depth at compatibility with zoo
, data.frame
and additionally xts
. Is there a unit test package you think we should use? I've used RUnit
in the past and heard about testthat
and testit
. Do you have a suggestion there?
I think we should try to work out Ops first. Working with xts
, zoo
is next. data.table
, which is wicked good (but a little "different") already works.
As for tests, I happened to be a RUnit user too. "Old-school".
I've got the ops working, plus a decent utest coverage here: 9d415a6dccd80687024f16386512fdc6132a17a2
POSIXct
too, btw)R CMD check
; I've left the previous tests so you can check I didn't forget anythingWow wow wow. About to leave for work so not sure when I can check but that is very promising.
But it is S4 now? Hm.....
Yes, but do note that it's S4 derived from S3, so it conveniently falls back on integer64
ops.
I don't believe S3 can provide for correct ops because of the dispatch issue I mentioned previously.
I glanced at it on the train in. There are a few things we may want to clean up -- eg I do not like roxygen2 to touch my DESCRIPTION or NAMESPACE so I usually do not use @import
, @importFrom
, @export
. But those are all small fry.
Let me clean up what we have over in S3 land. Maybe even make one more release. And then be convert. I think you are correct that we have to go this route to 'tighten' the operations. However, a few of us have some (mostly personal though) "bad taste" in our mouths from some older-and-now-mostly-dead time series classes in S4: its, dateTime, ... so we may want to do this carefully.
Agreed, I'll take out the @export
. With S4 and the increase in number of exports, it was easier to get them correct with roxygen2.
I was looking at performance numbers, and the S4 version is about 2x slower than the S3 version when the result is a nanotime
. Looks like the constructor makes a copy even where it could potentially avoid one. I briefly looked at C code for S4 generation and it didn't seem immediately obvious if this could be easily alleviated or not.
I also caught a really strange performance issue with c
which had an easy workaround (fixed in a58fc22b9ca12e677fb5ffd65813e8dadcc95820)
The immediate conclusion though is that in its current state the implementation with S4 is about 2x slower than with S3. That seems to be the trade-off for having correct nanotime
operations.
Yes, S4 has a reputation for being a little slower. It is also said to be more solid.
And I guess while we're mucking with the different classes, the @export
tag may make sense. In general I still avoid them...
I did get random roxygen errors trying to build of your branch.
I've been having issues too. For some reason I can't run document()
twice in the same R session.
Furthermore, I have an issue in the part where it generates keywords. I assumed that it was my Roxygen version 6.0.0.9000 (the latest on github) and so I ended up modifying Roxygen's code. If my memory is correct I was having that issue with your unmodified package too, that's why I assumed it was some issue introduced between 5.0.1 and 6.0.0.9000. I can't check this right now but will do later today when I get back to my computer.
Just say no to GitHub. Let's use roxygen 6.0.1 from CRAN.
I installed 6.0.1, looked at the error I was getting a little closer, and it turns out I had an old version of stringr
. I still can only generate documentation once in a session, but this works for me:
library(devtools)
document("~/prog/nanotime")
I prefer to do these things at the command-line. Maybe look into littler, link its examples/
script to /usr/local/bin
and then call roxy.r
?
(Which by my preferences only does Rd and no other roclets.)
roxy.r
makes a direct call to roxygenize
. This produces the error "unable to find required package ‘roxygen_devtest"... The error occurs when defining nanotime
as an S4, specifically inside the setClass
function when the code is loaded via roxygen's function source_package
inside roxygenize
. It's looks like an issue in roxygen2. document
works because it calls roxygenize
with its own source function. There is this note in roxygen2::source_package
:
#' This is a simple attempt to load code in a package used by
#' [roxygenize]. It will work with simple packages, but fail if
#' there are compiled files, data files, etc. In that case, it's better to
#' use [devtools::document].
It looks like document
uses a more sophisticated way of loading the required package using in particular load_all
. That seems to also work for the setClass
...
Do you think there is ground to raise an issue with roxygen2
? My feeling is we'll be told to use document
...
By the way, it seems it's also possible to specify the roclets with document
, so we can achieve the same as with the direct call to roxygenize
...
I don't know / I don't care in the sense that I do not use devtools
, but if I must some of the component packages split off it (ie remotes::install_github
).
I am sure you can find a way to let us experiment with s4 as an alternative representation.
I'm not sure what error you are encountering.
I have no issues installing the package. Either via R CMD build nanotime
and then R CMD INSTALL ...
or via remotes::install_github
. Now that my R packages are up-to-date I'm also not seeing any errors while generating documentation with document
(but I do have the problem mentioned above when using roxygenize
directly).
I'm not sure what error you are encountering.
Basically R CMD check
. I happen to like command-line helpers and tools, so rcc.r
is just shorthard for calling Gabor's rcmdcheck() function, which, in essence, just prettyprints.
You are getting three NOTES and WARNING. I prefer to keep my packages ready for CRAN with minimal warning. I would not yet commit this to my main branch.
edd@max:~/git/leonardo-nanotime$ rcc.r
────────────────────────────────────────────────────────────────────────────────
─ using log directory ‘/tmp/file420a669023e9/nanotime.Rcheck’
─ using R version 3.3.2 (2016-10-31)
─ using platform: x86_64-pc-linux-gnu (64-bit)
─ using session charset: UTF-8
✔ checking for file ‘nanotime/DESCRIPTION’
─ checking extension type ... Package
─ this is package ‘nanotime’ version ‘0.1.1’
✔ checking package namespace information
✔ checking package dependencies
✔ checking if this is a source package
✔ checking if there is a namespace
✔ checking for executable files
✔ checking for hidden files and directories
✔ checking for portable file names
✔ checking for sufficient/correct file permissions
✔ checking whether package ‘nanotime’ can be installed
✔ checking installed package size
✔ checking package directory
✔ checking DESCRIPTION meta-information
✔ checking top-level files
✔ checking for left-over files
✔ checking index information
✔ checking package subdirectories
✔ checking R files for non-ASCII characters
✔ checking R files for syntax errors
✔ checking whether the package can be loaded
✔ checking whether the package can be loaded with stated dependencies
✔ checking whether the package can be unloaded cleanly
N checking whether the namespace can be loaded with stated dependencies
Warning: no function found corresponding to methods exports from ‘nanotime’ for: ‘Arith’, ‘Compare’, ‘Complex’, ‘Logic’, ‘Math’, ‘Math2’, ‘Summary’, ‘show’
A namespace must be able to be loaded with just the base namespace
loaded: otherwise if the namespace gets loaded by a saved object, the
session will be unable to start.
Probably some imports need to be declared in the NAMESPACE file.
✔ checking whether the namespace can be unloaded cleanly
✔ checking loading without being on the library search path
N checking dependencies in R code
package 'methods' is used but not declared
✔ checking S3 generic/method consistency
✔ checking replacement functions
✔ checking foreign function calls
N checking R code for possible problems
as.data.frame.nanotime: no visible global function definition for
‘S3Part’
as.integer64.nanotime: no visible global function definition for
‘S3Part’
nanotime.matrix: no visible global function definition for ‘new’
+,integer64-nanotime: no visible global function definition for ‘new’
+,integer64-nanotime: no visible global function definition for
‘S3Part’
+,nanotime-integer64: no visible global function definition for ‘new’
+,nanotime-integer64: no visible global function definition for
‘S3Part’
+,nanotime-numeric: no visible global function definition for ‘new’
+,nanotime-numeric: no visible global function definition for ‘S3Part’
+,numeric-nanotime: no visible global function definition for ‘new’
+,numeric-nanotime: no visible global function definition for ‘S3Part’
-,nanotime-integer64: no visible global function definition for ‘new’
-,nanotime-integer64: no visible global function definition for
‘S3Part’
-,nanotime-nanotime: no visible global function definition for ‘S3Part’
-,nanotime-numeric: no visible global function definition for ‘new’
-,nanotime-numeric: no visible global function definition for ‘S3Part’
Arith,nanotime-ANY: no visible global function definition for
‘callNextMethod’
Arith,nanotime-ANY: no visible global function definition for ‘S3Part’
Compare,nanotime-ANY: no visible global function definition for
‘callNextMethod’
Compare,nanotime-ANY: no visible global function definition for
‘S3Part’
[,nanotime: no visible global function definition for ‘new’
[,nanotime: no visible global function definition for ‘callNextMethod’
[<-,nanotime: no visible global function definition for ‘new’
[<-,nanotime: no visible global function definition for
‘callNextMethod’
c,nanotime: no visible global function definition for ‘new’
c,nanotime: no visible global function definition for ‘callNextMethod’
c,nanotime: no visible global function definition for ‘S3Part’
cbind,nanotime: no visible global function definition for ‘new’
cbind,nanotime: no visible global function definition for ‘S3Part’
max,nanotime: no visible global function definition for ‘new’
max,nanotime: no visible global function definition for
‘callNextMethod’
min,nanotime: no visible global function definition for ‘new’
min,nanotime: no visible global function definition for
‘callNextMethod’
nanotime,POSIXct: no visible global function definition for ‘new’
nanotime,character: no visible global function definition for ‘new’
range,nanotime: no visible global function definition for ‘new’
range,nanotime: no visible global function definition for
‘callNextMethod’
rbind,nanotime: no visible global function definition for ‘new’
rbind,nanotime: no visible global function definition for ‘S3Part’
Undefined global functions or variables:
S3Part callNextMethod new
Consider adding
importFrom("methods", "S3Part", "callNextMethod", "new")
to your NAMESPACE file (and ensure that your DESCRIPTION Imports field
contains 'methods').
✔ checking Rd files
✔ checking Rd metadata
✔ checking Rd cross-references
W checking for missing documentation entries
Undocumented code objects:
‘cbind’ ‘rbind’
Undocumented S4 classes:
‘nanotime’
Undocumented S4 methods:
generic '+' and siglist 'ANY,nanotime'
generic '+' and siglist 'integer64,nanotime'
generic '+' and siglist 'nanotime,ANY'
generic '+' and siglist 'nanotime,integer64'
generic '+' and siglist 'nanotime,nanotime'
generic '+' and siglist 'nanotime,numeric'
generic '+' and siglist 'numeric,nanotime'
generic '-' and siglist 'ANY,nanotime'
generic '-' and siglist 'nanotime,character'
generic '-' and siglist 'nanotime,integer64'
generic '-' and siglist 'nanotime,nanotime'
generic '-' and siglist 'nanotime,numeric'
generic 'Arith' and siglist 'nanotime,ANY'
generic 'Compare' and siglist 'nanotime,ANY'
generic 'Complex' and siglist 'nanotime'
generic 'Logic' and siglist 'ANY,nanotime'
generic 'Logic' and siglist 'nanotime,ANY'
generic 'Math2' and siglist 'nanotime'
generic 'Math' and siglist 'nanotime'
generic 'Summary' and siglist 'nanotime'
generic '[' and siglist 'nanotime'
generic '[<-' and siglist 'nanotime'
generic 'c' and siglist 'nanotime'
generic 'cbind' and siglist 'nanotime'
generic 'max' and siglist 'nanotime'
generic 'min' and siglist 'nanotime'
generic 'range' and siglist 'nanotime'
generic 'rbind' and siglist 'nanotime'
generic 'show' and siglist 'nanotime'
All user-level objects in a package (including S4 classes and methods)
should have documentation entries.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
✔ checking for code/documentation mismatches
W checking Rd \usage sections
Documented arguments not in \usage in documentation object 'nanotime':
‘justify’ ‘digits’ ‘na.encode’ ‘trim’ ‘e1’ ‘e2’
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
✔ checking Rd contents
✔ checking for unstated dependencies in examples
✔ checking examples
W checking for unstated dependencies in ‘tests’
'library' or 'require' call not declared from: ‘xts’
✔ checking tests
✔ checking PDF version of manual
── 0 errors ✔ | 3 warnings ✖ | 3 notes ✖
edd@max:~/git/leonardo-nanotime$
OK, I see. Will try to get rid of as many of these as possible.
The last one should just be copy-cat'ing; I had to do the same for the format()
and print()
methods. Some of the other stuff is probably easily learnable from other S4 packages.
No warnings and notes in eadff32dd3643636c81677e6c65f5adf31db8e1f. I don't think it's satisfactory because the methods are defined to produce an error end up in the documentation (e.g. methods for group Complex
). We could put a warning that they generate an error, or pull them out and live with a warning during R CMD check. I wasn't able to suppress the warning with @noRd
. You probably have a better feel than me for all this. As soon as I get a minute I'll pull the changes you made - I'm a few commits behind.
Ok, will catch up later -- trying to go for a run now having spent the morning on two other repos.
Rebasing is probably a good idea if you can. I also prefer the trees to not be too far apart.
Couldn't resist a really quick peek -- any idea?
✔ checking for unstated dependencies in examples
✔ checking examples
✔ checking for unstated dependencies in ‘tests’
E checking tests
Running the tests in ‘tests/runTests.R’ failed.
Last 13 lines of output:
done successfully.
Executing test function test_rbind.xts ... Timing stopped at: 0.004 0 0.002
Error in xts(matrix(1:30, 10, 3), i1) :
order.by requires an appropriate time-based object
done successfully.
Error in checkForErrors(results) :
error or failure in test_xts.R : list(nTestFunc = 8, nDeactivated = 0, nErr = 8, nFail = 0, dirs = "../nanotime/unitTests", testFileRegexp = "^test_xts.R$", testFuncRegexp = "^test.+", sourceFileResults = list(`../nanotime/unitTests/test_xts.R` = list(test_align.time = list(kind = "error", msg = "Error in xts(a1, i1) : order.by requires an appropriate time-based object\n", checkNum = 0, traceBack = NULL), test_c.xts = list(kind = "error", msg = "Error in xts(matrix(1:30, 10, 3), i1) : \n order.by requires an appropriate time-based object\n",
checkNum = 0, traceBack = NULL), test_cbind.xts = list(kind = "error", msg = "Error in xts(a1, i1) : order.by requires an appropriate time-based object\n", checkNum = 0, traceBack = NULL), test_coredata.xts = list(kind = "error", msg = "Error in xts(a1, i1) : order.by requires an appropriate time-based object\n", checkNum = 0, traceBack = NULL), test_dimnames.xts = list(kind = "error", msg = "Error in xts(a1, i1) : order.by require
Execution halted
✔ checking PDF version of manual
── 1 error ✖ | 0 warnings ✔ | 0 notes ✔
edd@max:~/git/leonardo-nanotime(master)$
Ouch... I'm running with my version of xts
which adds nanotime
as a time-based object. I'll disable these tests for now until xts
integrates nanotime
.
Tried my best to rebase, but I had already previous merges and my two attempts ended up loosing my commit history (although the code was correct). I ended up doing a merge because it's safer in this case. Note that I reverted in the process my changes for automatic namespace generation, so it's back to manual as in your branch.
Appreciate the man-to-man combat with git.
Yes, that was definitely scary :)
I poked you over on slack. Wanna chat there?
For the rbind and cbind issues, I've done some testing and looked at the source code. It doesn't look like one can do any better.
Just for the record, it's not possible to use cbind2 and rbind2 because there exist rbind.integer64 and cbind.integer64. As documented, if an S3 method can be found, there will never be a dispatch onto methods:rbind. Here is an example to show that:
setOldClass("foo")
as.foo <- function(x) {
oldClass(x) <- "foo"
x
}
setClass("bar", contains="foo")
setMethod("rbind2",
signature("bar", "bar"),
function (x, y, ...) {
print("rbind2 called!")
new("bar", as.foo(c(x,y)))
})
a <- new("bar", as.foo(matrix(1,1,1)))
## as soon as the following is defined, 'rbind2' is not called anymore:
rbind.foo <- function(..., deparse.level = 1) {
print("rbind.foo called")
}
At any rate, I think rbind and cbind are not needed. POSIXct doesn't have them and works in the same way as nanotime would work without rbind and cbind definitions:
cbind(as.POSIXct("1970-01-01 00:00:00") + 1:10)
[,1]
[1,] 18001
[2,] 18002
[3,] 18003
[4,] 18004
[5,] 18005
[6,] 18006
[7,] 18007
[8,] 18008
[9,] 18009
[10,] 18010
What do you think?
For the names issue with c it's possible to define c.nanotime as an S3 method and the naming issue would normally be resolved except for what could be considered a bug in as.integer64 which loses the names of a named vector:
c(a=as.integer64(1), b=as.integer64(2))
## integer64
## a b
## 1 2
as.integer64(c(a=1, b=2)) ## loses the names
## integer64
## [1] 1 2
If you agree, I'll raise the question with bit64, but I don't think it's a show stopped for nanotime.
re cbind
/ rbind
: I think we should proceed without them. No name clashing, and as you say, the functionality may not be needed
re c
: I am confused. I thought the c()
operator always dropped in R, irregardless of class or type?
Yes, it's true it always drops, but I'm just considering if c keeps the names or not. If one builds a named vector one should not expect the names to disappear like in as.integer64. Here is an example with POSIXct:
> as.POSIXct(c(a="1970-01-01 00:00:00", b="1970-01-01 00:00:00"))
## a b
## "1970-01-01 EST" "1970-01-01 EST"
> c(a=as.POSIXct("1970-01-01 00:00:00"), b=as.POSIXct("1970-01-01 00:00:00"))
## a b
## "1970-01-01 EST" "1970-01-01 EST"
In the previous example with as.integer64, the names a and b are dropped.
[ imagine a shrugs emoji here ]
Do we care about names on a time index vector?
It's not a show stopper, but proper implementation of a new type should guarantee the usual behaviour. Else it's an idiosyncrasy for these specific types and personally I would call that a bug because it's a behaviour that will surprise and bite a user.
It definitely bit me when I was writing the utests!
And a couple of additional precisions: I think that nanotime will not necessarily be used only as a time index, and second, if as.integer64 is fixed, nanotime will also automatically be fixed without needing to change the code.
I think this can be closed as we folded the change in.
Any unpleasant consequences of this change so far? It looks like a pretty heavy handed fix for the size of the original problem.
dplyr + frends are norious for bad support of generic S4 vectors. Currently I see
> tibble(x = 1:10, y = nanotime(Sys.time()))
Error in callNextMethod() :
error in evaluating a 'primitive' next method: Error in FUN(x = x, i = i) : could not find function "FUN"
> tibble(x = 1:10, y = integer64(1))
# A tibble: 10 x 2
x y
<int> <S3: integer64>
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
7 7 0
8 8 0
9 9 0
10 10 0
>
An alternative fix to the original problem might have been to overwrite integer64
arithmetic ops. There are a number of options to do so.
More importantly (and please correct me if I am wrong), it seems that the only reason for +/- methods in nanotime is to preserve the class. In which case, bit64
methods do that already:
> tt <- integer64(1)
> class(tt) <- c("blabla", "integer64")
> class(tt + integer64(1))
[1] "blabla" "integer64"
So, it looks that neither this change nor the original S3 methods are actually necessary.
I think @lsilvest are not aware of side effects. We just merged a small change needed for a bit64 update. Our use cases of nanotime with data.table and Leonardo's zts work fine.
> library(nanotime)
>
> data.frame(x=1:10, y=nanotime(Sys.time()))
Error in data.frame(x = 1:10, y = nanotime(Sys.time())) :
arguments imply differing number of rows: 10, 1
>
> data.frame(x=1:10, y=nanotime(Sys.time())+0:9)
x y
1 1 2018-04-19T15:15:45.597214000+00:00
2 2 2018-04-19T15:15:45.597214001+00:00
3 3 2018-04-19T15:15:45.597214002+00:00
4 4 2018-04-19T15:15:45.597214003+00:00
5 5 2018-04-19T15:15:45.597214004+00:00
6 6 2018-04-19T15:15:45.597214005+00:00
7 7 2018-04-19T15:15:45.597214006+00:00
8 8 2018-04-19T15:15:45.597214007+00:00
9 9 2018-04-19T15:15:45.597214008+00:00
10 10 2018-04-19T15:15:45.597214009+00:00
>
We worked pretty hard on the S3 and S4 methods, and still see the current ones as correct. I don't recall the excact details but my initial hopes also were to inherit as much as possible from bit64 and its integer64 class -- but some date/time semantics got it in the way, and here we are.
Now, the data.frame
throws an error if I try it with a scalar nanotime and a vector. That is base R, and for me closer to reference behaviour than what a tibble may do. @lsilvest, any thoughts on this?
With tibble the non-recycling version works but fails on printing:
> tt <- tibble(x=1:10, y=nanotime(Sys.time())+0:9)
> tt
Error in callNextMethod() :
error in evaluating a 'primitive' next method: Error in FUN(x = x, i = i) : could not find function "FUN"
Most likely a tibble issue, but well ...
Really interesting stuff: thanks!
If we just rely on integer64
and don't define any operators, we get this:
> nanotime(1) + nanotime(2)
integer64
[1] 3
attr(,".S3Class")
[1] integer64
attr(,"class")
[1] nanotime
This is because despite the attributes being right, attributes are not enough to make something an S4 object.
So that's the first rationale for defining the operators with nanotime
. A second rationale is that it's undesirable to have unintended automatic typecasts away from nanotime
, so with S4 it's possible to be tighter and not allow (unwanted) meaningless operations. This is especially important with the forthcoming additional types (duration
, interval
, period
) that are in my nanotime
fork in the temporal_types
branch.
I'm finding more and more that using an S4 object containing an S3 is tricky. We've got two systems interacting in subtle ways. I also had a nasty surprise with seemingly small method dispatch changes breaking some of the nanotime
code between R-3.3.1 and R-3.4.4.
I'm going to go through very carefully the issues above and also carefully read the R code for method dispatch.
@vspinu, I'm really not a fan of overriding. Seems kludgy to me and have been bitten on occasion when doing that. Maybe it's possible to do that safely, but I'm not convinced.
It might be easier in the end to not use integer64
as the base for nanotime
while still defining conversion functions.
We can always burn it all down and rewrite in R6 :) Until then it mostly works and gives us some integer64-based time ops.
Another point that it blocking in order for
xts
to work withnanotime
is arithmetic. In particular:The above works with
double
but not withinteger64
. I don't know how to solve this...The fact that nanotime subtraction gives back a nanotime might be also problematic:
Did you have a specific idea in mind for arithmetic?