Closed ben519 closed 3 years ago
Thanks. For now you can use:
nafill(nafill(x = c(NA,1,NA,NA,5,3,NA,0), type = "locf"), fill = -1)
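To make the workaround easier to follow, here is a minimal sketch that splits the nested call into its two steps on the same vector. The names filled and result are introduced only for illustration, and it assumes a data.table build that already includes nafill.

```r
library(data.table)

x <- c(NA, 1, NA, NA, 5, 3, NA, 0)

# Step 1: last observation carried forward. The leading NA has nothing
# before it, so it is still NA after this pass.
filled <- nafill(x, type = "locf")

# Step 2: the default type is "const", so every remaining NA (here only
# the leading one) is replaced by the fill value.
result <- nafill(filled, fill = -1)
print(result)
```

The order matters: running the constant fill first would replace every NA with -1 and leave nothing for locf to carry forward.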
@jangorecki if no one else is working on this, can I take it up?
@saraswatmks I just assigned you, go for it!
@saraswatmks note that nafill tests are in inst/tests/nafill.Rraw, so new tests should go there. That helps avoid merge conflicts and makes it easy to run only this script: test.data.table(script="inst/tests/nafill.Rraw")
@MichaelChirico thanks! I see this function is still in dev. How do I reproduce it on my local machine? My local master is up to date with remote. If I do Clean and Rebuild, I get this error (I am kind of stuck on this):
==> R CMD INSTALL --preclean --no-multiarch --with-keep.source data.table
* installing to library ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library’
* installing *source* package ‘data.table’ ...
** libs
/usr/local/opt/llvm/bin/clang -fopenmp -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I/usr/local/opt/gettext/include -I/usr/local/opt/llvm/include -fopenmp -fPIC -g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe -c assign.c -o assign.o
In file included from assign.c:1:
In file included from ./data.table.h:1:
/Library/Frameworks/R.framework/Resources/include/R.h:55:11: fatal error: 'stdlib.h' file not found
# include <stdlib.h> /* Not used by R itself, but widely assumed in packages */
^~~~~~~~~~
1 error generated.
make: *** [assign.o] Error 1
ERROR: compilation failed for package ‘data.table’
* removing ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/data.table’
* restoring previous ‘/Library/Frameworks/R.framework/Versions/3.5/Resources/library/data.table’
Exited with status 1.
@saraswatmks I found that RStudio features like Clean and Rebuild cost me more time debugging issues than they saved, although they were much more reliable for packages with no compiled code. Anyway, I suggest using cc(), which is much faster and, AFAIR, never introduced issues that wasted my time on debugging.
More info in https://github.com/Rdatatable/data.table/tree/master/.dev
@saraswatmks regarding your installation error:
/Library/Frameworks/R.framework/Resources/include/R.h:55:11: fatal error: 'stdlib.h' file not found
# include <stdlib.h> /* Not used by R itself, but widely assumed in packages */
I just came across the same issue on my new laptop. It appears something in the installation of Developer Tools was mixed up. Found this comment on another repo:
https://github.com/catboost/catboost/issues/137#issuecomment-424595790
And it worked on my machine. Hope it can help.
related issue https://github.com/Rdatatable/data.table/issues/3700
Just happened on a use case for this playing around with COVID data 😃
library(data.table)
URL = file.path(
  'https://raw.githubusercontent.com',
  'nytimes/covid-19-data/master/us-counties.csv'
)
covid = fread(URL, colClasses = c(date = 'IDate'), key = 'state,county,date')
covid[state == 'Pennsylvania', dcast(.SD, date ~ county, value.var = 'cases')][ , 1:5]
#           date Adams Allegheny Armstrong Beaver
#  1: 2020-03-06    NA        NA        NA     NA
#  2: 2020-03-07    NA        NA        NA     NA
#  3: 2020-03-08    NA        NA        NA     NA
#  4: 2020-03-09    NA        NA        NA     NA
#  5: 2020-03-10    NA        NA        NA     NA
#  6: 2020-03-11    NA        NA        NA     NA
#  7: 2020-03-12    NA        NA        NA     NA
#  8: 2020-03-13    NA        NA        NA     NA
#  9: 2020-03-14    NA         1        NA     NA
# 10: 2020-03-15    NA         3        NA     NA
# 11: 2020-03-16    NA         5        NA     NA
# 12: 2020-03-17    NA        10        NA      1
# 13: 2020-03-18     1        12        NA      2
# 14: 2020-03-19     2        18        NA      2
# 15: 2020-03-20     5        28        NA      3
# 16: 2020-03-21     5        31        NA      3
# 17: 2020-03-22     5        40        NA      3
# 18: 2020-03-23     6        48        NA      3
# 19: 2020-03-24     6        58         1      3
# 20: 2020-03-25     6        88         1      7
# 21: 2020-03-26     7       133         1     13
#           date Adams Allegheny Armstrong Beaver
I want to fill the initial missing values with 0 (which is correct), but would use LOCF to carry forward the most recent data in the event it's missing (not observed here...).
lapply(.SD, nafill, type = 'locf', fill = 0)
seems natural enough.
AFAIR nafill is vectorized, so there is no need to lapply over .SD. nafill(.SD) should be enough, and it also runs in parallel.
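As a rough illustration of the vectorized call, the sketch below passes a whole table to nafill instead of looping over columns. DT and its values are made-up toy data (not the NYT counts above), and the second pass with a constant fill is the workaround from earlier in the thread rather than a single locf call with fill.

```r
library(data.table)

# Hypothetical toy counts with leading and interior gaps.
DT <- data.table(
  a = c(NA, NA, 1, NA, 3),
  b = c(NA, 2, NA, NA, 5)
)

# nafill accepts a list or data.table of numeric columns and returns a
# list with one filled vector per column.
locf <- nafill(DT, type = "locf")

# Second pass with the default type = "const" zero-fills the leading NAs.
out <- setDT(nafill(locf, fill = 0))

# Restore the column names explicitly, in case the returned list is unnamed.
setnames(out, names(DT))
print(out)
```

If updating in place is acceptable, setnafill takes the same type and fill arguments and modifies the table by reference.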
Suppose I have this vector
and I want to fill NAs with the preceding non NA values. I can do this
Great, but sometimes I want to specify a fill value to catch the NA(s) at the front of the vector. I tried this which seemed obvious to me,
but it didn't work and instead gave a warning, "argument 'fill' ignored, only make sense for type='const'".
My request is to extend the method so that 'fill' is applied to the front/back of the vector for types 'locf' and 'nocb' respectively. Thanks
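For reference, the requested behaviour can be emulated in user code along these lines. nafill_edges is a hypothetical helper, not part of data.table, and the vector is the one from the workaround reply with an extra trailing NA added so that the nocb case has something to fill.

```r
library(data.table)

# Hypothetical helper: run locf/nocb first, then apply a constant fill to
# whatever NAs remain at the front (locf) or at the back (nocb).
nafill_edges <- function(x, type = c("locf", "nocb"), fill) {
  type <- match.arg(type)
  nafill(nafill(x, type = type), type = "const", fill = fill)
}

x <- c(NA, 1, NA, NA, 5, 3, NA, 0, NA)

front <- nafill_edges(x, "locf", fill = -1)  # leading NA becomes -1
back  <- nafill_edges(x, "nocb", fill = -1)  # trailing NA becomes -1
print(front)
print(back)
```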