christophsax / tempdisagg

Methods for Temporal Disaggregation and Interpolation of Time Series
http://cran.r-project.org/web/packages/tempdisagg

bad error message n.fc >= 0 #15

Closed ronaldindergand closed 9 years ago

ronaldindergand commented 10 years ago

should return meaningful error or omit annual value

   td(agric_n ~ window(ppi_lw, end = c(2011, 1)))
     Error: n.fc >= 0 is not TRUE 
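For anyone trying to reproduce this: the `swisspharma` data shipped with tempdisagg should trigger the same check when the indicator is truncated so that it no longer covers the annual series (the exact end date here is illustrative):

    library(tempdisagg)
    data(swisspharma)
    # quarterly indicator cut short, so it no longer spans the annual series
    td(sales.a ~ window(exports.q, end = c(2000, 1)))
    # should fail with the same message: Error: n.fc >= 0 is not TRUE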
bhuston commented 9 years ago

Chris,

I have two series (l.f., h.f.) which are almost exactly the same length, and I am getting this

Error: n.fc >= 0 is not TRUE

issue. I have tried sample reduction (making my l.f. even shorter than h.f.) and everything still fails. Are there any other possible things that could be throwing this error?

Thanks, Ben

bhuston commented 9 years ago

I should correct myself. My h.f. series is exactly an integer multiple of 30 longer than my l.f. series. I have also tried the case where the h.f. was more than a 30× multiple (e.g., 30.74) of the l.f. and still get the same issue.

christophsax commented 9 years ago

Hi Ben, thanks for the report. Can you give me a reproducible example? That would be very helpful. Christoph

bhuston commented 9 years ago

Thanks for the follow up Chris.

I actually figured out what was causing the issue. I simply needed to explicitly use the as.vector() function within td(). I had previously used just unlist() on my input vectors but apparently this wasn't good enough.

As for the example, I am rushed to get something to a co-author of mine right now, so I cannot put together a 100% reproducible one with dummy data. However, I post below the code that gives the gist of how I got around the error.

    # subsetting from data frames of monthly and daily time series
    Low  <- na.omit(CCAdata_GAM[, c("edfdate", "us_unemployment_rate")])
    High <- na.omit(CCAdata_GAM[, c("edfdate", "us_3m_tbill_yield")])

    A <- nrow(Low) * 30  # adjustment factor to make the high series an exact multiple of the low one
    B <- nrow(High)
    High <- rbind(High, High[(B - (A - B)):(B - 1), ])

    A <- nrow(Low) * 30
    B <- nrow(High)
    A - B  # should equal zero

    # using really long series caused memory issues
    temp1 <- td(as.vector(Low[, 2][1:100]) ~ as.vector(High[, 2][1:3000]),
                to = 30, conversion = "last")

Thanks again! By the way, do you have any advice on memory management for td()? My plan was to disaggregate monthly/quarterly series into daily ones for 10 years of data. However, when I run td() using the exact code above -- without the vector subsetting within td() -- I completely run out of memory at 64 GB. (I am using a brand-new 10-core server for calculations. My number of daily obs is about 500,000.)

EDIT: regarding the memory issue, it turns out that I'm just a moron. I had merged daily/monthly macroeconomic series with a time/country observation-level panel dataset and was trying to run td() on the entire span of the dataframe (500,000 obs in total). It is a much better idea to run td() on just the "un-panelized" macro time series (which will be far fewer obs) and then merge the disaggregated result with the final panel dataframe afterwards. Chris, your package works perfectly!
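To make that workaround concrete, the workflow can be sketched roughly like this (all object and column names below are made up for illustration, not from the actual data):

    # 1. pull the macro series out of the panel: one row per date, not per country-date
    macro_lf <- unique(panel[, c("month_id", "us_unemployment_rate")])

    # 2. disaggregate the short macro series on its own
    m     <- td(macro_lf$us_unemployment_rate ~ as.vector(indicator_hf),
                to = 30, conversion = "last")
    daily <- predict(m)

    # 3. merge the daily result back onto the full panel afterwards
    panel_daily <- merge(panel, data.frame(day_id = seq_along(daily), rate_d = daily),
                         by = "day_id")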

christophsax commented 9 years ago

Thanks, Ben

High[,2][1:3000] is probably something other than a vector. You can check it with class().

Anyway, I would recommend converting the inputs to time series first, using ts(). This will save you some hassle, and it makes the to argument redundant.
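A minimal sketch of that approach, with dummy data and illustrative dates (a regular 360-day year gives the same 30:1 ratio as to = 30):

    library(tempdisagg)
    set.seed(1)  # dummy data just to make the sketch self-contained
    low  <- ts(cumsum(rnorm(12)),      start = c(2005, 1), frequency = 12)   # monthly
    high <- ts(cumsum(rnorm(12 * 30)), start = c(2005, 1), frequency = 360)  # "daily"
    # td() infers the conversion ratio from the two frequencies; no 'to' needed
    m <- td(low ~ high, conversion = "last")
    predict(m)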

Cheers, Christoph

christophsax commented 9 years ago

I am closing this one. There is a new issue on input checks (#20) which addresses the cryptic error message in Ben's problem.