antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

unable to handle NaN data #62

Closed mgiangreco closed 7 years ago

mgiangreco commented 7 years ago

I have data that looks like this, where for something like "3476867_4327" the first number (3476867) represents a product and the second number (4327) represents a category:

[screenshot 2017-08-30 17 35 59]

In other words, not every product has sales for every week: for some products, no order data is available for some weeks.
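To illustrate the shape described above, here is a minimal sketch with made-up numbers. The long-format `orders` frame, the `series` column, the second product id, and all values are hypothetical, and it assumes the wide table was built by aligning each product_category series on a shared weekly index. Weeks with no orders then show up as NaN:

```python
import pandas as pd

# Hypothetical long-format orders: one row per (week, product_category) that had sales.
orders = pd.DataFrame({
    "purchased_at": pd.to_datetime(["2017-01-02", "2017-01-09", "2017-01-02"]),
    "series": ["3476867_4327", "3476867_4327", "1111111_4327"],
    "sales_qty": [5.0, 3.0, 2.0],
})

# Aligning every series on the same weekly index leaves NaN
# wherever a product had no orders in that week.
weekly_df = orders.pivot(
    index="purchased_at", columns="series", values="sales_qty"
).reset_index()

# Count the missing values per column to see which series have gaps.
missing_per_column = weekly_df.isna().sum()
print(missing_per_column)
```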

Attempting to run this through HierarchicalForecastEngine like so:

import pyaf.HierarchicalForecastEngine as hautof

lEngine = hautof.cHierarchicalForecastEngine()

lSignalVar = "sales_qty"

lDateColumn = "purchased_at"

lSignalHierarchy = lEngine.train(weekly_df, lDateColumn, lSignalVar, 1, lHierarchy, None)

results in this error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

How do you recommend dealing with this?

antoinecarme commented 7 years ago

NaN is not supported in signals.

Please replace missing data before trying to build a model.

I usually add a dataframe.fillna(0.0) call when a missing value means zero (this is fine when the signal is a count or an amount).
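As a rough sketch of that advice, assuming pandas and a small made-up frame (the second column name and all values are hypothetical), the signal columns can be zero-filled with `DataFrame.fillna` and checked before the frame is passed to `train`:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly frame with gaps; only "purchased_at" and the
# "3476867_4327" naming pattern come from the report above.
weekly_df = pd.DataFrame({
    "purchased_at": pd.date_range("2017-01-02", periods=3, freq="7D"),
    "3476867_4327": [5.0, np.nan, 3.0],
    "1111111_4327": [np.nan, 2.0, np.nan],
})

# Fill only the signal columns, leaving the date column untouched.
signal_cols = [c for c in weekly_df.columns if c != "purchased_at"]
weekly_df[signal_cols] = weekly_df[signal_cols].fillna(0.0)

# Sanity check: no NaN should remain in any signal column before training.
assert not weekly_df[signal_cols].isna().any().any()
```

Zero-filling fits here only because a missing week means "no sales". For signals where a gap means "unknown", another strategy such as interpolation may be a better choice.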