dhopp1 / nowcastLSTM

R wrapper for nowcast_lstm Python library. Long short-term memory neural networks for economic nowcasting.

Error in py_run_string_impl(code, local, convert) : AttributeError: 'DataFrame' object has no attribute 'NA' #7

Closed by JRSlotman 2 years ago

JRSlotman commented 2 years ago

My nowcasting script has been failing since yesterday, producing the following error when I call the LSTM function:

Error in py_run_string_impl(code, local, convert) : AttributeError: 'DataFrame' object has no attribute 'NA'

I suspect it has something to do with the reticulate package, but I don't know how to fix it. I call Python and your package using the following two lines of code:

use_python(python = miniconda_path(), required = T)
initialize_session(python_path = miniconda_path())

Any idea what's going on here?

dhopp1 commented 2 years ago

Can you upload a small sample CSV of the dataframe that produces the issue? Even better would be that plus a small code snippet that reproduces the error. That error in Python generally means that you are looking for a column name that doesn't exist in the dataframe, but I'd be able to diagnose it better with the data causing the issue.

JRSlotman commented 2 years ago

I found out what caused the issue. I was trying to incorporate data from other sources using dplyr pipes, which accidentally changed the type of the date column from Date to POSIXct (i.e. a date-time with a time zone). Coercing the date column in the new dataframe back to Date with mutate(date = as.Date(datetime)) before joining did the trick.

dhopp1 commented 2 years ago

Awesome. Then I know what threw the error: the date column is determined by checking for a datetime type. If none is found, the date column name is registered as "NA", and later on, when the date column is referred to, "NA" isn't found in the dataframe. I've made a note to return an error message saying that no date-type column was found, rather than this not very helpful message.
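To illustrate the kind of explicit check described above, here is a minimal Python sketch. It is not the library's code: `find_date_column` and the dict-of-lists stand-in for a dataframe are hypothetical, and only the standard library is used. The idea is simply to raise a descriptive error at detection time instead of carrying a placeholder name forward.

```python
from datetime import date


def find_date_column(columns):
    """Return the name of the first column whose observed values are all dates.

    `columns` maps column names to lists of values (a hypothetical stand-in
    for a dataframe). Raises ValueError if no such column exists.
    """
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        # datetime.datetime is a subclass of datetime.date, so both are accepted
        if observed and all(isinstance(v, date) for v in observed):
            return name
    raise ValueError(
        "No date-type column found. Columns checked: " + ", ".join(columns)
    )


good = {"date": [date(2020, 1, 1), date(2020, 2, 1)], "gdp": [1.2, None]}
bad = {"date": ["2020-01-01", "2020-02-01"], "gdp": [1.2, None]}

print(find_date_column(good))  # date
try:
    find_date_column(bad)
except ValueError as err:
    print(err)
```

With a check like this, a dataframe whose date column has an unexpected type fails immediately with a message that names the problem, rather than later with an attribute lookup on "NA".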

On a side note, feel free to check out this repository, where I benchmark the LSTM against other popular nowcasting and ML techniques on US GDP. It's a work in progress, but I also have some info on a grid of hyperparameters to test in this file.

JRSlotman commented 2 years ago

That's really interesting work. I'm happy to see the LSTM performs so well in comparison to other nowcasting methods.

I tried tuning the hyperparameters using k-fold cross-validation on a single country in my sample of 125 countries, but it takes a lot of time and there is a fair chance that the selected parameters are not optimal for the other countries. Have you considered something like the tunegrid option in the caret package?
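For readers unfamiliar with the idea, a tuning grid amounts to an exhaustive search over candidate values. A rough Python sketch follows, under stated assumptions: `grid_search` and `toy_score` are hypothetical helpers, the parameter names and values are purely illustrative, and nothing here is the caret or nowcast_lstm API. In practice `score` would be a cross-validated error from fitting the model.

```python
from itertools import product


def grid_search(grid, score):
    """Evaluate every combination in `grid`; return (best_params, best_score).

    `grid` maps a hyperparameter name to a list of candidate values.
    `score` takes a dict of parameters and returns an error to minimise.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Illustrative parameter names and values only.
grid = {"n_hidden": [10, 20, 40], "n_layers": [1, 2], "n_timesteps": [6, 12]}


def toy_score(params):
    # Placeholder for a real cross-validated error; smallest at (20, 2, 12).
    return (
        abs(params["n_hidden"] - 20)
        + abs(params["n_layers"] - 2)
        + abs(params["n_timesteps"] - 12)
    )


best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

Even this small grid has 3 x 2 x 2 = 12 combinations, each of which would need a full cross-validated fit, which is why the cost grows quickly when repeated across many countries.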

dhopp1 commented 2 years ago

If you use mean-filling for ragged edges, tuning usually isn't so bad. I've found that the marginal benefit you sometimes get from ARMA doesn't really outweigh the additional time required to run and train those models. I've thought about implementing some sort of automatic hyperparameter tuning, e.g. something like auto.arima for ARMA models, but for now I'm leaving tuning outside the library until I take the time to properly work out the best way to incorporate it.
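As a rough illustration of what mean-filling a ragged edge means, here is a small Python sketch. It assumes that missing observations are represented as None and that only trailing gaps are filled; `mean_fill_ragged_edge` is a hypothetical helper, not the library's implementation.

```python
def mean_fill_ragged_edge(series):
    """Replace trailing missing values (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    mean = sum(observed) / len(observed)
    filled = list(series)
    i = len(filled) - 1
    # Walk back from the end and fill only the unpublished tail.
    while i >= 0 and filled[i] is None:
        filled[i] = mean
        i -= 1
    return filled


print(mean_fill_ragged_edge([1.0, 2.0, 3.0, None, None]))  # [1.0, 2.0, 3.0, 2.0, 2.0]
```

Filling with a constant like this requires no model fitting, which is the source of the speed advantage over fitting a separate time-series model per variable.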

dhopp1 commented 2 years ago

@JRSlotman I added automatic variable selection and hyperparameter tuning in v0.2.0. Ostensibly you shouldn't ever have to spend any of your own time on model selection work anymore, just the computer's. Substantive explanations are available in the Python example file; the R example file just has some comments on the specifics for usage in R. Feature contribution has also been added for model interpretability, sort of like coefficients in a linear model.