Closed juanitorduz closed 1 year ago
By the way, calling sf.forecast(ray_df, ...) directly will work now (as soon as Ray works here). You don't need to write all this extra code unless you have some specific logic you are trying to inject, like experiment tracking or partitioning.
Hard to say here. I was reading this; it doesn't have a clear solution, but did you try vanilla Ray code on the cluster (without Fugue and StatsForecast), and did you try the environment variable mentioned here? I am not optimistic it will work, but it's worth a shot.
Then also check whether you can access the head node, using netstat as described in the comment after the one I linked.
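As a rough complement to the netstat check, here is a minimal sketch that tests from the notebook side whether the head node's ports accept TCP connections. It uses only the Python standard library. The port numbers are Ray's usual defaults (6379 for the GCS, 8265 for the dashboard, 10001 for the Ray Client server) and are an assumption about this cluster; the helper names can_reach and check_head_node are made up for illustration.

```python
import socket

# Assumed ports: these are Ray's usual defaults; your cluster may override them.
RAY_PORTS = {"gcs": 6379, "dashboard": 8265, "client_server": 10001}


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_head_node(host: str) -> dict:
    """Probe each assumed Ray port on the head node; map port name to reachability."""
    return {name: can_reach(host, port) for name, port in RAY_PORTS.items()}
```

Calling check_head_node with the head node's service name (whatever it is called in your K8s namespace) from a JupyterHub notebook returns one True/False per port. A False for the port you connect through would point at networking rather than at StatsForecast or Fugue.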
Also, try using a context manager:
with ray.init(...):
    sf.forecast(ray.data.from_pandas(pd_df), ...)
You don't need to call transform directly, and you don't need to specify the engine, partition, or schema; just pass a Ray df to StatsForecast.
Thanks for the feedback 💪! I am getting the same error from the vanilla sf.forecast approach, and that is the reason I'm using transform directly (to see if there is something that I can modify).
I'll keep working on it following your tips (e.g. the context manager).
Thanks!
I tried on a clean environment with
statsforecast==1.6.0
fugue[ray]==0.8.7.dev4
ray[data]==2.7.0
It works without any problem, here is the code:
import ray
import pandas as pd
from fugue import transform
import fugue.api as fa
from statsforecast.core import StatsForecast
from statsforecast.models import (
    AutoARIMA,
    AutoETS,
)
from statsforecast.utils import generate_series
n_series = 2
horizon = 15
series = generate_series(n_series, engine="pandas")
series = series.reset_index()
series["unique_id"] = series["unique_id"].astype(str)
with ray.init():
    models = [AutoETS(season_length=7), AutoARIMA(season_length=7)]
    st = StatsForecast(models=models, freq="D")
    res = fa.as_pandas(st.forecast(horizon, ray.data.from_pandas(series)))
print(res)
Thank you! It indeed works locally :) However, when we run it on our Ray cluster on K8s, we still get the same error as before ... Do you maybe have any tips?
(This has been already quite helpful!)
with ray.init():
    df = ray.data.from_pandas(series)
    df.to_pandas()
Can you run this on your k8s cluster?
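If the notebook talks to the cluster through Ray Client rather than a local Ray instance, the same round trip could be sketched as below. This is only an assumption about the setup: nothing in the thread says how the JupyterHub pods connect. The helper names ray_client_address and roundtrip_ok are made up for illustration, and 10001 is Ray's default client server port, which your cluster may override.

```python
def ray_client_address(host: str, port: int = 10001) -> str:
    """Build a Ray Client URL (assumes the default client server port 10001)."""
    return f"ray://{host}:{port}"


def roundtrip_ok(series, address: str) -> bool:
    """Send a pandas DataFrame to the cluster and back; True if its shape survives."""
    import ray  # imported lazily so the helpers load even where Ray is absent

    # ray.init returns a context manager that disconnects on exit.
    with ray.init(address=address):
        result = ray.data.from_pandas(series).to_pandas()
    return len(result) == len(series) and list(result.columns) == list(series.columns)
```

If roundtrip_ok fails or hangs with the series from the example above, the problem is in the Ray connection itself, independent of StatsForecast and Fugue.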
Hey! I think there is something we need to fix in our JupyterHub integration. Let me close this issue while we figure things out! I will come back with the learnings to share with the community! Thank you for being so supportive and keep up the great work!
Hi! I am trying to run a simple example using StatsForecast + Ray + JupyterHub. It works locally but not on the cluster. Do you have any tips?
Details:
This simple example from the documentation works locally, but when running on the cluster I get