cmu-delphi / covidcast

R and Python packages supporting Delphi's COVIDcast effort.
https://delphi.cmu.edu/covidcast/

Cast "time_value" column to datetime format before returning dataframe #255

Closed dshemetov closed 3 years ago

dshemetov commented 3 years ago

Currently, we have to do this:

df = covidcast.signal("jhu-csse", "deaths_incidence_num", geo_type="state", start_day=date(2020, 9, 1))
df["time_value"] = pd.to_datetime(df["time_value"])

If you don't cast to datetime, you can't plot in matplotlib with the time_value as the index.

plt.plot(df[df["geo_value"] == "ca"].set_index("time_value")["value"])  # gives an error

This should require just a one-line change in the Python client. pd.to_datetime is idempotent, so the change shouldn't break anyone's code that already does the cast manually.
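A minimal sketch of the idempotence claim, assuming a small hand-built frame in place of real covidcast.signal() output (the rows and values below are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical stand-in for a covidcast.signal() result; not real API output.
df = pd.DataFrame({
    "geo_value": ["ca", "ca"],
    "time_value": ["2020-09-01", "2020-09-02"],
    "value": [10, 12],
})

# First cast: the string column becomes a datetime64 column.
once = pd.to_datetime(df["time_value"])

# Second cast on the already-converted column leaves it unchanged.
twice = pd.to_datetime(once)

print(pd.api.types.is_datetime64_any_dtype(once))
print(once.equals(twice))
```

If the client already performs the cast, a manual pd.to_datetime call in user code simply passes the column through, which is why the proposed change should be safe for existing callers.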

capnrefsmmat commented 3 years ago

@sarah-colq I hear you're interested in working more on the client packages; want to take this one?

chinandrew commented 3 years ago

Hm, I can't reproduce this.

In the code, this line (linked below) is already there, and the df returned has that column as a datetime object. Your example also fails to raise an error for me. The cast was added in an early PR, so I doubt it's a version issue, but to be thorough, have you tried upgrading to the latest version?

https://github.com/cmu-delphi/covidcast/blob/db1d1bcecb7518c30b02c53aa9959852c20dd9bc/Python-packages/covidcast-py/covidcast/covidcast.py#L408

In [10]: df = covidcast.signal("jhu-csse", "deaths_incidence_num", geo_type="state", start_day=date(2020, 11, 1))

In [11]: df.dtypes
Out[11]: 
geo_value              object
signal                 object
time_value     datetime64[ns]
issue          datetime64[ns]
lag                     int64
value                   int64
stderr                 object
sample_size            object
geo_type               object
data_source            object
dtype: object

In [12]: df.time_value
Out[12]: 
0    2020-11-01
     ...
51   2020-11-14
Name: time_value, Length: 728, dtype: datetime64[ns]

In [16]: plt.plot(df[df["geo_value"] == "ca"].set_index("time_value")["value"])
Out[16]: [<matplotlib.lines.Line2D at 0x7fdd21f81cd0>]

dshemetov commented 3 years ago

Huh, the error seems to be gone on my end now?? 🤷

chinandrew commented 3 years ago

Well, I guess I'll close this for now; if it comes up again, we can reopen and investigate.