chrisdev / django-pandas

Tools for working with pandas in your Django projects
BSD 3-Clause "New" or "Revised" License
798 stars 117 forks

io.read_frame to optionally convert df.index to pd.datetime #110

Open jabrennem opened 5 years ago

jabrennem commented 5 years ago

I am enjoying django-pandas. When I define a query set and use qs.to_dataframe(index='custom_int_index_col'), my index column is always converted to a datetime object.

I could be missing some greater context, but can we wrap the line below located in io.read_frame under an if statement using a new boolean argument or something similar and have QuerySet.to_dataframe default to False? That way, the index can be set to an integer column and it won't be converted to datetime.

if {new_boolean}: df.index = pd.to_datetime(df.index, errors="ignore")
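A minimal sketch of the guard proposed above, written as a standalone helper so it can run without Django. The helper name `finalize_index` and the flag name `datetime_index` are hypothetical and are not taken from the django-pandas source. It uses try/except in place of errors="ignore", which recent pandas releases deprecate.

```python
import pandas as pd


def finalize_index(df, datetime_index=False):
    """Convert df.index to datetimes only when explicitly requested.

    `datetime_index` is a hypothetical flag name used for illustration.
    """
    if datetime_index:
        try:
            df.index = pd.to_datetime(df.index)
        except (ValueError, TypeError):
            # Same intent as errors="ignore": keep the index if it cannot be parsed.
            pass
    return df


frame = pd.DataFrame(
    {"value": [10.0, 20.0]},
    index=pd.Index([101, 102], name="custom_int_index_col"),
)

kept = finalize_index(frame.copy())                             # default: index untouched
converted = finalize_index(frame.copy(), datetime_index=True)   # opt-in conversion
```

With the flag defaulting to False, an integer index column stays an integer index, and callers who want the old behaviour opt in.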

vtoupet commented 5 years ago

I totally agree. I've created a PR for this (https://github.com/chrisdev/django-pandas/pull/111). Would it fit your needs?

jabrennem commented 5 years ago

I looked at the PR. It's almost perfect for what I need. I left a comment on django_pandas/managers.py. Thank you so much!!

vtoupet commented 5 years ago

@chrisdev Are you ok with this PR? If yes, could you accept it and merge?

Thanks

jabrennem commented 5 years ago

I am ok with the PR. I don't have write access to the repository to merge it.

chrisdev commented 5 years ago

@jabrennem the PR is not passing Travis. Was waiting on you to address failures?

jabrennem commented 5 years ago

@chrisdev sorry, I didn't see that he included your name. I just opened the issue to expand on the use case.

vtoupet commented 5 years ago

> @jabrennem the PR is not passing Travis. Was waiting on you to address failures?

@chrisdev Sorry I did not see that. I've just fixed the tests. It is passing now.

jabrennem commented 5 years ago

@chrisdev I see it is in master. When will this feature be available via pip?

MikeSandford commented 5 years ago

Hi all, just getting started with this project to try and do a bit of DS for a side project. Very happy that it exists as it's so much cleaner than doing all the manipulations yourself. But I too have run into this problem and been confused! I've got a queryset that I want (or at least think I want) to index based on the object's ID but they're all coming out as datetimes in seconds after the epoch in 1970. Took me a while to figure out what was going on at first because I was also using dataframe.head wrong (no parens) but once I figured that out it became obvious.
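The symptom described above can be reproduced with pandas alone. This is an illustrative sketch with made-up IDs, not code from django-pandas: when no unit is given, pd.to_datetime reads plain integers as nanoseconds since 1970-01-01, so small object IDs all land at the very start of the epoch.

```python
import pandas as pd

# Made-up object IDs standing in for a model's primary keys.
ids = pd.Index([1, 2, 3], name="id")

# With no unit given, integers are interpreted as nanoseconds since the epoch.
as_dates = pd.to_datetime(ids)

frame = pd.DataFrame({"value": [0.1, 0.2, 0.3]}, index=as_dates)

# head without parentheses is just the bound method; head() returns the rows.
method_only = frame.head
first_rows = frame.head()
```

Inspecting `first_rows` shows an index of timestamps in 1970 instead of the original IDs.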

A thought I had regarding a hopefully very general solution is that you could allow people to pass in the datatype for the index and default it to datetime.

my_frame = my_queryset.to_dataframe(index="my_index", index_dtype=int)

and then the code inside would need to use the Pandas astype in order to make that work, right? http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.astype.html#pandas.Series.astype

df.index = df.index.astype(index_dtype)

Does that seem right? I'm willing to make a PR, but since I only just discovered this project I don't know whether this would be a welcome addition. Thanks!

EDIT: I just did this with the copy on my local machine and it's working fine; happy to make a PR if others think it would be useful.
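A small standalone sketch of the index_dtype idea, assuming a hypothetical helper `apply_index_dtype` that stands in for the relevant lines of read_frame. Unlike the proposal, this sketch leaves the index alone when no dtype is passed, to keep the example focused on the astype call.

```python
import pandas as pd


def apply_index_dtype(df, index_dtype=None):
    """Cast df.index to `index_dtype` when one is given (hypothetical helper)."""
    if index_dtype is not None:
        df.index = df.index.astype(index_dtype)
    return df


# Example data: IDs that arrived as floats, e.g. after a join introduced NaN handling.
frame = pd.DataFrame(
    {"value": [1.5, 2.5]},
    index=pd.Index([7.0, 8.0], name="my_index"),
)

typed = apply_index_dtype(frame.copy(), index_dtype=int)
untouched = apply_index_dtype(frame.copy())
```

Passing the dtype through to astype keeps the API general: any dtype pandas accepts for an Index can be requested by the caller.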

chrisdev commented 5 years ago

@MikeSandford I just released @vtoupet's PR to PyPI (sorry for the delay, @vtoupet!), which deals specifically with datetime indexes. I think your approach is cool, but it would be nice to have sensible defaults. I'm hoping that in most cases the user would not have to specify the index type. Of course, when they do need to, it's nice to give them the option.

MikeSandford commented 5 years ago

@chrisdev agreed that sensible defaults are necessary! Here is what I did:

def read_frame(qs, fieldnames=(), index_col=None, coerce_float=False,
               verbose=True, index_dtype=None):

and then further down:

if index_dtype is None:
    # this is where fancy "guess the datatype" heuristics would go, so we can do whatever
    # we can automagically if the user doesn't specify "I want this datatype"
    # for now we just assume the default behavior of datetime
    index_dtype = datetime
df.index = df.index.astype(index_dtype)

I also had to add from datetime import datetime at the top of the module.
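One possible shape for the "guess the datatype" placeholder, offered as a sketch rather than what the project does. The helper name `coerce_index` and the heuristic itself are assumptions: an explicit index_dtype always wins, a numeric index is left alone, and anything else gets a datetime conversion attempt via pd.to_datetime (used here in place of astype with the datetime class).

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def coerce_index(df, index_dtype=None):
    """Pick an index type: explicit dtype first, otherwise a simple heuristic."""
    if index_dtype is not None:
        # The caller said exactly what they want.
        df.index = df.index.astype(index_dtype)
    elif not is_numeric_dtype(df.index.dtype):
        # Assumed heuristic: only non-numeric indexes are candidates for datetimes.
        try:
            df.index = pd.to_datetime(df.index)
        except (ValueError, TypeError):
            pass  # not date-like; keep the index as it is
    return df


by_id = coerce_index(pd.DataFrame({"v": [1, 2]}, index=[10, 11]))
by_day = coerce_index(
    pd.DataFrame({"v": [1, 2]}, index=["2019-01-01", "2019-01-02"])
)
```

Under this heuristic an ID index stays numeric without the caller passing anything, while date strings still become a DatetimeIndex.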