Open jabrennem opened 5 years ago
I totally agree. I've created a PR for this (https://github.com/chrisdev/django-pandas/pull/111). Would it fit your needs?
I looked at the PR. It's almost perfect for what I need. I left a comment on django_pandas/managers.py. Thank you so much!!
@chrisdev Are you ok with this PR? If yes, could you accept it and merge?
Thanks
I am ok with the PR. I don't have write access to the repository to merge it.
@jabrennem the PR is not passing Travis. I was waiting on you to address the failures.
@chrisdev sorry, I didn't see that he included your name. I just opened the issue to expand on the use case.
@chrisdev Sorry I did not see that. I've just fixed the tests. It is passing now.
@chrisdev I see it is in master. When will this feature be available via pip?
Hi all, just getting started with this project to try and do a bit of DS for a side project. Very happy that it exists as it's so much cleaner than doing all the manipulations yourself. But I too have run into this problem and been confused! I've got a queryset that I want (or at least think I want) to index based on the object's ID but they're all coming out as datetimes in seconds after the epoch in 1970. Took me a while to figure out what was going on at first because I was also using dataframe.head wrong (no parens) but once I figured that out it became obvious.
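For anyone else puzzled by the same symptom, here is a minimal sketch of what seems to be happening, assuming the index values pass through pd.to_datetime. With no unit given, pandas reads bare integers as nanoseconds since the 1970 epoch, so small IDs all land just after midnight on 1970-01-01. The ids values are made up for illustration.

```python
import pandas as pd

# Hypothetical integer primary keys, standing in for a queryset's id column.
ids = pd.Index([1, 2, 3], name="id")

# With no unit given, pandas interprets integers as nanoseconds since 1970-01-01.
as_dates = pd.to_datetime(ids)

print(ids.dtype)       # an integer dtype
print(as_dates.dtype)  # a datetime64 dtype
print(as_dates[0])     # a timestamp just after the 1970 epoch
```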
A thought I had regarding a hopefully very general solution is that you could allow people to pass in the datatype for the index and default it to datetime.
my_frame = my_queryset.to_dataframe(index="my_index", index_dtype=int)
and then the code inside would need to use the pandas astype (http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.astype.html#pandas.Series.astype) in order to make that work, right?
df.index = df.index.astype(index_dtype)
Does that seem right? I'm willing to make a PR but as I'm brand-new to this project even existing I don't know if this would be a welcome addition or not. Thanks!
EDIT: I just did this with the copy on my local machine and it's working fine; happy to make a PR if others think it would be useful.
@MikeSandford I just added @vtoupet's PR to PyPI (sorry for the delay @vtoupet!), which deals specifically with datetime indexes. I think your approach is cool, but it would be nice if we had sensible defaults. I'm hoping that in most cases the user would not have to specify the index type. But of course, when they need to do so, it's nice to give them the option.
@chrisdev agreed that sensible defaults are necessary! Here is what I did:
def read_frame(qs, fieldnames=(), index_col=None, coerce_float=False,
               verbose=True, index_dtype=None):
and then further down:
    if index_dtype is None:
        # this is where fancy "guess the datatype" heuristics would go, so we can do whatever
        # we can automagically if the user doesn't specify "I want this datatype"
        # for now we just assume the default behavior of datetime
        index_dtype = datetime
    df.index = df.index.astype(index_dtype)
I also had to add from datetime import datetime up at the top.
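The idea above can be tried outside the library with a small standalone sketch. This is not django-pandas code: coerce_index is a hypothetical helper, and it assumes the intended default is a pd.to_datetime conversion rather than astype(datetime). When index_dtype is given, the index is cast with astype, as in the snippet above.

```python
import pandas as pd


def coerce_index(df, index_dtype=None):
    """Hypothetical helper: cast df.index to index_dtype, or try datetime when None."""
    if index_dtype is None:
        try:
            # Assumed default for this sketch: attempt a datetime conversion.
            df.index = pd.to_datetime(df.index)
        except (ValueError, TypeError):
            # Leave the index untouched if it cannot be parsed as datetimes.
            pass
    else:
        # The caller asked for a specific dtype, so skip the datetime step entirely.
        df.index = df.index.astype(index_dtype)
    return df


# Made-up data with an integer id index.
frame = pd.DataFrame({"value": [10.0, 20.0]}, index=pd.Index([101, 102], name="id"))

by_id = coerce_index(frame.copy(), index_dtype="int64")  # ids stay integers
by_date = coerce_index(frame.copy())                     # default: datetime index
```

Skipping the datetime step when a dtype is supplied avoids a round trip through datetimes, which may or may not match what the local patch does.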
I am enjoying django-pandas. When I define a query set and use qs.to_dataframe(index='custom_int_index_col'), my index column is always converted to a datetime object.
I could be missing some greater context, but can we wrap the line below located in io.read_frame under an if statement using a new boolean argument or something similar and have QuerySet.to_dataframe default to False? That way, the index can be set to an integer column and it won't be converted to datetime.
if {new_boolean}: df.index = pd.to_datetime(df.index, errors="ignore")
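A minimal sketch of the boolean-flag idea, written as a standalone function rather than a patch to io.read_frame. The names set_frame_index and datetime_index are hypothetical, chosen only for illustration, and the sample rows are made up. The sketch leaves out errors="ignore" because recent pandas releases deprecate that option.

```python
import pandas as pd


def set_frame_index(df, index_col, datetime_index=False):
    """Hypothetical stand-in for the indexing step; not django-pandas' actual code."""
    df = df.set_index(index_col)
    if datetime_index:
        # Only callers who opt in get the datetime conversion.
        df.index = pd.to_datetime(df.index)
    return df


rows = pd.DataFrame({"custom_int_index_col": [7, 8, 9], "score": [0.1, 0.2, 0.3]})

plain = set_frame_index(rows, "custom_int_index_col")                       # integer index kept
dated = set_frame_index(rows, "custom_int_index_col", datetime_index=True)  # opt-in conversion
```

With the flag defaulting to False, an integer column used as the index comes back unchanged, which is the behavior requested above.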