addisonlynch / iexfinance

Python SDK for IEX Cloud
https://addisonlynch.github.io/iexfinance
Apache License 2.0
648 stars 136 forks source link

Historical data, is the column name of Date missing #238

Closed sl00k closed 3 years ago

sl00k commented 3 years ago

Did the output of historical data changed?

df1 = get_historical_data("%s" % symbol, start1, end2, close_only=True, output_format='pandas', token="")

output (without -----):
--------------- close volume 2020-12-11 214.06 4283354

So the date is missing date I think. If I put it into a new sql

Index close volume 2020-12-11 00:00:00 214.06 4283354

So my old table with columns: date, close, volume doesn`t work anymore, because Index is missing.

I think panda names the first column "Index" because it has no name anymore. I will check with an old version now

sl00k commented 3 years ago

with version 0.43

output (without the ----) --------- close volume date
2020-12-11 214.06 4283354

addisonlynch commented 3 years ago

The date column is used as the index and dropped as a DF column now. I can revert this if it is causing issues. I suppose there is no reason to drop it after setting it as index.

sl00k commented 3 years ago

Ok good. I will change my code then. I tryed it our really quick but I cant select row by index =2020-12-11 for example.

Is it a lot of code to change for me if I want to use the old style with date as column name?

sl00k commented 3 years ago

ok I am not able to select row with index = something

how can I undo the change? Can someone please help me.

dcwtx commented 3 years ago

Pandas indexing - https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

If you want to get the entire index as a list, ‘index_list=df.index'

See the section on slicing ranges and selecting by label.

It is not conventional for the index to be named as a column - the way Addison just described it is how pandas works out of the box.

On Dec 15, 2020, at 2:18 PM, sl00k notifications@github.com wrote:

ok I am not able to select row with index = something

how can I undo the change? Can someone please help me.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/addisonlynch/iexfinance/issues/238#issuecomment-745542491, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFIJ2JIC2CP4LDTNSYCLL4TSU7ABJANCNFSM4UZW4SJA.

sl00k commented 3 years ago

I just wonder why I can not select

index= "2020-12-14 00:00:00" cursor.execute("SELECT * FROM {} WHERE index = '{}' ".format(stock,index))

does it mean index can not be selected at all? might be the wrong place, but it messed all my code up :(

dcwtx commented 3 years ago

It would depend on how you loaded things into your database. Are you asking a pandas question or a SQL question? What you're getting back from this module is a pandas data frame.

There is no telling what you are doing with it after you get it, but your code segment above looks like you're pulling data out of a database, not using a data frame.

Pandas has built in methods to interact with sql databases, you might think about using those on the data that you get out of iexfinance. You can read about it here https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql. You'd be interested in the .to_sql method.

I don't think this is what you're trying to do, but if you wanted to select the row of data in the DataFrame that you get back from iexfinance for the date you wrote above, you'd get it by using the .loc method: df.loc['2020-12-14'] Pandas usually drops the time fragment from the index when there aren't multiple entries on a single date.

To take that above example one step further, df.loc['2020-12-14'].to_sql(_table_name_, con) would push that row of data into the database. You'd have to wrangle it a bit to put the date field in there as well, but it's easy enough to do.

dcwtx commented 3 years ago

For example:

for date in df.index:
    row=df.loc[date]
    row['date']=date
    row.to_sql(table_name, con)

That's a horrifically inefficient way of doing this. Much better would be to:

df=get_historical_dates(symbol, ...)
df['date']=df.index
df.to_sql(table_name, con)

That would dump the entire df that is returned from iex into a sql database.

By the way, sql databases are really slow. Using pandas built-in pickler is a couple of orders of magnitude faster. You can also make a list of data frames for each symbol, then just pickle the list. If you're worried about message count, you can unpickle, query for the day, add the data to each DataFrame, then pickle it again. Super simple.

addisonlynch commented 3 years ago

Re-added the column. Now available in the dev version.