PacktPublishing / Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original

Machine Learning for Algorithmic Trading, Second Edition - published by Packt
MIT License

Error loading data #1

Closed khuyentran1401 closed 4 years ago

khuyentran1401 commented 4 years ago

In Chapter 4, Alpha Factor Research, I ran the code in feature_engineering.ipynb and could not load the data from ./data/assets.h5.

I ran this code block:

# DATA_STORE, START, END, and idx = pd.IndexSlice are defined earlier in the notebook
with pd.HDFStore(DATA_STORE) as store:
    prices = (store['quandl/wiki/prices']
              .loc[idx[str(START):str(END), :], 'adj_close']
              .unstack('ticker'))
    stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]

And this is what I got:

KeyError: 'No object named quandl/wiki/prices in the file'

It seems that ./data/assets.h5 does not contain any data. I wonder if that is why I could not run the code?

zxweed commented 4 years ago

This file is created by the notebook data/create_datasets.ipynb
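
For reference, the core of that notebook writes the Quandl Wiki prices into the HDF5 store under the quandl/wiki/prices key. A minimal sketch, assuming the prices CSV has been downloaded as wiki_prices.csv (the file name is an assumption):

from pathlib import Path
import pandas as pd

DATA_STORE = Path('assets.h5')

# assumes the Quandl Wiki prices were downloaded manually as wiki_prices.csv
prices = (pd.read_csv('wiki_prices.csv',
                      parse_dates=['date'],
                      index_col=['date', 'ticker'])
          .sort_index())

# store under the key that feature_engineering.ipynb reads
with pd.HDFStore(DATA_STORE) as store:
    store.put('quandl/wiki/prices', prices)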

khuyentran1401 commented 4 years ago

Got it. Thanks for the help!

hawyadowin commented 3 years ago

Hello,

It still didn't work for me.

The metadata URL appears to be invalid.

"url = 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange={}&render=download' exchanges = ['NASDAQ', 'AMEX', 'NYSE'] df = pd.concat([pd.read_csv(url.format(ex)) for ex in exchanges]).dropna(how='all', axis=1) df = df.rename(columns=str.lower).set_index('symbol').drop('summary quote', axis=1) df = df[~df.index.duplicated()] print(df.info()) "

EDIT:

I had downloaded the outdated notebook from Packt. I have updated it to the following code and it works:

exchanges = ['NASDAQ.csv', 'AMEX.csv', 'NYSE.csv']
df = pd.concat([pd.read_csv(ex) for ex in exchanges]).dropna(how='all', axis=1)
df = df.rename(columns=str.lower).set_index('symbol')  # .drop('summary quote', axis=1)
df = df[~df.index.duplicated()]
print(df.info())

stefan-jansen commented 3 years ago

Please use the notebook in the book repository, which is more actively maintained: https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/data/create_datasets.ipynb

You'll find that NASDAQ disabled automatic downloads a while ago, but you can still download the data manually.
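
As a side check, you can confirm which keys assets.h5 actually contains (and spot any key missing behind the KeyError above) with a minimal sketch along these lines:

from pathlib import Path
import pandas as pd

DATA_STORE = Path('assets.h5')

# list the keys that were actually written to the store;
# feature_engineering.ipynb expects '/quandl/wiki/prices' and '/us_equities/stocks'
with pd.HDFStore(DATA_STORE, mode='r') as store:
    print(store.keys())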

gamaiun commented 1 year ago

Stefan, you have done a great service and made a huge contribution to the algo community, inspiring quants like myself. Still, a more user-friendly dataset would be much appreciated. Unfortunately, I wasn't able to load the assets using 'stocks = store['us_equities/stocks'].loc...'

stefan-jansen commented 1 year ago

@gamaiun Have you followed the instructions here to create the dataset at all?

Jasdeep425 commented 1 year ago

How come, whenever I try to follow the instructions in the create_datasets notebook, I keep getting this error while trying to create DATA_STORE = Path('assets.h5')?

with pd.HDFStore(DATA_STORE) as store:
    prices = (store['quandl/wiki/prices']
              .loc[idx[str(START):str(END), :], 'adj_close']
              .unstack('ticker'))
    stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]

Unable to open/create file 'assets.h5'

TVI-BIZ commented 10 months ago

I also spent some time solving the error when loading 'assets.h5'. The create_datasets notebook is missing one step: adding the us_equities/stocks metadata. Just add the following to create_datasets and you will have a complete 'assets.h5' file:

df = pd.read_csv('us_equities_meta_data.csv')
with pd.HDFStore(DATA_STORE) as store:
    store.put('us_equities/stocks', df)