Closed khuyentran1401 closed 4 years ago
This file is pre-created by the script data/create_datasets.ipynb
Got it. Thanks for the help!
Hello,
It still didn't work for me.
The metadata URL appears to be invalid.
    url = 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange={}&render=download'
    exchanges = ['NASDAQ', 'AMEX', 'NYSE']
    df = pd.concat([pd.read_csv(url.format(ex)) for ex in exchanges]).dropna(how='all', axis=1)
    df = df.rename(columns=str.lower).set_index('symbol').drop('summary quote', axis=1)
    df = df[~df.index.duplicated()]
    print(df.info())
EDIT:
I had downloaded the outdated notebook from Packt. I updated it to the following code, and it works:
    exchanges = ['NASDAQ.csv', 'AMEX.csv', 'NYSE.csv']
    df = pd.concat([pd.read_csv(ex) for ex in exchanges]).dropna(how='all', axis=1)
    df = df.rename(columns=str.lower).set_index('symbol')  # .drop('summary quote', axis=1)
    df = df[~df.index.duplicated()]
    print(df.info())
Please use the notebook in the book repository that is more actively maintained: https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/data/create_datasets.ipynb
You'll find that NASDAQ disabled automatic downloads a while ago, but you can still download the files manually and load them from disk.
Stefan, you have done a great service and made a huge contribution to the algo community, inspiring quants like myself. Yet, a more user-friendly dataset would be much appreciated. Unfortunately, I wasn't able to run the assets using 'stocks = store['us_equities/stocks'].loc...'
@gamaiun Have you followed the instructions here to create the dataset at all?
Whenever I try to follow the instructions in the create_datasets notebook, I keep getting this error when opening the data store:
    DATA_STORE = Path('assets.h5')
    with pd.HDFStore(DATA_STORE) as store:
        prices = (store['quandl/wiki/prices']
                  .loc[idx[str(START):str(END), :], 'adj_close']
                  .unstack('ticker'))
        stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]
Unable to open/create file 'assets.h5'
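One thing worth ruling out for the "Unable to open/create file" error is that the relative path 'assets.h5' resolves somewhere other than where the file was created, for example because the notebook runs from a different working directory. This is only one possible cause, not a confirmed diagnosis. The sketch below uses only the standard library; `diagnose_store_path` is a hypothetical helper name, not something from the book repository.

```python
from pathlib import Path


def diagnose_store_path(path):
    """Collect basic facts about where a store path actually points."""
    # Relative paths resolve against the current working directory.
    resolved = Path(path).resolve()
    exists = resolved.is_file()
    return {
        'resolved': resolved,
        'parent_exists': resolved.parent.is_dir(),
        'file_exists': exists,
        'size_bytes': resolved.stat().st_size if exists else 0,
    }


# Print the facts for the path used in the notebook snippet above.
for key, value in diagnose_store_path('assets.h5').items():
    print(f'{key}: {value}')
```

If `parent_exists` is False, the directory in the path is missing; if `file_exists` is False, the store has probably not been created yet in that location.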
I also spent some time solving the error that occurs when loading 'assets.h5'.
The create_datasets notebook is missing one step: adding the us_equities/stocks metadata.
You just need to add this to the notebook:
    df = pd.read_csv('us_equities_meta_data.csv')
    with pd.HDFStore(DATA_STORE) as store:
        store.put('us_equities/stocks', df)
and you will have a complete 'assets.h5' file.
In chapter 4, alpha factor research, I ran the code in feature_engineering.ipynb and could not load the data from ./data/assets.h5.
I ran this code block
And this is what I got:
It seems like ./data/assets.h5 does not contain any data. I wonder if this is the reason why I could not run the code?
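A quick way to check whether the store is really empty is to list its keys. This is a minimal sketch, assuming pandas and PyTables are installed; `list_store_keys` is a hypothetical helper name, and the path is the one quoted above.

```python
from pathlib import Path


def list_store_keys(path):
    """Return the keys stored in an HDF5 file, or [] if the file is absent."""
    path = Path(path)
    if not path.is_file():
        return []
    # Imported lazily: pd.HDFStore needs the optional PyTables package.
    import pandas as pd
    # mode='r' opens read-only, so an existing store is never modified.
    with pd.HDFStore(path, mode='r') as store:
        return store.keys()


print(list_store_keys('./data/assets.h5'))
```

pandas reports keys with a leading slash, so a store built by the create_datasets notebook would be expected to list '/quandl/wiki/prices' and '/us_equities/stocks'. An empty list means either the file is missing at that path or nothing has been written to it yet.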