I'm trying to read ~7500 stock symbols from a .csv and download the data for each one using the alpha_vantage API.
Here is an example of my naive approach:
from alpha_vantage.timeseries import TimeSeries
import pandas as pd

api_key = ""

def get_ts(symbol):
    ts = TimeSeries(key=api_key, output_format='pandas')
    data, meta_data = ts.get_daily_adjusted(symbol=symbol, outputsize='full')
    fname = "./data_dump/{}_data.csv".format(symbol)
    data.to_csv(fname)

symbols = ['AAPL', 'GOOG', 'TSLA', 'MSFT']
for s in symbols:
    get_ts(s)
Unfortunately this takes ~3 hrs using tqdm as a wrapper. So let's try to follow the article y'all wrote for async usage:
from alpha_vantage.async_support.timeseries import TimeSeries
import pandas as pd
import asyncio

async def get_data(symbol):
    ts = TimeSeries(key=api_key, output_format='pandas')
    try:
        data, _ = await ts.get_daily_adjusted(symbol=symbol, outputsize='full')
        await ts.close()
        fname = "./data_dump/{}_data.csv".format(symbol)
        data.to_csv(fname)
    except:
        pass
    return(None)

if __name__ == "__main__":
    nasdaq = pd.read_csv('nasdaq_screener.csv')
    loop = asyncio.get_event_loop()
    tasks = [get_data(symbol) for symbol in nasdaq.Symbol]
    group1 = asyncio.gather(*tasks)
    results = loop.run_until_complete(group1)
    print(results)
The problem is I don't really know what I'm doing with async operations in Python. It works well for about 140 of the files, but then it just prints this output and stops running:
client_session: <aiohttp.client.ClientSession object at 0x000002395D890730>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000002395D890CA0>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000002395D89B250>
Unclosed client session
...
Any advice is appreciated! I'm not really sure if I'm handling the async API calls the right way, and I know pandas to_csv isn't async, but I'm trying to get the API logic right before trying anything fancier like aiofiles.
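One thing I noticed while writing this up: in my async version, `await ts.close()` sits inside the `try`, so it is skipped whenever the request raises, and the bare `except: pass` hides that. I also start all ~7500 requests at once. I can't say for sure that this is what causes the warnings, but here is a minimal sketch of the pattern I think I should be aiming for: close in a `finally` block and cap concurrency with a semaphore. It uses a hypothetical `FakeClient` stand-in instead of the real `TimeSeries`, so `FakeClient`, `fetch`, and `MAX_CONCURRENT` are my own assumptions for illustration, not alpha_vantage API.

```python
import asyncio

MAX_CONCURRENT = 5  # assumed cap, not a documented limit; tune to the API's rate limit


class FakeClient:
    """Hypothetical stand-in for an async API client that owns a session."""

    open_sessions = 0  # number of sessions currently open across all instances

    def __init__(self):
        FakeClient.open_sessions += 1

    async def fetch(self, symbol):
        await asyncio.sleep(0.01)  # simulate network latency
        if symbol == "BAD":
            raise ValueError("unknown symbol: " + symbol)
        return symbol + " data"

    async def close(self):
        FakeClient.open_sessions -= 1


async def get_data(symbol, semaphore):
    # The semaphore keeps at most MAX_CONCURRENT requests in flight.
    async with semaphore:
        client = FakeClient()
        try:
            return await client.fetch(symbol)
        except ValueError as exc:
            # Report the failure instead of silently swallowing it.
            return "failed: {}".format(exc)
        finally:
            # Runs on success and on failure, so no session is left open.
            await client.close()


async def main(symbols):
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(get_data(s, semaphore) for s in symbols))


results = asyncio.run(main(["AAPL", "BAD", "MSFT"]))
print(results)
```

If this is the right idea, I'd presumably swap `FakeClient` for the async `TimeSeries` and keep the `try`/`finally` shape, but I'd appreciate confirmation that this is how the library is meant to be used.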