colin99d closed this 1 year ago.
Reminder to self: SQUASH THE COMMITS before merging!
I went ahead and converted all data to CSV. Here are the results:
Before (original data directories):
48K   Categories
4.6M  Cryptocurrencies
1.2M  Currencies
19M   ETFs
687M  Equities
108M  Funds
36M   Indices
632K  Moneymarkets

After (CSV files):
1.4M  cryptos.csv
92K   currencies.csv
68M   equities.csv
8.3M  etfs.csv
41M   funds.csv
5.0M  indices.csv
120K  moneymarkets.csv
So the total size goes from 856.5 MB to 123.9 MB, roughly seven times smaller.
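The totals can be double-checked by summing the listed sizes. This is a minimal sketch that assumes the sizes are du -h style (binary units, so 1K = 1/1024 M); the helper name to_mb is hypothetical.

```python
# Hypothetical helper: parse du -h style sizes ("48K", "4.6M") into megabytes.
# du -h uses binary units, so 1K = 1/1024 M and 1G = 1024 M.
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def to_mb(size: str) -> float:
    """Convert a size string such as '687M' or '632K' to MB."""
    return float(size[:-1]) * UNITS[size[-1].upper()]


# Sizes copied from the before/after listing.
before = ["48K", "4.6M", "1.2M", "19M", "687M", "108M", "36M", "632K"]
after = ["1.4M", "92K", "68M", "8.3M", "41M", "5.0M", "120K"]

total_before = sum(to_mb(s) for s in before)
total_after = sum(to_mb(s) for s in after)
print(f"{total_before:.1f} MB -> {total_after:.1f} MB "
      f"({total_before / total_after:.1f}x smaller)")
```

Note that the Categories directory has no CSV counterpart in the listing, so it only counts towards the "before" total.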
Did a little fix so that lowercase input works too:
>>> import financedatabase as fd
>>> equities = fd.Equities()
>>> equities.options(selection='sector', country='united states')
array(['Healthcare', 'Basic Materials', 'Financial Services',
'Industrials', 'Consumer Defensive', 'Real Estate',
'Consumer Cyclical', 'Technology', 'Communication Services', nan,
'Services', 'Utilities', 'Energy', 'Consumer Goods',
'Industrial Goods', 'Financial', 'Conglomerates'], dtype=object)
>>> equities.options(selection='sector', country='United States')
array(['Healthcare', 'Basic Materials', 'Financial Services',
'Industrials', 'Consumer Defensive', 'Real Estate',
'Consumer Cyclical', 'Technology', 'Communication Services', nan,
'Services', 'Utilities', 'Energy', 'Consumer Goods',
'Industrial Goods', 'Financial', 'Conglomerates'], dtype=object)
>>>
Using this check:
if capitalize:
    country, sector, industry = country.title(), sector.title(), industry.title()
Because almost all values in the data are stored capitalized, I wanted to make sure lookups still work when people don't capitalize their input. I couldn't find a case where the sector is capitalized but the industry or country isn't, so a single capitalize argument covers all three. It is True by default, and people can set it to False.
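A minimal sketch of how the capitalize flag could slot into a filter, assuming the data sits in a pandas DataFrame with country, sector and industry columns and that an empty string means "don't filter on this". The function filter_options and its signature are hypothetical, not FinanceDatabase's actual implementation.

```python
import pandas as pd


def filter_options(
    df: pd.DataFrame,
    selection: str,
    country: str = "",
    sector: str = "",
    industry: str = "",
    capitalize: bool = True,
):
    """Hypothetical sketch: unique values of `selection` after filtering."""
    if capitalize:
        # Normalise user input to the title case the data is stored in,
        # e.g. "united states" -> "United States".
        country, sector, industry = country.title(), sector.title(), industry.title()

    mask = pd.Series(True, index=df.index)
    for column, value in (("country", country), ("sector", sector), ("industry", industry)):
        if value:  # empty string: no filter on this column
            mask &= df[column] == value
    return df.loc[mask, selection].unique()


# Tiny illustrative dataset (made up for this example).
df = pd.DataFrame(
    {
        "country": ["United States", "United States", "Germany"],
        "sector": ["Technology", "Healthcare", "Technology"],
        "industry": ["Software", "Biotechnology", "Software"],
    }
)
print(filter_options(df, "sector", country="united states"))
print(filter_options(df, "sector", country="united states", capitalize=False))
```

One reason to keep the flag switchable: str.title() also lowercases the rest of each word, so an all-caps value such as "USA" would become "Usa" and no longer match.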
Looks good to me!
One last thing, though: search needs to support multiple queries. Sometimes you want to narrow your data down further than a single query allows, for example (random example):
>>> equities.search(query="tesla")
symbol short_name long_name ... zipcode website market_cap
127734 TL0.DE TESLA INC. DL -,001 Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
127736 TL0.F TESLA INC. DL -,001 Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
129255 TSLA.BA TESLA INC Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
129256 TSLA.MI TESLA Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
129257 TSLA.MX TESLA INC Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
129258 TSLA Tesla, Inc. Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
129259 TSLA.VI TESLA INC Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
129260 TSLA34.SA TESLA INC DRN Tesla, Inc. ... 94304 http://www.tesla.com Mega Cap
131468 TXLZF TESLA EXPLORATION LTD Tesla Exploration Ltd. ... T2E 4J7 NaN Nano Cap
[9 rows x 15 columns]
>>> equities.search(query="tesla").search("exploration", "long_name"
... )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/jeroenbouma/opt/anaconda3/envs/findata/lib/python3.9/site-packages/pandas/core/generic.py", line 5902, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'search'
Unfortunately, equities is an instance of the Equities class, while search returns a plain DataFrame, which has no search method of its own. I think the best way to handle this is to pass the search terms as keyword arguments, one per column. For example:
equities.search(summary="tesla", long_name="exploration")
Of course, this is not the only solution; I would love to hear your thoughts!
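A minimal sketch of the keyword-argument idea, assuming every keyword names a column and every value is a case-insensitive substring that must appear in it, with all conditions combined with AND. The standalone search function here is hypothetical, not the method FinanceDatabase ships; the sample rows are copied from the Tesla output.

```python
import pandas as pd


def search(df: pd.DataFrame, **kwargs: str) -> pd.DataFrame:
    """Hypothetical kwargs-based search: each keyword is a column name and
    each value a case-insensitive substring. Extra keywords narrow results."""
    mask = pd.Series(True, index=df.index)
    for column, query in kwargs.items():
        if column not in df.columns:
            raise ValueError(f"Unknown column: {column!r}")
        # regex=False matches the query literally; na=False skips missing values.
        mask &= df[column].str.contains(query, case=False, regex=False, na=False)
    return df[mask]


data = pd.DataFrame(
    {
        "symbol": ["TSLA", "TL0.DE", "TXLZF"],
        "short_name": ["Tesla, Inc.", "TESLA INC. DL -,001", "TESLA EXPLORATION LTD"],
        "long_name": ["Tesla, Inc.", "Tesla, Inc.", "Tesla Exploration Ltd."],
    }
)
print(search(data, short_name="tesla", long_name="exploration"))
```

With this design the chained call from the traceback collapses into a single call. The alternative would be to have search return an object that itself has a search method, which allows chaining but means wrapping every returned DataFrame.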
Pandas performance:

%timeit fd.select_equities()
Normal: 2.16 s ± 6.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
OOP: 111 ms ± 3.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit fd.select_equities(country="Germany")
Normal: 983 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
OOP: 18.9 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)