JerBouma / FinanceDatabase

This is a database of 300.000+ symbols containing Equities, ETFs, Funds, Indices, Currencies, Cryptocurrencies and Money Markets.
https://www.jeroenbouma.com/projects/financedatabase
MIT License
3.05k stars 355 forks

Adding market-cap offline #3

Closed: mrx23dot closed this 3 years ago

mrx23dot commented 3 years ago

Would it be worth adding the market cap to the offline DB? It doesn't change that often, and it would make market-share calculations much faster.

JerBouma commented 3 years ago

While it doesn't change often for some companies, it would lead to outdated metrics for companies that grow quickly, such as Tesla and Amazon. It would also force me to update at least quarterly to collect the most recent market cap. I'd prefer to steer away from that.

Instead, find attached the market cap of all equities (as of the 3rd of February).

MarketCap.zip
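
With a snapshot like that, a rough market-share estimate could be computed offline along these lines. This is only a sketch: the tickers, industries and figures below are made up, the layout of the attached file is not assumed, and "share" here means share of the industry's total market cap, not share of revenue.

```python
from collections import defaultdict

# Hypothetical snapshot rows: (ticker, industry, market cap in USD).
# The figures are invented for illustration and are not taken from MarketCap.zip.
snapshot = [
    ("AAA", "Restaurants", 150.0e9),
    ("BBB", "Restaurants", 50.0e9),
    ("CCC", "Airlines", 30.0e9),
    ("DDD", "Airlines", 10.0e9),
]


def market_cap_shares(rows):
    """Return {ticker: share of its industry's total market cap}."""
    industry_totals = defaultdict(float)
    for _, industry, cap in rows:
        industry_totals[industry] += cap
    return {
        ticker: cap / industry_totals[industry]
        for ticker, industry, cap in rows
    }


shares = market_cap_shares(snapshot)
for ticker, share in shares.items():
    print(f"{ticker}: {share:.1%}")
```

The numbers are only as current as the snapshot date, so that date should be kept next to any result derived from it.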

mrx23dot commented 3 years ago

True, but any past data would still be more useful than no data, even if it is never updated (as long as the date is provided).

80 000 API calls is a lot just to get a basic understanding of the field. Your call.

For competitor analysis, is there any better way than looking for other companies in the same industry? For example, a regular restaurant is different from a fast-food one, but both are in the same Restaurants category (e.g. Domino's Pizza vs McDonald's). Maybe a company: [list_of_products] database would help.

Cheers

JerBouma commented 3 years ago

There is a legal issue here as well. The data is collected from Yahoo Finance, and if I stored every data point obtained from the API, they would have (even more) legal grounds to shut down this repository. I'd like to prevent that by minimising the number of data points and keeping the data time-independent while still being informative. Besides that, I wonder why you want to collect data for all equities, as going over 80.000 tickers is quite difficult.

For a competitor analysis, your best bet is to go over the companies' summaries. With the function search_products you can search for specific keywords. The keywords can also be a list of the most prominent fast-food chains, obtained for example by scraping the lists found here, which results in this:

import wikipedia as wp

# Fetch the Wikipedia page that lists fast-food restaurant chains
fast_food_chains = wp.page("List_of_fast_food_restaurant_chains")

# Keep every linked article except the "List of ..." meta pages
list_of_fast_food_companies = [
    link for link in fast_food_chains.links if "List" not in link
]

print(list_of_fast_food_companies)
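
A list like the one printed above could then be matched against company summaries. The sketch below does this with plain substring matching on made-up data; it is a stand-in for illustration and does not call or reproduce search_products. The tickers, summaries and the chain name "Burger Barn" are all invented.

```python
# Hypothetical summaries keyed by ticker; real ones would come from the database.
summaries = {
    "AAA": "Operates and franchises quick-service pizza restaurants.",
    "BBB": "Designs and manufactures semiconductor devices.",
    "CCC": "Operates Burger Barn fast-food restaurants worldwide.",
}

# Stand-in for the scraped list; "Burger Barn" is an invented chain name.
chain_names = ["Burger Barn", "Pizza"]


def match_summaries(summaries, keywords):
    """Return {ticker: [keywords found in its summary]}, case-insensitively."""
    matches = {}
    for ticker, text in summaries.items():
        lowered = text.lower()
        found = [kw for kw in keywords if kw.lower() in lowered]
        if found:
            matches[ticker] = found
    return matches


matches = match_summaries(summaries, chain_names)
print(matches)
```

Substring matching is crude: a short keyword can also hit unrelated words, so the keyword list usually needs some manual cleaning first.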

In some areas this can be difficult; the difference between restaurants is a good example. An easier case would be comparing semiconductor companies or airlines, because these companies usually stick to one specific niche.

I currently have no data on the products created by each company. This is highly detailed information that you would have to scrape elsewhere. Unfortunately, this is also where you get into the area of paid services. Bloomberg Terminals probably offer this, and they cost thousands a year.

mrx23dot commented 3 years ago

That's clever. I would have used NLP to break down the summary description into subjects. Interesting that no one has tried to tackle this problem in an open source project. I will look into it and let you know if I have some usable results.

JerBouma commented 3 years ago

Yes, please let me know! Note that some summaries do not contain a lot of information. Take McDonald's summary, for example:

McDonald's Corporation operates and franchises McDonald's restaurants in the United States and internationally. Its restaurants offer various food products and beverages, as well as breakfast menu. As of December 31, 2019, the company operated 38,695 restaurants. McDonald's Corporation was founded in 1940 and is based in Chicago, Illinois.

So you may need to collect data from several sources.
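
As a rough illustration of how little a short summary yields, here is a crude term count over the McDonald's text quoted above. It is not real NLP, only a word-frequency pass with a small hand-picked stop-word list (the list is an assumption for this example).

```python
import re
from collections import Counter

summary = (
    "McDonald's Corporation operates and franchises McDonald's restaurants "
    "in the United States and internationally. Its restaurants offer various "
    "food products and beverages, as well as breakfast menu. As of December "
    "31, 2019, the company operated 38,695 restaurants. McDonald's "
    "Corporation was founded in 1940 and is based in Chicago, Illinois."
)

# Small hand-picked stop-word list, chosen for this example only.
stop_words = {
    "and", "in", "the", "its", "as", "of", "was", "is", "well", "various", "based",
}

# Lowercase, keep alphabetic tokens (apostrophes allowed), drop stop words.
tokens = re.findall(r"[a-z']+", summary.lower())
counts = Counter(token for token in tokens if token not in stop_words)

for term, count in counts.most_common(5):
    print(term, count)
```

The only terms that repeat are the company name and "restaurants", and nothing in the text separates fast food from other kinds of restaurant.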