alvarobartt / investpy

Financial Data Extraction from Investing.com with Python
https://investpy.readthedocs.io/
MIT License
1.59k stars, 375 forks

Ambiguous ETF names #128

Open ymyke opened 4 years ago

ymyke commented 4 years ago

Hi

The package seems to be using queries that can be ambiguous. E.g.:

import investpy

etfs = investpy.get_etfs_dict()
[x for x in etfs if x["isin"] and "LU0480132876" in x["isin"] and x["country"] == "switzerland"]

produces:

[{'country': 'switzerland',
  'name': 'UBS MSCI Emerging Markets',
  'full_name': 'UBS MSCI Emerging Markets UCITS A-dis',
  'symbol': 'EMMUSA',
  'isin': 'LU0480132876',
  'asset_class': 'equity',
  'currency': 'USD',
  'stock_exchange': 'Switzerland',
  'def_stock_exchange': False},
 {'country': 'switzerland',
  'name': 'UBS MSCI Emerging Markets',
  'full_name': 'UBS MSCI Emerging Markets UCITS A-dis',
  'symbol': 'EMMCHA',
  'isin': 'LU0480132876',
  'asset_class': 'equity',
  'currency': 'CHF',
  'stock_exchange': 'Switzerland',
  'def_stock_exchange': False}]

The names of both ETFs are identical but they are used as the index into get_etf_historical_data:

investpy.get_etf_historical_data("UBS MSCI Emerging Markets", country="switzerland", from_date="01/01/2020", to_date="20/03/2020")

Which produces:

2020-01-03  111.52  111.78  111.30  111.70  USD Switzerland
2020-01-06  110.94  110.94  110.34  110.84  USD Switzerland
2020-01-07  111.24  111.74  110.92  111.20  USD Switzerland
[...]
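To find every such collision rather than only this one, the records can be grouped by country and name. The following is a minimal sketch, not part of investpy: the first two records are copied from the output above (trimmed to the relevant keys), the third record and the helper name `ambiguous_names` are made up for illustration.

```python
from collections import defaultdict

# Two records from the output above, plus one made-up record
# that does not collide with anything.
etfs = [
    {"country": "switzerland", "name": "UBS MSCI Emerging Markets",
     "symbol": "EMMUSA", "isin": "LU0480132876", "currency": "USD"},
    {"country": "switzerland", "name": "UBS MSCI Emerging Markets",
     "symbol": "EMMCHA", "isin": "LU0480132876", "currency": "CHF"},
    {"country": "switzerland", "name": "Hypothetical ETF",
     "symbol": "HYPO", "isin": "XX0000000000", "currency": "CHF"},
]


def ambiguous_names(records):
    """Return {(country, name): [symbols]} for names shared by several records."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["country"], rec["name"])].append(rec["symbol"])
    # Keep only the keys that map to more than one record.
    return {key: symbols for key, symbols in groups.items() if len(symbols) > 1}


collisions = ambiguous_names(etfs)
for (country, name), symbols in collisions.items():
    print(f"{country} / {name}: {symbols}")
```

With the real data, `etfs` would presumably be the list returned by `investpy.get_etfs_dict()`, as in the snippet at the top of this comment.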

Questions:

Thanks myke.

alvarobartt commented 4 years ago

Hi again @ymyke,

As you can see, investpy currently takes the ETF name as input to identify an ETF within a given country. I did not know that more than one ETF from the same country and the same stock_exchange could share a name. Since the full_name and the isin are also identical here, do you think it would be better to use the ETF symbol and country as input?

Ambiguous queries are not handled because they were not supposed to happen. I recently added the stock_exchange information to differentiate ETFs that share a name within a country but trade on different stock exchanges. In this case, investpy just takes the first match in the static data.
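The first-match behaviour can be illustrated with a small stand-in. This is only a sketch of the idea, not investpy's actual code: `first_match_by_name` is a hypothetical helper, and the two records are trimmed copies of the ones posted above.

```python
# Trimmed copies of the two colliding records from the issue.
etfs = [
    {"country": "switzerland", "name": "UBS MSCI Emerging Markets",
     "symbol": "EMMUSA", "currency": "USD"},
    {"country": "switzerland", "name": "UBS MSCI Emerging Markets",
     "symbol": "EMMCHA", "currency": "CHF"},
]


def first_match_by_name(records, name, country):
    """Hypothetical name-based lookup that returns the first hit and stops."""
    for rec in records:
        if rec["country"] == country and rec["name"].lower() == name.lower():
            return rec
    raise ValueError(f"no ETF named {name!r} in {country!r}")


hit = first_match_by_name(etfs, "UBS MSCI Emerging Markets", "switzerland")
print(hit["symbol"], hit["currency"])  # EMMUSA USD
```

Under this assumption the CHF listing (EMMCHA) can never be reached by name, which would be consistent with the USD rows in the historical data shown above.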

Thank you! Your answer to the highlighted question is really important: as soon as you confirm that the symbol is a better identifier than the name for avoiding ambiguous queries like the one above, I will change it.

ymyke commented 4 years ago

Hi

I don't understand the data model behind investing.com in detail, but from what I can judge so far, the symbol might be a better choice to look up things – especially in the ETF world.

Here's some code to illustrate:

import investpy

etfs = investpy.get_etfs_dict()
stocks = investpy.get_stocks_dict()
for bucket in (etfs, stocks):
    for attr in ("name", "symbol", "isin"):
        # Distinct (attr + country) keys, skipping records where attr is empty.
        distinct = {x[attr] + x["country"] for x in bucket if x[attr]}
        print("Unambiguous coverage for {}: {:.2%}".format(
            attr, len(distinct) / len(bucket)))

Which produces:

# ETFs:
Unambiguous coverage for name: 88.98%
Unambiguous coverage for symbol: 92.22%
Unambiguous coverage for isin: 89.65%
# Stocks:
Unambiguous coverage for name: 97.65%
Unambiguous coverage for symbol: 98.05%
Unambiguous coverage for isin: 97.70%

So I would love a way to look up ETFs (and maybe also stocks, for consistency) via symbol.

I have not thought about whether this should be a new function or an addition to the interface of the existing functions. I guess backwards compatibility would be a plus.
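One backwards-compatible shape would be a lookup that accepts either a name or a symbol and refuses to guess when the result is ambiguous. The sketch below is an assumption about how such an interface could look, not an existing investpy function: `find_etf` and its parameters are hypothetical, and the records are trimmed copies of the ones from the first comment.

```python
# Trimmed copies of the two colliding records from the issue.
etfs = [
    {"country": "switzerland", "name": "UBS MSCI Emerging Markets",
     "symbol": "EMMUSA", "currency": "USD"},
    {"country": "switzerland", "name": "UBS MSCI Emerging Markets",
     "symbol": "EMMCHA", "currency": "CHF"},
]


def find_etf(records, country, name=None, symbol=None):
    """Hypothetical lookup by name or by symbol that fails on ambiguity."""
    if (name is None) == (symbol is None):
        raise ValueError("pass exactly one of name or symbol")
    key, value = ("name", name) if name is not None else ("symbol", symbol)
    matches = [
        rec for rec in records
        if rec["country"] == country and rec[key].lower() == value.lower()
    ]
    if not matches:
        raise LookupError(f"no ETF with {key}={value!r} in {country!r}")
    if len(matches) > 1:
        # Report the candidates instead of silently taking the first one.
        candidates = [rec["symbol"] for rec in matches]
        raise LookupError(f"ambiguous {key}={value!r}; candidates: {candidates}")
    return matches[0]


# Existing callers keep passing a name; new callers can pass a symbol.
print(find_etf(etfs, "switzerland", symbol="EMMCHA")["currency"])  # CHF
try:
    find_etf(etfs, "switzerland", name="UBS MSCI Emerging Markets")
except LookupError as err:
    print(err)
```

The ambiguity check applies to both keys because, per the coverage numbers above, symbols are not fully unique per country either.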

Best myke.

geirrod commented 4 years ago

@alvarobartt any particular reason to use names in ETF calls and not symbols?

For the sake of uniformity, it would be better to use symbols. But I bet you have some specific reasons to use names instead. Could you share them?

Cheers!

typhoon71 commented 4 years ago

I second using the symbol too, since there won't be a conflict when the ISINs are the same (which makes the names the same too). This would make lookups more robust for ETFs (and for stocks/funds too).

I guess names are used because of how the request for the quotes page is made?