JerBouma / FinanceDatabase

This is a database of 300.000+ symbols containing Equities, ETFs, Funds, Indices, Currencies, Cryptocurrencies and Money Markets.
https://www.jeroenbouma.com/projects/financedatabase
MIT License
3.07k stars 357 forks

[IMPROVE] remove accented characters from name column #42

Closed. maread99 closed this issue 10 months ago

maread99 commented 1 year ago

Hi, what an awesome resource you've put together here! Thank you.

I've noticed that some of the entries have accented characters in the name column which don't appear to decode correctly (at least for me), for example:

equities.search(name="telef", exchange="MCE")

[image]

So neither of the following returns anything:

>>> df = equities.search(name="telefonica", exchange="MCE")
>>> df.empty
True
>>> df = equities.search(name="telefónica", exchange="MCE")
>>> df.empty
True

The following returns a load of instruments because many entries for Telefonica do not include the accented 'o':

>>> df = equities.search(name="telefonica")
>>> len(df)
20

...although it won't include the Madrid listing (and other entries that have the accented o in the name).

To ensure consistent querying I'd suggest replacing all accented characters in the name column with their unaccented equivalents.
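A minimal sketch of what that replacement could look like, using only the standard library. The helper name `strip_accents` and the sample names are made up for illustration and are not part of FinanceDatabase:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFKD decomposition splits "ó" into "o" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Made-up sample names, for illustration only.
names = ["Telefónica, S.A.", "Telefonica Brasil S.A."]
cleaned = [strip_accents(name) for name in names]
print(cleaned)
```

Letters that have no Unicode decomposition, such as "ø", pass through unchanged with this approach, so they would need a separate mapping.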

If I get a moment (unlikely tbh) I'll contribute the change. Thought I'd raise the issue in the meantime in case anyone else runs into this and has the opportunity to make the changes.

Thanks again for the library! Marcus

JerBouma commented 1 year ago

Good call! I would suggest not replacing the actual name, but instead making sure the accented characters are properly included.

It should then also include a Boolean ("accent_sensitive") parameter in case you do want to search specifically for "telephóne" instead of "telephone". By default this is off so you get both results.

We'd need to figure out whether this is easily doable.
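One way such a flag might behave, sketched outside the library on a plain list of names. The functions `fold` and `search_names` and the sample names are hypothetical and are not the FinanceDatabase API. The sketch assumes a case-insensitive substring match:

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold text and drop combining accent marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact(text: str) -> str:
    """Case-fold text but keep accents, in composed (NFC) form."""
    return unicodedata.normalize("NFC", text).casefold()


def search_names(names, query, accent_sensitive=False):
    """Return the names that contain query, ignoring case.

    Hypothetical stand-in for a search with an accent_sensitive flag:
    when the flag is off, accents are ignored on both sides.
    """
    key = exact if accent_sensitive else fold
    needle = key(query)
    return [name for name in names if needle in key(name)]


# Made-up sample names, for illustration only.
names = ["Telefónica, S.A.", "Telefonica Brasil S.A."]
print(search_names(names, "telefonica"))
print(search_names(names, "telefónica", accent_sensitive=True))
```

With the flag off, both "telefonica" and "telefónica" match both sample names. With it on, each spelling matches only the name written the same way. The stored names are never modified, since folding happens only at comparison time.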

JerBouma commented 10 months ago

This issue has been resolved. I've corrected most of the affected names across a whole lot of tickers. Quite a cumbersome task but it's done!
