datasets / s-and-p-500-companies

List of companies in the S&P 500 together with associated financials
https://datahub.io/core/s-and-p-500-companies

Outdated S&P500 companies #26

Closed by thre3eye 6 years ago

thre3eye commented 6 years ago

Hi (I am moving this issue to this tracker)

The S&P500 dataset contains outdated constituents. Maybe there is an issue with the logic that parses Wikipedia and removes eliminated/changed symbols?

For example, search for DOW (Dow Chemical Company, ticker gone due to merger) or RAI (Reynolds American, ticker gone due to buyout). I think the full list of removed symbols as of now is AN, BCR, R, DD, SPLS, CHK, BBBY, DLPH, SIG, LVLT, BHI, RIG, DNB, YHOO, DOW, FTR, PDCO, SNI, COH, TSO, MJN, SWN, HAR, PCLN, HCN, MNK, FSLR, TGNA, WFM, URBN, MUR, CBG, LLTC, TDC, RAI

Relevant data set:

okfn data (containing outdated tickers): https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/b0c0dbabbc66fa902dd40a9e5596263e/constituents_json.json

wikipedia source: https://en.wikipedia.org/wiki/List_of_S%26P_500_companies

Sorry, I'd try and fix the code but Python isn't my natural habitat. I have since written a Wikipedia scraper in Java but that's probably of little help here.

Thanks

rufuspollock commented 6 years ago

@enalposi we already have the scraper for Wikipedia; it's just not running every day at the moment. We will get this fixed in the next 24h or so.

rufuspollock commented 6 years ago

Thanks for reporting. This is now fixed in https://github.com/datasets/s-and-p-500-companies/commit/5dfdfbc3b9f52802f89e72da6e050b5cafdec4a6 - our daily pipeline broke for some reason but will now be back up.

Note the authoritative location is https://datahub.io/core/s-and-p-500-companies - and this includes JSON versions etc.

thre3eye commented 6 years ago

I switched to my own scraper but took a quick look since I started the trouble here :)

Maybe it takes some time to propagate, but the JSON link on that page, https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/b0c0dbabbc66fa902dd40a9e5596263e/constituents_json.json, still contains components that have been removed from the S&P500 and are listed as such on the Wikipedia source. See for example CHK, YHOO, etc.

If the scraper works like mine and stores the data in a DB maybe there is a bug in the logic not updating newly removed records in the DB. That said I see YHOO removed in the referenced change list - just not in the json.
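To illustrate the kind of bug suspected here, below is a minimal Python sketch of a sync step that both upserts scraped rows and deletes rows that no longer appear in the source. It is not the project's actual pipeline: the `constituents` table, its columns, and the `sync_constituents` helper are assumptions made for the example, and the sample rows are illustrative only.

```python
import sqlite3


def sync_constituents(conn, scraped):
    """Make the (hypothetical) constituents table mirror `scraped`.

    `scraped` maps ticker symbol -> company name, as produced by some scraper.
    Returns the sorted list of symbols deleted because they are gone upstream.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS constituents "
        "(symbol TEXT PRIMARY KEY, name TEXT)"
    )
    # Insert new symbols and refresh the names of existing ones.
    cur.executemany(
        "INSERT OR REPLACE INTO constituents (symbol, name) VALUES (?, ?)",
        list(scraped.items()),
    )
    # The step that is easy to forget: drop rows missing from the fresh scrape.
    stored = {row[0] for row in cur.execute("SELECT symbol FROM constituents")}
    removed = sorted(stored - set(scraped))
    cur.executemany(
        "DELETE FROM constituents WHERE symbol = ?",
        [(symbol,) for symbol in removed],
    )
    conn.commit()
    return removed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sync_constituents(conn, {"AAPL": "Apple", "YHOO": "Yahoo"})
    # A later scrape no longer lists YHOO, so it should be deleted.
    print(sync_constituents(conn, {"AAPL": "Apple", "MSFT": "Microsoft"}))
```

If the delete step is skipped, symbols dropped from the index stay in the table indefinitely, which would match the stale entries seen in the JSON.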

rufuspollock commented 6 years ago

@Mikanebu can you please check this.

Mikanebu commented 6 years ago

@enalposi Thanks for reporting this. You are looking at an old version of the JSON file. It is up to date now and does not contain the components you mentioned, such as YHOO and CHK. Please see the latest version: https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/64dd3e9582b936b0352fdd826ecd3c95/constituents_json.json. Please feel free to reopen the issue if you see anything incorrect.

thre3eye commented 6 years ago

Hi, and just FYI: I used the latest link to the JSON file on https://datahub.io/core/s-and-p-500-companies when reporting the later issue, so that was stale for some reason. It looks updated now. Cheers.

tedchou12 commented 5 years ago

Hi, I am using this API and I think it is quite useful for my strategy.

But I found that there are some old companies in the CSV and the JSON: ANDV CSRA DPS EVHC GGP LUK MON WYN XL. I can't fetch these symbols from Yahoo Finance, and they are probably outdated because the companies were delisted or acquired. Could you help me check why they are still on the list?
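A quick way to produce a list like this is to diff the dataset's symbols against a reference list of current constituents. The sketch below is only an illustration: the `stale_symbols` helper is hypothetical, the `Symbol` field name is an assumption about the JSON records, and the sample data is made up rather than fetched from the dataset.

```python
import json


def stale_symbols(dataset_records, current_symbols):
    """Return dataset symbols that are absent from the reference list.

    `dataset_records` is a list of dicts with a "Symbol" key (assumed field
    name); `current_symbols` is any iterable of ticker strings.
    """
    dataset = {rec["Symbol"].strip().upper() for rec in dataset_records}
    current = {symbol.strip().upper() for symbol in current_symbols}
    return sorted(dataset - current)


# Made-up sample standing in for the downloaded constituents JSON.
records = json.loads('[{"Symbol": "AAPL"}, {"Symbol": "MON"}, {"Symbol": "XL"}]')
print(stale_symbols(records, ["AAPL", "MSFT"]))  # prints ['MON', 'XL']
```

Running a check like this after each pipeline run would flag leftover tickers before they reach users.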

rufuspollock commented 5 years ago

@svetozarstojkovic can you check this - is the travis job running correctly?