Closed: thre3eye closed this issue 6 years ago
@enalposi we already have the scraper for wikipedia it's just not running every day at the moment. We will get this fixed in next 24h or so.
Thanks for reporting. This is now fixed in https://github.com/datasets/s-and-p-500-companies/commit/5dfdfbc3b9f52802f89e72da6e050b5cafdec4a6. Our daily pipeline broke for some reason but should now be back up.
Note that the authoritative location is https://datahub.io/core/s-and-p-500-companies, which includes JSON versions etc.
I switched to my own scraper but took a quick look since I started the trouble here :)
Maybe it takes some time to propagate, but the JSON link on that page (https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/b0c0dbabbc66fa902dd40a9e5596263e/constituents_json.json) still contains constituents that have been removed from the S&P 500 and are listed as removed on the Wikipedia source. See, for example, CHK, YHOO, etc.
If the scraper works like mine and stores the data in a DB, maybe there is a bug in the logic that fails to update newly removed records in the DB. That said, I do see YHOO in the referenced change list, just not removed from the JSON.
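To illustrate the kind of bug suggested above, here is a minimal sketch of a sync step that both upserts current members and deletes rows whose symbols no longer appear in the scrape. The table name, schema, and function are hypothetical, not the repository's actual code; omitting the `DELETE` step would leave delisted tickers (YHOO, CHK, ...) in the output indefinitely.

```python
import sqlite3

def sync_constituents(conn, scraped):
    """Replace the stored constituent list with a freshly scraped one.

    `scraped` is a list of (symbol, name) tuples. The DELETE at the end
    is the step that, if missing, would keep stale tickers in the DB.
    """
    cur = conn.cursor()
    # Upsert current members.
    cur.executemany(
        "INSERT OR REPLACE INTO constituents (symbol, name) VALUES (?, ?)",
        scraped,
    )
    # Remove anything no longer present in the source table.
    symbols = [s for s, _ in scraped]
    placeholders = ",".join("?" * len(symbols))
    cur.execute(
        f"DELETE FROM constituents WHERE symbol NOT IN ({placeholders})",
        symbols,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE constituents (symbol TEXT PRIMARY KEY, name TEXT)")
sync_constituents(conn, [("AAPL", "Apple Inc."), ("YHOO", "Yahoo! Inc.")])
sync_constituents(conn, [("AAPL", "Apple Inc.")])  # YHOO dropped upstream
rows = [r[0] for r in conn.execute("SELECT symbol FROM constituents")]
print(rows)  # ['AAPL']
```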
@Mikanebu can you please check this.
@enalposi Thanks for reporting this. You were looking at an old version of the JSON file. It is up to date now and does not contain the components you mentioned, such as YHOO and CHK. Please see the latest updated one: https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/64dd3e9582b936b0352fddd826ecd3c95/constituents_json.json. Please feel free to reopen the issue if you see anything incorrect.
Hi, and just FYI: I used the latest link to the JSON file on https://datahub.io/core/s-and-p-500-companies when reporting the issue, so that link was stale for some reason. It looks updated now. Cheers.
Hi, I am using this API and find it quite useful for my strategy.
But I found some outdated companies in the CSV and the JSON:
ANDV
CSRA
DPS
EVHC
GGP
LUK
MON
WYN
XL
I can't fetch Yahoo Finance data for these symbols, and they are probably outdated because they were delisted or acquired. Could you help me check why they are still on the list?
@svetozarstojkovic can you check this? Is the Travis job running correctly?
Hi (I am moving this issue to this tracker)
The S&P 500 dataset contains outdated constituents. Maybe there is an issue with the logic that parses Wikipedia and removes eliminated/changed symbols?
E.g. search for DOW (Dow Chemical Company, ticker gone due to merger) or RAI (Reynolds American, ticker gone due to buyout). I think the full list of removed symbols as of now is: AN, BCR, R, DD, SPLS, CHK, BBBY, DLPH, SIG, LVLT, BHI, RIG, DNB, YHOO, DOW, FTR, PDCO, SNI, COH, TSO, MJN, SWN, HAR, PCLN, HCN, MNK, FSLR, TGNA, WFM, URBN, MUR, CBG, LLTC, TDC, RAI
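A quick way to spot this class of problem is to diff the published constituent list against the symbols currently on the Wikipedia page. This is a hypothetical check, not part of the repository's pipeline; the `published` and `wikipedia` lists here are stand-ins for the dataset's CSV column and a fresh scrape of the page.

```python
def stale_symbols(published, wikipedia):
    """Return symbols present in the dataset but gone from the source page."""
    return sorted(set(published) - set(wikipedia))

# Stand-in data: in practice, load `published` from constituents.csv and
# `wikipedia` from the live List_of_S%26P_500_companies table.
published = ["AAPL", "MSFT", "RAI", "DOW"]
wikipedia = ["AAPL", "MSFT"]
print(stale_symbols(published, wikipedia))  # ['DOW', 'RAI']
```

Any non-empty result means the removal logic in the pipeline missed an update.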
Relevant data set:
OKFN data (containing outdated tickers): https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/b0c0dbabbc66fa902dd40a9e5596263e/constituents_json.json
Wikipedia source: https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
Sorry, I'd try to fix the code myself, but Python isn't my natural habitat. I have since written a Wikipedia scraper in Java, but that's probably of little help here.
Thanks