datopian / datahub-qa

:package: Bugs, issues and suggestions for datahub.io
https://datahub.io/
32 stars, 6 forks

Outdated SP500 data #182

Closed. thre3eye closed this issue 6 years ago

thre3eye commented 6 years ago

Hi. The S&P 500 dataset contains outdated constituents. Maybe there is an issue with the logic that parses Wikipedia and removes eliminated or changed symbols?

E.g. search for DOW (Dow Chemical Company, ticker gone due to merger) or RAI (Reynolds American, ticker gone due to buyout). I think the full list of removed symbols as of now is: AN, BCR, R, DD, SPLS, CHK, BBBY, DLPH, SIG, LVLT, BHI, RIG, DNB, YHOO, DOW, FTR, PDCO, SNI, COH, TSO, MJN, SWN, HAR, PCLN, HCN, MNK, FSLR, TGNA, WFM, URBN, MUR, CBG, LLTC, TDC, RAI

Relevant data set:

okfn data (containing outdated tickers): https://pkgstore.datahub.io/core/s-and-p-500-companies/constituents_json/data/b0c0dbabbc66fa902dd40a9e5596263e/constituents_json.json

Wikipedia source: https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
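
For anyone who wants to reproduce the check, here is a minimal sketch of how the stale tickers could be found. It assumes the constituents JSON is a list of objects with a `Symbol` field (an assumption about the file layout, not something confirmed in this thread); `find_stale_symbols` and the inline `sample` records are hypothetical stand-ins for the real file.

```python
import json

# Tickers reported in this issue as removed from the index.
REMOVED = (
    "AN, BCR, R, DD, SPLS, CHK, BBBY, DLPH, SIG, LVLT, BHI, RIG, DNB, YHOO, "
    "DOW, FTR, PDCO, SNI, COH, TSO, MJN, SWN, HAR, PCLN, HCN, MNK, FSLR, "
    "TGNA, WFM, URBN, MUR, CBG, LLTC, TDC, RAI"
).split(", ")


def find_stale_symbols(records, removed, key="Symbol"):
    """Return the removed tickers that still appear in the dataset records.

    `key` is the assumed name of the ticker field in each record.
    """
    present = {row.get(key) for row in records}
    return sorted(set(removed) & present)


# Small inline sample standing in for constituents_json.json (hypothetical rows).
sample = json.loads(
    '[{"Symbol": "DOW", "Name": "Dow Chemical"},'
    ' {"Symbol": "AAPL", "Name": "Apple"},'
    ' {"Symbol": "RAI", "Name": "Reynolds American"}]'
)

stale = find_stale_symbols(sample, REMOVED)
print(stale)  # ['DOW', 'RAI']
```

To run it against the real data, load the downloaded JSON file with `json.load` and pass the result in place of `sample`, adjusting `key` if the field name differs.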

Thanks

rufuspollock commented 6 years ago

@enalposi thanks for reporting this 👍

We're aware of the issue here and we'll get it fixed asap.

By the way, the best place to report issues on this specific dataset is https://github.com/datasets/s-and-p-500-companies - you may want to open an issue there instead.

thre3eye commented 6 years ago

Oh, ok. I wasn't aware there are several GitHub projects. I'll close this one and open a new issue where you said. Thanks.