505 Companies - Githubissues

tjradcliffe commented 7 years ago

This listing for the S&P500 has 505 companies in it. There should probably be some kind of invariant imposed so updates are rejected if they include something other than 500 companies.

gh-isoar commented 7 years ago

505 is the correct number of stock symbols. Certain companies have multiple tickers that are all included in the index; for example, separate share classes of the same company, such as Alphabet's GOOGL and GOOG.

If one of those companies gets dropped from the "S&P 500", the correct count of symbols would change by more than one. It can never go below 500, though.
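A minimal sketch of the kind of invariant that would fit this, assuming an update arrives as a plain list of ticker strings. The names `check_symbol_list` and `min_count` are hypothetical and not part of the dataset's tooling. It checks a lower bound of 500 symbols and rejects duplicate symbols, rather than requiring exactly 500 rows.

```python
from collections import Counter


def check_symbol_list(symbols, min_count=500):
    """Return a list of problems; an empty list means the update passes.

    Hypothetical helper: the lower bound reflects the point above that
    the symbol count can exceed 500 but never fall below it.
    """
    problems = []
    if len(symbols) < min_count:
        problems.append(
            f"only {len(symbols)} symbols, expected at least {min_count}"
        )
    # A symbol listed twice is an error even when the total looks right.
    duplicates = sorted(s for s, n in Counter(symbols).items() if n > 1)
    if duplicates:
        problems.append("duplicate symbols: " + ", ".join(duplicates))
    return problems
```

An update would be rejected whenever the returned list is non-empty.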

On Aug 3, 2017, at 2:09 AM, tjradcliffe notifications@github.com wrote:

> This listing for the S&P500 has 505 companies in it. There should probably be some kind of invariant imposed so updates are rejected if they include something other than 500 companies.

rufuspollock commented 6 years ago

@gh-isoar @tjradcliffe but should we eliminate those dual symbols so we have just 500 companies (perhaps with multiple symbols)?

@gh-isoar and thanks for the info 😄

gh-isoar commented 6 years ago

@rufuspollock "The S&P 500" trademark is owned by S&P, which is the sole arbiter of its definition; obviously this list must accurately reflect that definition. That definition comprises not just 500 companies, but the specific stock tickers that S&P has decided accurately represent those companies. In some cases the correct subset of a company's tickers is obscure; S&P's relevant decisions are documented when made but are impossible to reliably infer from non-S&P data.

I see two categories of usage that the list should support well:

1. Calculation of values that can be compared, or used in conjunction, with those calculated by other users of "The S&P 500" definition, such as investment houses; such uses must be driven by the first column.
2. Determination of presence or absence of a company in the list; such uses must operate by matching a substring of the second column.

Eliminating the dual symbols would make the first category of usage impossible. Keeping them by putting multiple symbols in the first column would be "de-normalizing" the column - usage would be possible for only the most sophisticated and determined users.

In contrast, redundancy in the second column is easily handled by most users.
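The two categories of usage could look roughly like this. This is a sketch only: it assumes a CSV whose first column is the ticker symbol and whose second column is the company name, the header names and the three sample rows are illustrative rather than copied from the dataset, and `load_rows`, `symbols`, and `contains_company` are hypothetical helpers.

```python
import csv
import io

# Illustrative sample; the real file has one row per ticker.
SAMPLE = """Symbol,Name
GOOGL,Alphabet Inc Class A
GOOG,Alphabet Inc Class C
MMM,3M Company
"""


def load_rows(text):
    """Parse CSV text and return the data rows without the header."""
    return list(csv.reader(io.StringIO(text)))[1:]


def symbols(rows):
    """First category: every ticker, one per row, for index-style calculations."""
    return [row[0] for row in rows]


def contains_company(rows, fragment):
    """Second category: case-insensitive substring match on the name column."""
    fragment = fragment.lower()
    return any(fragment in row[1].lower() for row in rows)


rows = load_rows(SAMPLE)
print(symbols(rows))
print(contains_company(rows, "alphabet"))
```

With one symbol per row, the first function needs no parsing of the symbol column, and the repeated company name in the second column does not affect the membership check.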

rufuspollock commented 6 years ago

@gh-isoar super useful and clear - and something I think we will add to the README. Thank you again for your clarifications.

datasets / s-and-p-500-companies

505 Companies #18