anttsou / qmj


Do we want to set up a (relatively painless) way of updating/retrieving the companies from the Russell 3000 Index? #10

Closed. rynkwn closed this 9 years ago

rynkwn commented 9 years ago

If memory serves, parsing the original data programmatically is extremely painful, because the Russell 3000 company list is published as a PDF.

Given that the list only updates once a year, and that it's (relatively) simple to create a data frame like what we expect as input, I also wouldn't say this is crucial, but it's food for thought.

By the way, to get the list we're currently using, I copied and pasted the text into Notepad, removed some unwanted artifacts by hand (typically something like a "page end" marker), and then parsed the result with a line or two of R. In practice, the user would likely need to copy the text into an appropriate file and then call our function to produce the data frame.
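For illustration only, here is a rough Python sketch of that cleanup-and-parse step. It is not the R code I actually used, and everything specific in it is an assumption: the function name parse_company_lines is hypothetical, page markers are assumed to look like "Page 3", and each remaining line is assumed to be a company name followed by its ticker as the last whitespace-separated token. The real pasted text may well need different rules.

```python
import re

# Assumption: page-end artifacts look like "Page 3". The real markers in the
# pasted PDF text may differ and would need their own pattern.
PAGE_MARKER = re.compile(r"^page\s+\d+$", re.IGNORECASE)


def parse_company_lines(text):
    """Turn pasted list text into rows of {"name": ..., "ticker": ...}.

    Assumes each useful line is "<company name> <ticker>", with the ticker
    as the final whitespace-separated token.
    """
    rows = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and assumed page markers.
        if not line or PAGE_MARKER.match(line):
            continue
        # Split on the last space: everything before it is the name.
        name, _, ticker = line.rpartition(" ")
        if not name:
            # A single token cannot be split into a name and a ticker.
            continue
        rows.append({"name": name.strip(), "ticker": ticker})
    return rows
```

The resulting list of rows could then be written out to a CSV file and read into R as a data frame, if that is the input shape we settle on.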

rynkwn commented 9 years ago

My apologies. I didn't realize we already had this functionality in get_companies.