If memory serves, parsing the original data directly/programmatically is extremely painful, given that the Russell 3000 company list is distributed as a PDF.
Given that the list only updates once a year, and that it's (relatively) simple to create a data frame in the format we expect as input, I wouldn't say this is crucial, but it's food for thought.
By the way, what I did to get the original list we're currently using was to copy and paste the text into Notepad, remove some unwanted artifacts by hand (typically something on the order of a "page end" marker), and then parse the result with a line or two of R. Practically, the user would need to copy the text into an appropriate file and then call our function to produce the data frame.
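For reference, the parsing step could look something like the sketch below. The file name and the column layout (company name followed by a ticker as the last token on each line) are assumptions, not the actual format of the PDF text:

```r
# Hypothetical sketch of the manual workflow described above:
# read the pasted text, drop page-end artifacts, and split each
# remaining line into company name and ticker.
lines <- readLines("russell3000_pasted.txt")
lines <- lines[!grepl("^Page\\b", lines)]        # drop "page end" markers (assumed pattern)
lines <- trimws(lines[nzchar(trimws(lines))])    # drop blank lines

# Assumes the ticker is the last whitespace-separated token on each line
ticker  <- sub(".*\\s(\\S+)$", "\\1", lines)
company <- sub("\\s\\S+$", "", lines)

russell3000 <- data.frame(company = company, ticker = ticker,
                          stringsAsFactors = FALSE)
```

The exact regular expressions would need adjusting to whatever artifacts survive the copy-paste, but the whole thing stays at a handful of lines.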