Closed dmbates closed 10 years ago
My goal in changing the case of names was to push for much higher consistency across datasets, but I'm happy to revert that decision if it's awkward.
If it's not too much trouble for you, I would like it if we were to still ensure that (1) no names are ever invalid Julia identifiers and (2) no names ever contain obvious misspellings.
On thinking more about this I believe that consistency within the RDatasets package is more desirable than is consistency with the names in R which, as I mentioned, are not at all consistent.
I agree that the checking should include scanning for '.' embedded in a name - one of the unfortunate consequences of the age of the S language specification. (Originally '_' was an assignment operator, interchangeable with '<-' in S and in R, because on some Teletype machines the '_' was a left-pointing arrow. That convention still lives on in ESS, where typing a single '_' creates ' <- '. Because '_' was in use, the '.' was used as a separator in names. Then came the convention of using '.' in a function name to indicate an S3 method.)
Ok. If you're happy with consistency, then I think we can leave the current changes in place. If you find anywhere that we haven't consistently made every column name into a valid name that uses initial-cap camelcase, please open an issue. There are a few datasets whose column names were sufficiently unclear to me that I didn't know how to fix them.
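For anyone who wants to scan a dataset's column names for the problems mentioned above (an embedded '.' or a name that isn't initial-cap camelcase), here is a rough sketch in Julia. The function names `is_camelcase`, `nonconforming` and `to_camelcase` are made up for illustration and are not part of RDatasets, and the ASCII-only rule is my simplifying assumption rather than the package's actual policy.

```julia
# Hypothetical helpers, not part of RDatasets.
# Assumption: a conforming name starts with an uppercase ASCII letter
# and contains only ASCII letters and digits after that.
is_camelcase(name::AbstractString) = occursin(r"^[A-Z][A-Za-z0-9]*$", name)

# Collect the names that break the convention,
# e.g. names with an embedded '.' or a lowercase first letter.
nonconforming(names) = [n for n in names if !is_camelcase(n)]

# Convert an R-style name to initial-cap camelcase by splitting on
# '.' and '_' and capitalizing the first letter of each piece.
function to_camelcase(name::AbstractString)
    parts = split(name, r"[._]+"; keepempty=false)
    return join(uppercasefirst.(parts))
end

bad = nonconforming(["SepalLength", "Sepal.Length", "conc"])
fixed = to_camelcase.(bad)
```

Here `bad` holds "Sepal.Length" and "conc", and `fixed` holds "SepalLength" and "Conc". A conversion like this cannot catch misspellings or unclear names, so those would still need a manual pass.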
Really interesting to hear the history of the '.' convention in R.
Late to the party, but I concur with leaving the new naming format; I went through the Gadfly documentation and fixed every example for the new format, so I'd prefer it didn't change again :)
I think the format should be stable, but there are a few data sets left in there that don't fully match the format yet.
I think the names are pretty consistent at this point, but definitely open an issue or PR if you come across any inconsistencies.
Is it intentional that the capitalization of variables' names differs from that in the R data sets?
I'm currently revising Bates and Watts (1988), Nonlinear Regression Analysis and Its Applications, including examples in R and in Julia. Admittedly, the capitalization of the variable names in R is wildly inconsistent, and the capitalization in the RDatasets package is more consistent, but it still becomes awkward explaining why the formulas are different in the two versions of an example.
For example, in R
whereas in Julia,