Open adanieljohnson opened 5 years ago
@adanieljohnson great points and a good topic to discuss!
There is a function that can convert "improper" R names (e.g. names containing spaces or other invalid characters) to proper R names. It looks like the following:
names(your_data) <- make.names(names(your_data))
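This function replaces every invalid character, spaces included, with a period (it does not remove them) and prepends an "X" where needed, for example when a name starts with a digit. It is a quick trick to get syntactically valid column names.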
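To make the behaviour concrete, here is a small sketch on a made-up data frame. The column names and values are invented for illustration, and `unique = TRUE` is an optional extra that guards against two cleaned names colliding.

```r
# Hypothetical data frame with "improper" column names.
# check.names = FALSE stops data.frame() from fixing them for us.
your_data <- data.frame(
  `word count` = c(10, 20),
  `freq (%)`   = c(0.1, 0.2),
  `2nd pass`   = c(1, 2),
  check.names  = FALSE
)

# Replace invalid characters with "." and make the results unique
names(your_data) <- make.names(names(your_data), unique = TRUE)

print(names(your_data))
# "word.count" "freq...."   "X2nd.pass"
```

Note that each invalid character becomes its own period, so names with a lot of punctuation can end up with runs of dots.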
Additionally, Karl Broman and Kara Woo wrote a neat journal article on organizing data in spreadsheets ("Data Organization in Spreadsheets", The American Statistician, 2018), which is a great reference. Both are avid R users AND biostats folks.
Best practice for naming columns in data tables is to give each column a one-word or snake_case title. This makes it easier to call in the columns as variables. I learned this applies to the code values entered in the columns too.
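If you would rather get snake_case than the dotted names base R produces, a small helper can do it. This is only a sketch: `to_snake_case` is a hypothetical function written for this example, not something that ships with R, and the sample names are made up.

```r
# Hypothetical helper: turn arbitrary labels into snake_case
to_snake_case <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # each run of non-alphanumerics -> one "_"
  x <- gsub("^_+|_+$", "", x)         # drop leading/trailing underscores
  tolower(x)
}

clean <- to_snake_case(c("Word Count", "freq (%)", "Source-File"))
print(clean)
# "word_count"  "freq"        "source_file"
```

The same helper can be applied to column names with names(your_data) <- to_snake_case(names(your_data)). It does not check for duplicates or names that start with a digit, so it is worth inspecting the result.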
At our last meeting I said I was having trouble using cor.test to get Pearson correlations on word frequencies. I could calculate it for one part of my dataset, but other subsets failed to run properly. Jerid pointed out that I had used text with spaces and punctuation to code values in my CSV source file and suggested re-coding to simpler one-word terms. I used Search/Replace in Excel to switch my coding terms from/to:
Either the extra spaces and punctuation were the problem, or I had a hidden typo, but simplifying the code terms solved the problem.
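The same re-coding can also be done inside R rather than in Excel, which leaves a record of exactly what was changed. The sketch below uses invented data: the column names, the original labels, and the replacement terms are all hypothetical and are not the ones from the original file.

```r
# Hypothetical data: group labels containing spaces, punctuation,
# and (in the second group) a stray trailing space
word_data <- data.frame(
  group  = c(rep("Intro, draft 1", 4), rep("Discussion (final) ", 4)),
  freq_a = c(1, 2, 3, 4, 2, 4, 5, 9),
  freq_b = c(2, 4, 5, 9, 9, 6, 4, 1),
  stringsAsFactors = FALSE
)

# Lookup table from old labels to simple one-word codes (assumed names)
recode_map <- c(
  "Intro, draft 1"     = "intro_draft",
  "Discussion (final)" = "discussion_final"
)

# trimws() removes stray leading/trailing spaces before the lookup
word_data$group <- unname(recode_map[trimws(word_data$group)])

# Any label missing from the lookup table shows up as NA, e.g. a hidden typo
stopifnot(!anyNA(word_data$group))

# Pearson correlation on one subset
intro  <- subset(word_data, group == "intro_draft")
result <- cor.test(intro$freq_a, intro$freq_b, method = "pearson")
print(result$estimate)
```

The stopifnot() check is the useful part here: a label with a typo or unexpected punctuation fails loudly at the re-coding step instead of silently producing an empty subset later.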