BIOL548O / Discussion

A repository for course discussion in BIOL548O

Tidy data and significant figures #19

Open aammd opened 8 years ago

aammd commented 8 years ago

Hello there @BIOL548O/2016_students ,

Another tidy data PSA for everyone! This time: what to do when you've got a column with lots of decimal places? 4.789312893402 or something.

I've seen a few people manipulate the number of decimal places in their datasets, using round or format or some such. In general, this is a good practice (we all know the rules for the number of significant figures, and indeed most of us have taught them to undergrads).

However, in this case, correcting your sig figs is kind of a warning sign :warning:

If you find you have a column with lots and lots of decimals, in most cases this will be the result of a calculation. Be wary of including calculations in your tidy data. Calculations are better left to later steps in the analysis. At this stage, in your data cleaning steps, you should focus on rearranging and organizing the data you collected.
If your Excel spreadsheet contains columns that are calculations, I advise you to drop them (e.g. via select(-variable_name)).
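
To make the dropping step concrete, here is a minimal sketch using dplyr. The data frame and its column names (plant_id, leaf_mass_g, leaf_area_cm2, mass_per_area) are made up for illustration; your own spreadsheet will look different.

```r
library(dplyr)

# Hypothetical raw data: two measured columns plus one that was
# calculated in the spreadsheet (mass_per_area = mass / area)
raw <- data.frame(
  plant_id      = c("a", "b", "c"),
  leaf_mass_g   = c(1.2, 0.9, 1.5),
  leaf_area_cm2 = c(10.1, 8.3, 12.7),
  mass_per_area = c(1.2 / 10.1, 0.9 / 8.3, 1.5 / 12.7)
)

# Drop the calculated column; keep only what was actually measured
tidy <- raw %>% select(-mass_per_area)

names(tidy)
```

Nothing is lost by doing this, since the dropped column can always be rebuilt from the measured ones.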

If you wish, write another R script (02_data_analysis.R or something) that reads your tidy data out of data/ and recalculates your dropped variables. This is a great chance to use mutate().
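
A rough sketch of what such an analysis script might look like is below. The file name tidy_leaves.csv and the columns are hypothetical, and the first few lines only exist so the example runs on its own: they write a stand-in tidy file to a temporary data/ folder, which in a real project your cleaning script would already have produced.

```r
library(dplyr)

# Stand-in for the tidy file your cleaning script would have written to data/
# (file name and columns are hypothetical)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(
    plant_id      = c("a", "b", "c"),
    leaf_mass_g   = c(1.2, 0.9, 1.5),
    leaf_area_cm2 = c(10.1, 8.3, 12.7)
  ),
  file.path(data_dir, "tidy_leaves.csv"),
  row.names = FALSE
)

# 02_data_analysis.R would start here: read the tidy data ...
tidy <- read.csv(file.path(data_dir, "tidy_leaves.csv"))

# ... and recalculate the dropped variable from the measured columns
analysis <- tidy %>%
  mutate(mass_per_area = leaf_mass_g / leaf_area_cm2)

# If you want fewer decimals, round for display only;
# the stored values keep their full precision
analysis %>%
  mutate(mass_per_area = round(mass_per_area, 3)) %>%
  print()
```

One way to think about the split: the cleaning script only rearranges what you collected, and anything derived lives in the analysis script, where the formula is visible and easy to change.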