BIOL548O / Discussion

A repository for course discussion in BIOL548O

Homework 2 #14

Status: Open. aammd opened this issue 8 years ago.

aammd commented 8 years ago

Hello @BIOL548O/2016_students!

Here is a description of Homework 2. I'll keep that page updated with any clarifications.

The goal of this homework assignment is to take whatever raw data you have and produce tidy data, using only an R script to do so!

You'll be graded by a peer reviewer who will read and run your code, and offer suggestions on how to improve its clarity and function.

Homework due 23rd Feb at noon.

Please ask any questions you have by replying to this issue (i.e., type below)!

See you next week, Andrew

aammd commented 8 years ago

@BIOL548O/2016_students, just some quick suggestions about getting data into and out of R.

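The most common approach is the base function read.csv(), a convenience wrapper around the more general read.table(); read.delim() is the equivalent wrapper for tab-separated files. You might have to tweak the options (e.g. header = TRUE and sep = ","). read.csv() already uses those two values by default, so you mostly need to change them when a file has no header row or uses a different separator.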
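As a rough sketch, here is read.csv() with its defaults, read.delim() for tab-separated data, and the options you would override for a file with no header row and a semicolon separator. The column names and values are made up, and the text = argument stands in for the file path you would pass in your own script:

```r
# Made-up example data; in your own script, pass a file path instead of text =
csv_text <- "plot,species,count
A,Carabus,12
B,Pterostichus,3"

# read.csv() already defaults to header = TRUE and sep = ","
counts <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The same data tab-separated: read.delim() defaults to sep = "\t"
tsv_text <- "plot\tspecies\tcount\nA\tCarabus\t12\nB\tPterostichus\t3"
counts_tsv <- read.delim(text = tsv_text, stringsAsFactors = FALSE)

# No header row and ";" as separator: override header and sep, supply names
semi_text <- "A;Carabus;12\nB;Pterostichus;3"
counts_semi <- read.csv(text = semi_text, header = FALSE, sep = ";",
                        col.names = c("plot", "species", "count"),
                        stringsAsFactors = FALSE)

str(counts)
```

All three calls should produce the same data frame. Setting stringsAsFactors = FALSE explicitly keeps text columns as character regardless of the R version's default.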

For writing data out, the base function write.csv() works fine. Pay attention to the arguments: set row.names = FALSE so the row numbers aren't written out as an extra column, and perhaps also quote = FALSE.
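A minimal sketch of what those two arguments change. It writes a made-up data frame to a temporary file, which stands in for a real output path. Keep in mind that quote = FALSE is only safe when no value contains a comma, because nothing would then mark where such a value ends.

```r
# A made-up tidy data frame to write out
tidy_counts <- data.frame(plot = c("A", "B"), count = c(12L, 3L),
                          stringsAsFactors = FALSE)
out_file <- tempfile(fileext = ".csv")  # stand-in for a real output path

# Default: row names are written as an extra, unnamed first column
write.csv(tidy_counts, out_file)
with_rownames <- readLines(out_file)

# row.names = FALSE drops that column; quote = FALSE drops the double quotes
write.csv(tidy_counts, out_file, row.names = FALSE, quote = FALSE)
clean <- readLines(out_file)

# Reading the clean file back gives the original data frame
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
```

With the defaults, the first line of the file is `"","plot","count"`. With both options set, the file reads `plot,count`, `A,12`, `B,3`.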

An improvement on these functions is the readr package (e.g. readr::read_csv()), which loads CSVs and TSVs into R faster. It has better defaults and makes better guesses about the column types of your dataset. The same is true of its writing functions, such as readr::write_csv().

If your data is in Excel, the handy readxl package is one popular way to read it (though there are many others). It is rather slow and clunky, so convert your data to a CSV early and try not to rely on it too much.

If your data is in another format (e.g. a text file from an instrument), let us know by commenting on this issue!

AlexdeBruyn commented 8 years ago

Hi Andrew!

I committed my data in the form I'd converted it to online (|Category|Count|), rather than the spreadsheet document I used before, and I'm struggling to find a good way to work with it. Should I switch my initial 'messy' data to my .ods spreadsheet? Or do you have a recommendation for how I could turn my even messier data (again, written as an HTML table) into something tidy and parsable?

aammd commented 8 years ago

Maybe try exporting as a CSV? (Emailed reply to AlexdeBruyn's comment of Feb 20, 2016, 1:33 AM.)

AlexdeBruyn commented 8 years ago

That's the solution I ended up going with. I took the raw text, put it into my spreadsheet program (treating the vertical bars as column separators), and worked from there. The main cleaning I had to do was removing some redundant rows that appeared during the conversion.
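Because the homework asks for the cleaning to be done entirely in an R script, here is a rough sketch of doing the same |Category|Count| conversion in R. It makes several assumptions: every line starts and ends with a vertical bar, there may be a dashed separator row, and the "redundant rows" are repeated header lines. The example lines are invented. In practice, readLines() on your own file (hypothetical path in the comment) would supply them.

```r
# Invented raw lines in the |Category|Count| layout; in practice something like
# raw_lines <- readLines("data/raw/counts.txt")  (hypothetical path)
raw_lines <- c(
  "|Category|Count|",
  "|--------|-----|",
  "|Beetle|12|",
  "|Spider|3|",
  "|Category|Count|",   # a repeated header row, assumed to be redundant
  "|Ant|40|"
)

# Strip the leading and trailing bars so each line looks like "a|b"
trimmed <- gsub("^\\s*\\||\\|\\s*$", "", raw_lines)

# Drop separator rows made only of dashes, bars, colons and spaces
trimmed <- trimmed[!grepl("^[-|: ]+$", trimmed)]

# Parse with "|" as the column separator; the first line becomes the header.
# quote = "" stops apostrophes in category names being read as quotes.
parsed <- read.table(text = trimmed, sep = "|", header = TRUE, quote = "",
                     strip.white = TRUE, stringsAsFactors = FALSE)

# Remove repeated header rows, then convert Count to a number
parsed <- parsed[parsed$Category != "Category", ]
parsed$Count <- as.integer(parsed$Count)
rownames(parsed) <- NULL
```

The repeated header rows force Count to be read as text at first, which is why it is converted with as.integer() only after those rows are removed.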