DS4PS / ays-r-coding-sum-2022

Introductory data science course in R, taught at the Andrew Young School GSU.
http://ds4ps.org/ays-r-coding-sum-2022/

Plotting non-numeric data #12

Open Cam-Hoff opened 2 years ago

Cam-Hoff commented 2 years ago

Hello, I'm starting to work on my code-through, and I'm having problems creating a basic bar plot or histogram to display the differences between three categories. I'm getting errors saying the data are non-numeric and therefore can't be plotted, like this: Error in hist.default(data$Mode_of_Shipment, main = "Shipping Methods", : 'x' must be numeric

Here is my code. The variable in question has three possible values: Flight, Ship, Road.

hist(data$Mode_of_Shipment, main= "Shipping Methods", xlab= "Method", ylab = "number", col = "red")

I've tried several ways to make it numeric (as.numeric()) and also used table() (which, for some reason, merged two of the categories together), to no avail.

jamisoncrawford commented 2 years ago

Run the following and see what you get:


head(data$Mode_of_Shipment)

class(data$Mode_of_Shipment)

Let's see what the data look like.

Cam-Hoff commented 2 years ago

I ran head() with n = 20 so you can see it's not all Flight.

 [1] "Flight" "Flight" "Flight" "Flight" "Flight" "Flight"
 [7] "Flight" "Flight" "Flight" "Flight" "Flight" "Flight"
[13] "Flight" "Flight" "Flight" "Flight" "Flight" "Ship"
[19] "Ship"   "Ship"

[1] "character"

jamisoncrawford commented 2 years ago

That's not a numeric variable, and it can't sensibly be converted to one with as.numeric(). Remember, hist() takes a vector of numeric values and displays how often values fall into each range (bin).
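To see both failures on a small scale, here is a sketch using a made-up vector in place of data$Mode_of_Shipment (the values are illustrative, not from the real dataset). It shows that as.numeric() turns each word into NA, and that hist() rejects character input before drawing anything.

```r
# Made-up sample values standing in for data$Mode_of_Shipment
shipment <- c("Flight", "Ship", "Road", "Flight")

class(shipment)  # "character"

# Coercing words to numbers gives NA for every value
# (R normally warns "NAs introduced by coercion"; the warning is suppressed here)
converted <- suppressWarnings(as.numeric(shipment))
print(converted)  # NA NA NA NA

# hist() checks for numeric input first, so a character vector raises an error
result <- tryCatch(hist(shipment), error = function(e) conditionMessage(e))
print(result)
```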

You might be more interested in creating a bar chart to show the count of each shipping method (Flight, Ship, etc.).
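A minimal sketch of that bar chart, again on made-up values standing in for data$Mode_of_Shipment: table() does the counting, and barplot() draws one bar per category. With the real data frame, the same idea would be barplot(table(data$Mode_of_Shipment), ...).

```r
# Made-up sample values standing in for data$Mode_of_Shipment
shipment <- c("Flight", "Flight", "Ship", "Road", "Ship", "Flight")

# table() counts how many times each category occurs (sorted alphabetically)
counts <- table(shipment)
print(counts)

# barplot() accepts the table directly: one bar per category, height = count
barplot(counts,
        main = "Shipping Methods",
        xlab = "Method",
        ylab = "Number",
        col  = "red")
```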

Cam-Hoff commented 2 years ago

So I had tried that earlier as well and got this error: Error in -0.01 * height : non-numeric argument to binary operator. That's how I learned about as.numeric() (by searching for this error message). Then, when I applied that to the code, I got this error: Error in plot.window(xlim, ylim, log = log, ...) : need finite 'ylim' values

This is where I got stuck, because most of the tutorials online talked about just replacing the vector with a numeric vector, but I thought that's what I had done with as.numeric().

Example code: barplot(as.numeric(data$Mode_of_Shipment), main= "Shipping Methods", xlab= "Method", ylab = "number", col = "red")

jamisoncrawford commented 2 years ago

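Nope, unfortunately Mode_of_Shipment can't be turned into a numeric variable, because there's no meaningful way for R to do this. as.numeric() returns NA for every value (with the warning "NAs introduced by coercion"), so barplot() has no finite bar heights to work with, which is why you see the 'ylim' error.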

We could convert a ZIP code or even a telephone number to numeric, because they are made up of digits. But values like "Ship" and "Flight" can't be converted, because they are made up of letters.
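The difference is easy to check directly. In this sketch the ZIP codes and method names are illustrative values only.

```r
# Strings made up of digits convert cleanly to numbers
# (note: leading zeros, as in some ZIP codes, would be dropped)
zip <- c("30303", "30302")
as.numeric(zip)  # 30303 30302

# Strings made up of letters cannot be converted; R returns NA
# (and normally warns "NAs introduced by coercion"; suppressed here)
method <- c("Ship", "Flight")
suppressWarnings(as.numeric(method))  # NA NA
```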

For a bar plot, you are so far only providing one axis, but remember that you need two: the categories (x axis) and a value for each category (y axis). I would preprocess the data beforehand so that you have a list of the unique shipping methods and a single count for each method. That will translate nicely into a visualization.
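One way to do that preprocessing, sketched under the assumption that data is a data frame with a character column Mode_of_Shipment. The tiny data frame below is a made-up stand-in, and the column names Method and Count are chosen for illustration. The raw column is collapsed into one row per method, and then the counts are passed as bar heights and the method names as labels.

```r
# Hypothetical stand-in for the course dataset; only the column name matches
data <- data.frame(
  Mode_of_Shipment = c("Flight", "Flight", "Ship", "Road", "Ship", "Flight"),
  stringsAsFactors = FALSE
)

# One row per unique method with a single count for each
shipping_counts <- as.data.frame(table(data$Mode_of_Shipment),
                                 stringsAsFactors = FALSE)
names(shipping_counts) <- c("Method", "Count")
print(shipping_counts)

# Two pieces now: categories for the x axis, counts for the y axis
barplot(height    = shipping_counts$Count,
        names.arg = shipping_counts$Method,
        main = "Shipping Methods",
        xlab = "Method",
        ylab = "Number",
        col  = "red")
```

Passing stringsAsFactors = FALSE to as.data.frame() keeps Method as plain text, so it can be used directly as bar labels.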