DS4PS / cpp-526-sum-2020

Course shell for CPP 526 Foundations of Data Science I for Summer 2020.
http://ds4ps.org/cpp-526-sum-2020/
MIT License

Lab 01 - Question 5 #4

Status: Open. gzbib opened this issue 4 years ago

gzbib commented 4 years ago

Hello, I was solving Question 5 and I managed to use the table() function successfully. However, the results are of type character, and I was wondering whether I should use another function to find the neighborhood with the most tax parcels and return a numeric result, or whether I should search for the greatest number manually.

I am not sure if I am allowed to ask such a question, but thank you in advance.

Best, Ghida

JayCastro commented 4 years ago

I actually just called the neighborhoods, looked down the output, and scanned for the highest number. But if you want just the number, you could use the max function to confirm it's the highest number: max(table()) is what the code would look like. I hope this is the answer you're looking for. - Jacob Castro
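A minimal sketch of Jacob's suggestion, using a small made-up vector (`parcels`, hypothetical) in place of the lab's `dat$neighborhood`, which isn't reproduced here:

```r
# Hypothetical stand-in for dat$neighborhood: one entry per tax parcel
parcels <- c("Eastwood", "Elmwood", "Eastwood", "Brighton", "Eastwood", "Elmwood")

counts <- table(parcels)  # count parcels per neighborhood
max(counts)               # largest count: 3
```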

jamisoncrawford commented 4 years ago

Thanks @JayCastro, indeed function max() is the way to go for this. A good habit with this function is to use the argument na.rm = TRUE, since missing (NA) values will otherwise make max() return NA instead of the maximum.
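A quick illustration of why na.rm = TRUE matters, using a hypothetical numeric vector (`acres` is made up for this example). Note that table() drops NA values by default (its `useNA` argument defaults to "no"), so the argument matters most when calling max() directly on raw data:

```r
# Hypothetical numeric vector with one missing value
acres <- c(0.25, NA, 1.5, 0.75)

max(acres)                # NA: a single missing value poisons the result
max(acres, na.rm = TRUE)  # 1.5: NA is dropped before comparing
```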

Did these responses help, @gzbib?

gzbib commented 4 years ago

Thank you @JayCastro, it worked perfectly! Thank you Sir @jamisoncrawford

I was just wondering about the logic behind this. The return value of table(dat$neighborhood) is not numeric, so how does max() return a number as an answer?

jamisoncrawford commented 4 years ago

@gzbib, sure thing!

Ostensibly, this output appears to be non-numeric, and you're technically right. We can check the class to find out with function class():

> class(table(dat$neighborhood))
[1] "table"

So class() tells us that the printed output is of class "table". What if we want to look at a specific value? In a "table" object, we can look up specific values by position using brackets, or [ ].

Let's give it a try:

> table(dat$neighborhood)[5]
Elmwood 
   1444

When we isolate a value from this particular table, we actually get two pieces of information:

  1. The value
  2. The "label" for that value
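To see those two pieces separately, here is a sketch on a small hypothetical vector (not the lab data): names() pulls out the label, as.integer() keeps only the count, and a value can also be looked up by its label instead of its position.

```r
# Hypothetical stand-in for dat$neighborhood
parcels <- c("Eastwood", "Elmwood", "Eastwood", "Brighton", "Eastwood", "Elmwood")
counts <- table(parcels)  # labels are sorted: Brighton, Eastwood, Elmwood

one <- counts[2]          # position 2 is "Eastwood"
names(one)                # the label: "Eastwood"
as.integer(one)           # the value alone: 3
counts["Elmwood"]         # lookup by label instead of position: 2
```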

Well, let's check the class of an isolated value, rather than the whole kit and kaboodle.

> class(table(dat$neighborhood)[5])
[1] "integer"

And Bob's your uncle, it's of class "integer"! So it really is numeric data under the hood. The labels are just there to indicate which value is which. We can see the labels on their own with the names() function:

> names(table(dat$neighborhood))
 [1] "Brighton"                "Court-Woodlawn"         
 [3] "Downtown"                "Eastwood"               
 [5] "Elmwood"                 "Far Westside"           
 [7] "Franklin Square"         "Hawley-Green"           
 [9] "Lakefront"               "Lincoln Hill"           
[11] "Meadowbrook"             "Near Eastside"          
[13] "Near Westside"           "North Valley"           
[15] "Northside"               "Outer Comstock"         
[17] "Park Ave."               "Prospect Hill"          
[19] "Salt Springs"            "Sedgwick"               
[21] "Skunk City"              "South Campus"           
[23] "South Valley"            "Southside"              
[25] "Southwest"               "Strathmore"             
[27] "Tipp Hill"               "University Hill"        
[29] "University Neighborhood" "Washington Square"      
[31] "Westcott"                "Winkworth"    
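The split between how a table prints and how it is stored can be checked directly. A sketch on a hypothetical vector (output will differ from the lab data): class() reports "table", while typeof() shows the counts are stored as integers, which is why max() and other arithmetic work on them.

```r
# Hypothetical stand-in for dat$neighborhood
parcels <- c("Eastwood", "Elmwood", "Eastwood", "Brighton", "Eastwood", "Elmwood")
counts <- table(parcels)

class(counts)   # "table": how R prints and dispatches on the object
typeof(counts)  # "integer": how the counts are actually stored
names(counts)   # "Brighton" "Eastwood" "Elmwood"
sum(counts)     # 6: arithmetic works because the values are numeric
```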

Does this help, @gzbib?

jamisoncrawford commented 4 years ago

@JayCastro and @gzbib, you might also be interested in another helpful function, here.

While max() returns the maximum value, which.max() returns the position of the maximum element in a table, along with its name/label. We can get the neighborhood and position number like so:

> which.max(table(dat$neighborhood))
Eastwood 
       4 

That's helpful, especially if we have 10,000 or 100,000 different neighborhoods, or all the neighborhoods in the U.S., or all the neighborhoods in the world. Searching for the neighborhood manually could be a real pain, so instead we can use the position number returned by which.max() to tell the brackets, or [ ], which element to pull from the table.

> which.max(table(dat$neighborhood))
Eastwood 
       4 

> table(dat$neighborhood)[4]
Eastwood 
    4889

We can even nest the first expression inside the brackets of the second expression to get our answer in one fell swoop, although it's a little bit more dense:

> table(dat$neighborhood)[which.max(table(dat$neighborhood))]
Eastwood 
    4889 

Str8 🔥🔥🔥.
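A sketch of the same idea on a hypothetical vector, storing the table once instead of calling table() twice. One caveat: which.max() returns only the first position when several elements tie for the maximum, so comparing against max() is one way to keep every tied neighborhood.

```r
# Hypothetical data with a tie for the most parcels
parcels <- c("Eastwood", "Elmwood", "Eastwood", "Brighton", "Elmwood")
counts <- table(parcels)         # Brighton 1, Eastwood 2, Elmwood 2

which.max(counts)                # Eastwood at position 2: first of the tied maxima only
counts[which.max(counts)]        # Eastwood 2
counts[counts == max(counts)]    # Eastwood 2, Elmwood 2: every tied neighborhood
names(which.max(counts))         # just the label: "Eastwood"
```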

gzbib commented 4 years ago

Hello Sir @jamisoncrawford,

You explained it perfectly and you definitely answered my question :) Thank you so much.

I have a different concern about Question 6. Should I start a new thread?

jamisoncrawford commented 4 years ago

@gzbib you're welcome and please do!

jamisoncrawford commented 4 years ago

P.S. Let's keep this issue open for others' benefit until Lab 01 is out of the way.