DS4PS / cpp-526-spr-2021

Course shell for Foundations of Data Science I
https://ds4ps.org/cpp-526-spr-2021/
MIT License

Lab 2, Q7 - Q8 #2

Open AhmedRashwanASU opened 3 years ago

AhmedRashwanASU commented 3 years ago

I'm not sure I'm solving this the right way. Are there any tips that would help me get more accurate results?

Question I: What proportion of commercial properties are delinquent on taxes?

Question II: What proportion of delinquent tax bills are owed by commercial parcels?

Use function: 'mean()' Use variable: 'amtdelinqt' Use variable: 'landuse'

The first answer is tax-delinquent commercial properties over all commercial properties

(Answer )

proportion <- mean(downtown$amtdelinqt > 0 & downtown$landuse == "Commercial" ,na.rm = TRUE )

proportion*100
[1] 6.426735

(Answer )

The second answer is the tax dollars owed by commercial properties (a subset) over all tax dollars owed

sum(downtown$amtdelinqt > 0 & downtown$landuse == "Commercial" ,na.rm = TRUE )
[1] 25

**Question 8: Tax Delinquent Parcels by Land Use**

**Question: How many of each land use type are delinquent on taxes? Print a table of your results.**

Use function: 'table()' Use variable: 'amtdelinqt' Use variable: 'landuse'

(Answer )

table(downtown$landuse , downtown$amtdelinqt > 0 )

                     FALSE TRUE
  Apartment              6    0
  Commercial           184   25
  Community Services    15    2
  Industrial             2    2
  Parking               62   16
  Parks                  8    0
  Recreation             5    0
  Religious              6    0
  Schools                4    0
  Single Family          1    0
  Utilities              6    0
  Vacant Land           33   12

jamisoncrawford commented 3 years ago

Hi @AhmedRashwanASU - so the first sub-question here is really asking for a proportion of a subset of the data. You've got this hot mess of tax parcels in your dataset, but you're really looking for a subset of commercial-only parcels.

Commercial Properties & Stuff

Let's look at what you're doing now:

proportion <- mean(downtown$amtdelinqt > 0 & downtown$landuse == "Commercial" ,na.rm = TRUE )

proportion*100
[1] 6.426735

While this is pretty fire for early in your R coding career, you're not quite getting the right proportion. mean() here averages one TRUE/FALSE value for every row in downtown, and downtown contains everything! So the denominator is all properties instead of only commercial properties: your conditions == "Commercial" and > 0 decide which rows count as TRUE, but they don't remove any rows from the count.
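To see the denominator problem on its own, here is a minimal sketch using made-up data. The parcels data frame and its values are hypothetical stand-ins, not the real downtown dataset, and the object names are only for illustration.

```r
# Hypothetical toy data, NOT the real downtown dataset
parcels <- data.frame(
  landuse    = c("Commercial", "Commercial", "Parking", "Parking", "Parks"),
  amtdelinqt = c(100, 0, 50, 0, 0)
)

is_commercial <- parcels$landuse == "Commercial"
is_delinquent <- parcels$amtdelinqt > 0

# Denominator is all 5 rows: 1 delinquent commercial parcel out of 5
share_of_all <- mean(is_delinquent & is_commercial)

# Denominator is only the 2 commercial rows: 1 delinquent out of 2
share_of_commercial <- mean(is_delinquent[is_commercial])

share_of_all         # 0.2
share_of_commercial  # 0.5
```

Both lines count the same single parcel in the numerator; only the number of rows being averaged over changes.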

Let's get weird. Let's say we want to create an entirely new dataset, and this will be a subset of the downtown dataset. How? Well, the same way we use assignment and set up conditional statements.

com_props <- downtown[downtown$landuse == "Commercial", ]

Recall that these brackets [ ] are powerful notation for subsetting data. Left of the comma ([ here , ]) are the rows you want to keep - in this case, all rows where landuse equals "Commercial". To the right of the comma ([ , here]) are the columns you want to keep. And we want to keep them all, so we leave it blank.
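As a small illustration of the row and column slots, here is a sketch using the built-in mtcars data as a stand-in for downtown. The object names four_cyl and four_cyl_small are made up for this example.

```r
# Rows where cyl equals 4, all columns (nothing after the comma)
four_cyl <- mtcars[mtcars$cyl == 4, ]

# Same rows, but keep only the mpg and cyl columns
four_cyl_small <- mtcars[mtcars$cyl == 4, c("mpg", "cyl")]

nrow(four_cyl)       # 11 four-cylinder cars
ncol(four_cyl)       # 11, the same columns as mtcars
dim(four_cyl_small)  # 11 rows, 2 columns
```

One thing to watch for: if the column used in the condition contains NA values, bracket subsetting returns rows of NA for them. Wrapping the condition in which() drops those rows instead.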

We've stored this in object com_props. Now, you can find which rows/observations in com_props are tax delinquent and use mean() or some other such method to get to the right answer - you do the same for the second part of this question, as well!

Tables & Crosstabs

As for printing a table as output - you've got the right method though you could get a bit more precise.

The following is a bit more advanced, and you'll get more accustomed to this throughout the course, so don't feel you need to be able to do this now!

Say we wanted to pull crosstabs for mtcars based on cyl (number of cylinders) and mpg (miles per gallon) greater than 28.

table(mtcars$cyl, mtcars$mpg > 28)

    FALSE TRUE
  4     7    4
  6     7    0
  8    14    0

This is more or less the same thing you did. From here, you could cast this table() output as a data.frame like so:

data.frame(table(mtcars$cyl, mtcars$mpg > 28))

  Var1  Var2 Freq
1    4 FALSE    7
2    6 FALSE    7
3    8 FALSE   14
4    4  TRUE    4
5    6  TRUE    0
6    8  TRUE    0

Far out.

Let's name it for convenience.

car_tabs <- data.frame(table(mtcars$cyl, mtcars$mpg > 28))

Now we can use the same trick we did with the above subsetting (using brackets) to say what rows and columns we want to keep. Remember that in [ , ], rows you want to keep are indicated to the left of the comma and columns are to the right.

car_tabs[car_tabs$Var2 == TRUE, ]

  Var1 Var2 Freq
4    4 TRUE    4
5    6 TRUE    0
6    8 TRUE    0

Here, we're only keeping rows where the second variable is equal to TRUE. Let's get rid of that column now that we don't need it.

car_tabs2 <- car_tabs[car_tabs$Var2 == TRUE, ]

car_tabs2[ , c(1, 3)]

  Var1 Freq
4    4    4
5    6    0
6    8    0

There we have it - a bit of a neater table. We could just go nuts for the hell of it.

car_tabs3 <- car_tabs2[ , c(1, 3)]

colnames(car_tabs3) <- c("Cylinders", "Count")

rownames(car_tabs3) <- NULL

car_tabs3

  Cylinders Count
1         4     4
2         6     0
3         8     0
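For what it's worth, the same result can be sketched in fewer steps by filtering before counting. This is only one possible shortcut, not the required approach for the lab; the object names cyl_factor, high_mpg_tab, and high_mpg_df are made up for this example.

```r
# Convert cyl to a factor so cylinder counts with zero matches still appear
cyl_factor <- factor(mtcars$cyl)

# Count cylinders only among cars with mpg above 28;
# naming the argument labels the resulting column "Cylinders"
high_mpg_tab <- table(Cylinders = cyl_factor[mtcars$mpg > 28])

# responseName sets the name of the count column (default is "Freq")
high_mpg_df <- as.data.frame(high_mpg_tab, responseName = "Count")

high_mpg_df
```

Without the factor() step, the 6- and 8-cylinder rows would be dropped, because table() only lists values that actually occur in the filtered vector.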

Hope this helps!

sjone128 commented 3 years ago

The explanation and expansion for creating a table were incredibly useful! I was able to follow along and use it in my assignment until the very end. I'm not going to include it in my assignment but I enjoyed the practice!

AhmedRashwanASU commented 3 years ago

@jamisoncrawford Thank you for the detailed explanation of this topic. I hope I've now found the final answers the correct way.

The first answer is tax-delinquent commercial properties over all commercial properties

Tax_Delinquent_Commercial <- downtown[downtown$landuse == "Commercial", ]

proportion <- mean(Tax_Delinquent_Commercial$amtdelinqt > 0 ,na.rm = TRUE )

proportion*100

[1] 11.96172

The second answer is the tax dollars owed by commercial properties (a subset) over all tax dollars owed

Tax_dollars_Commercial <- downtown[downtown$landuse == "Commercial", ]

proportion_Tax <- sum(Tax_dollars_Commercial$amtdelinqt ,na.rm = TRUE )

sum_downtown <-sum(downtown$amtdelinqt, na.rm = TRUE)

proportion_Tax/sum_downtown*100

[1] 86.95747
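The two questions differ only in what gets averaged or summed: the first uses a count of parcels, the second uses dollars. A minimal sketch on made-up data, computing both for every land use at once with tapply(), might look like the following. The parcels data frame and its values are hypothetical, not the real downtown dataset.

```r
# Hypothetical toy data, NOT the real downtown dataset
parcels <- data.frame(
  landuse    = c("Commercial", "Commercial", "Parking", "Parking", "Parks"),
  amtdelinqt = c(300, 0, 100, NA, 0)
)

# Question I style: share of parcels within each land use that owe anything
delinquent_rate <- tapply(parcels$amtdelinqt > 0, parcels$landuse,
                          mean, na.rm = TRUE)

# Question II style: share of all delinquent dollars owed by each land use
dollars_by_use <- tapply(parcels$amtdelinqt, parcels$landuse,
                         sum, na.rm = TRUE)
dollar_share <- dollars_by_use / sum(dollars_by_use)

delinquent_rate  # Commercial 0.5, Parking 1, Parks 0
dollar_share     # Commercial 0.75, Parking 0.25, Parks 0
```

Note that na.rm = TRUE drops the missing Parking value from both calculations, so that land use is judged on its one known parcel.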

jamisoncrawford commented 3 years ago

@sjone128 super glad you read this and found it helpful!

@AhmedRashwanASU you got it! But I was trying to be somewhat vague so as not to give away the answer 🤣. No worries - if folks read the thread, they will learn nearly as well, I think.