DS4PS / cpp-526-fall-2019

Course material for CPP 526 Foundations of Data Science I
http://ds4ps.org/cpp-526-fall-2019

Lab 2 #14

Open jmacost5 opened 5 years ago

jmacost5 commented 5 years ago

I want to make sure I am doing this right. I keep getting zero or errors.

lecy commented 5 years ago

The lecture from this week introduces these "logical" operators (as opposed to the mathematical operators we discussed last week):

image

http://ds4ps.org/dp4ss-textbook/p-050-business-logic.html

I would start there. How do you operationalize "homes with values over $200k"?
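One way to see how a phrase like "homes with values over $200k" becomes a logical statement is to try it on a few made-up numbers. The sketch below uses a hypothetical vector of assessed values, not the lab data; the comparison returns one TRUE or FALSE per home.

```r
# hypothetical assessed values for five homes (not the lab data)
assessed.value <- c( 95000, 250000, 180000, 310000, 200000 )

# the comparison returns a logical vector, one element per home
over.200k <- assessed.value > 200000
over.200k

# TRUE counts as 1 and FALSE as 0, so sum() counts the homes in the group
sum( over.200k )
```

Note that a home valued at exactly 200000 is not "over" 200k, so it falls outside the group; use >= if the boundary should be included.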

jmacost5 commented 5 years ago

I guess my question with number 3 is that I don't understand whether I am supposed to find the ratio or where the homes are being built. I am not sure I am doing it right either. This is my code:

sum(dat$land_use=="Single Family")
table(dat$land_use=="Single Family", dat$yearbuilt>=1980 )
884/24392=0.03624139
these <- dat$land_use == "Single Family" & yearbuilt>=1980
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors )

jmacost5 commented 5 years ago

For number 4 I tried to make a group with all of the variables and I got an error:

these <- dat$land_use == "Two Family" & dat$land_use == "Apartment" & dat$land_use == "Three Family"
Sum( dat$land_use== these )
group.colors <- ifelse( these, "firebrick", "gray80" )
plot( syr, border=NA, col=group.colors )

lecy commented 5 years ago

The easiest approach is to recognize the relationship between an average and a proportion when data is binary: what is the average of 0, 0, 1, 1 ?

mean( c(0,0,1,1) ) 
0.5
mean( c(0,0,1,0) ) 
0.25

When our data consists of 0's and 1's then the mean is the proportion. This is helpful in calculating group membership. If we want to know what proportion of our data belongs to the group we have defined we can use the mean function on our selector vector:

these <- dat$variable == "criteria"
mean( these )

Otherwise a proportion is always count / total, or:

sum( these ) / length( these )

Where "these" is the logical vector we are working with.

jmacost5 commented 5 years ago

For question 6 and 7 I just don't know if there is a specific way I should be typing in delinquent tax payments.

lecy commented 5 years ago

For number 4 I tried to make a group with all of the variables and I got an error:

these <- dat$land_use == "Two Family" & dat$land_use == "Apartment" & dat$land_use == "Three Family"

The AND operator is the intersection of two criteria, meaning both are true at the same time:

group == "treatment" & gender == "female"

The problem is that a house cannot be two things at the same time. For example, you cannot say:

animal == "dog" & animal == "cat"

Perhaps you want the OR operator?
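A small sketch of the difference between AND and OR, using a made-up land use vector rather than the lab data. The %in% operator at the end is base R and is shown as one compact alternative to chaining OR statements.

```r
# hypothetical land use values (not the lab data)
land.use <- c( "Single Family", "Two Family", "Apartment", "Commercial", "Three Family" )

# AND: no parcel is two types at once, so every element is FALSE
both <- land.use == "Two Family" & land.use == "Apartment"
sum( both )

# OR: TRUE when a parcel matches any one of the categories
either <- land.use == "Two Family" | land.use == "Apartment" | land.use == "Three Family"
sum( either )

# %in% is a compact way to write the same OR statement
multi <- land.use %in% c( "Two Family", "Apartment", "Three Family" )
identical( either, multi )
```

The two versions agree here because the toy vector has no missing values; with an NA in the data, == returns NA for that element while %in% returns FALSE.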

lecy commented 5 years ago

For question 6 and 7 I just don't know if there is a specific way I should be typing in delinquent tax payments.

You need to define your group. Delinquent means there are unpaid taxes. So you need to create a statement that identifies all of those cases to create a new group.

Specifically, you have a quantitative variable amtdelinqu which you need to translate to a logical vector. What operators work for translating a quantitative measure to a select vector?

jmacost5 commented 5 years ago

Perhaps you want the OR operator?

these <- dat$land_use == "Two Family" | dat$land_use == "Apartment" | dat$land_use == "Three Family"
sum( dat$land_use == these )
group.colors <- ifelse( these, "firebrick", "gray80" )
plot( syr, border=NA, col=group.colors )

I am getting a zero, and I think I typed something in wrong, but I cannot work out what.

lecy commented 5 years ago

After your logical statement that creates your selector vector "these" you count cases (TRUEs) using sum() directly:

these <- dat$land_use == "Two Family" | 
         dat$land_use == "Apartment" | 
         dat$land_use == "Three Family"
sum( these )

Not:

sum( dat$land_use == these )

Note that these will be a logical vector.
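The zero has a mechanical explanation, sketched here on a made-up vector rather than the lab data. When text is compared to a logical vector, R converts the logical values to the strings "TRUE" and "FALSE", and no land use label equals either string.

```r
# hypothetical land use values (not the lab data)
land.use <- c( "Two Family", "Commercial", "Apartment" )

these <- land.use == "Two Family" | land.use == "Apartment"
these                       # TRUE FALSE TRUE

# text compared to a logical vector: nothing matches "TRUE" or "FALSE"
land.use == these           # FALSE FALSE FALSE
sum( land.use == these )    # 0

# counting the TRUEs in the selector vector directly
sum( these )                # 2
```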

jmacost5 commented 5 years ago

I am not understanding what I am doing wrong with number 7:

these <- dat$land_use & dat$amtdelinqu == 0
table(these)

lecy commented 5 years ago

The table() function cross-tabulates categorical variables, in this case two of them:

table( f1, f2 )

Currently one of your variables is numeric, and one categorical. You need to convert both to categorical, THEN apply the table function.

# no criteria provided here
these <- dat$land_use  (LAND USE WHAT???)      & dat$amtdelinqu == 0

If you are trying to find delinquency for a specific type of land use then you would use an AND statement. If you are exploring rates across all types of land use you don't actually want to restrict the land use types. Try converting only the numeric variable to categorical, then using the new group with the land use group in the table() function.

these <- vector.numeric > criteria 
table( f1, these )
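A minimal sketch of that pattern on made-up parcels, not the lab data. The variable names land.use and amt.delinq are stand-ins for the real columns, and prop.table() is added here only as one way to turn the counts into rates.

```r
# hypothetical parcels (not the lab data)
land.use   <- c( "Single Family", "Commercial", "Single Family", "Apartment", "Commercial" )
amt.delinq <- c( 0, 1200, 350, 0, 0 )

# translate the numeric variable into a logical group
delinquent <- amt.delinq > 0

# cross-tabulate land use against the new group
tab <- table( land.use, delinquent )
tab

# row proportions: share of each land use type that is delinquent
prop.table( tab, margin=1 )
```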

Also, this statement measures parcels that are NOT delinquent in tax payments:

dat$amtdelinqu == 0   # currently owe NO back taxes

To measure those that are delinquent you would either need:

dat$amtdelinqu != 0   # all cases except those that owe nothing

Or:

dat$amtdelinqu > 0   # all cases that owe something

jmacost5 commented 5 years ago

group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this

Error in plot(syr, border = NA, col = group.colors) : object 'syr' not found

This is the message I get when I try to run the map.

lecy commented 5 years ago

You need all of these chunks to load the packages and data for your lab to knit:

image

Are you using the LAB-01 template, or LAB-02 template?

castower commented 5 years ago

Hello all, I'm having a similar problem. My code is as follows:

NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily)

However, every time I run the mean, I get an NA. Is there something that I'm missing here?

Thanks!

castower commented 5 years ago

I have found that

NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily, na.rm = TRUE)

gives a value, but I'm not sure if I should be excluding the NA values.
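The NA most likely comes from missing values in the year built variable. The sketch below uses made-up data to show how a missing value propagates through an AND statement, and what na.rm=TRUE removes. Whether dropping those cases is appropriate is a judgment call; counting them first is one way to see how much it matters.

```r
# hypothetical data with missing years (not the lab data)
land.use   <- c( "Single Family", "Single Family", "Commercial", "Single Family" )
year.built <- c( 1995, NA, NA, 1950 )

new.sf <- land.use == "Single Family" & year.built > 1980
new.sf                        # TRUE NA FALSE FALSE

# TRUE & NA is NA, but FALSE & NA is FALSE:
# the commercial parcel is ruled out no matter when it was built

mean( new.sf )                # NA, because one element is NA
mean( new.sf, na.rm=TRUE )    # NA dropped: 1 TRUE out of 3 known values
sum( is.na( new.sf ) )        # number of cases being dropped
```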

castower commented 5 years ago

NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
sum(NewSingleFamily, na.rm = TRUE)/sum(dat$land_use == "Single Family")

Seems to produce the proportion of single family homes built after 1980 out of total single family homes, but I want to be sure that this is what the question is asking. Did anyone else clarify this?

Thanks!

lecy commented 5 years ago

That looks correct, in terms of what the question is asking for.

single family homes since 1980 / single family homes = proportion built since 1980

sunaynagoel commented 5 years ago

The chunk between lines 42-47 in the template provided is taking a long time to run. It has been more than 10 minutes, and this is my third attempt. Without this chunk I am not able to do the mapping. Has anyone else run into the same problem? This is the chunk I am talking about:

# load the map files
URL <- "https://raw.githubusercontent.com/DS4PS/Data-Science-Class/master/DATA/syr_parcels.geojson"
syr <- geojson_read( URL, method="local", what="sp" )
plot( syr,  border=NA, col="gray80" )

castower commented 5 years ago

Yes, I had the same problem. I found that clearing the environment: https://community.rstudio.com/t/how-to-clear-the-r-environment/14303 helped, but it just took a long time to run. Once it loaded once, then it was quick, but I'd suggest just letting it run while you type out the code to the other answers elsewhere (like in a notepad or word doc) and then come back when it's done.

etbartell commented 5 years ago

I'm unable to knit my chunks because of some issue related to the packages. I didn't change any of the code that was provided, so the packages should have loaded. Here is the error message:

Installing package into 'C:/Users/Elliott/Documents/R/win-library/3.6'
(as 'lib' is unspecified)
Quitting from lines 34-38 (Lab-02-Bartell.Rmd)
Error in contrib.url(repos, "source") :
  trying to use CRAN without setting a mirror
Calls: ... withVisible -> eval -> eval -> install.packages -> contrib.url
Execution halted

Jigarci3 commented 5 years ago

@etbartell I did notice the following on the lab instructions:

"NOTE: do not include install package commands in your RMD chunks. Trying to install packages while knitting can cause errors."

Not sure if that is helpful at all.

lecy commented 5 years ago

Hi all - you might have trouble with that chunk of code because it is a decent-sized file (71 MB). Sometimes code is slow because of the complexity of the operation, but in this case, since it is just reading an external file, your internet speed will be the limiting factor.

If you add the following to your code chunk header it will store a local copy and make it easier to knit in the future:

{r, cache=TRUE}

If your connection is slow you can also download the file once, add it to the same folder as your RMD document, and read it locally.

Download from here:

https://github.com/DS4PS/Data-Science-Class/blob/master/DATA/syr_parcels.geojson

And change the chunk to:

# URL <- "https://raw.githubusercontent.com/DS4PS/Data-Science-Class/master/DATA/syr_parcels.geojson"
syr <- geojson_read( "syr_parcels.geojson", method="local", what="sp" )
plot( syr,  border=NA, col="gray80" )
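The download-once idea can also be scripted so the file is fetched only when no local copy exists. This is a sketch under assumptions: get.local.copy is a hypothetical helper, not part of the lab template, and it relies only on the base R functions file.exists() and download.file().

```r
# download a file only if there is no local copy yet
# (get.local.copy is a hypothetical helper, not part of the lab template)
get.local.copy <- function( url, dest )
{
  if( ! file.exists( dest ) )
  {
    download.file( url, destfile=dest, mode="wb" )
  }
  return( dest )
}

# usage, assuming the packages from the template are already loaded:
# URL  <- "https://raw.githubusercontent.com/DS4PS/Data-Science-Class/master/DATA/syr_parcels.geojson"
# path <- get.local.copy( URL, "syr_parcels.geojson" )
# syr  <- geojson_read( path, method="local", what="sp" )
```

The first knit still pays the full download cost; later runs skip it as long as the file stays in the same folder as the RMD document.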
lecy commented 5 years ago

@etbartell Your classmate @Jigarci3 is correct, it can cause problems if you try to install packages during a knit operation. You only need to install them once and afterwards you load packages with the library() function. You will note that I included the install commands in the instructions but not in the RMD template I provided for the lab.

Were you able to get these to install?

install.packages( "geojsonio" )
install.packages( "sp" )
install.packages( "rgdal" )

The most common issue you will encounter when trying to install packages is one of the component packages might currently be in use. The best way I have found to quickly fix the problem is to shut down R Studio, open a basic R console, and run the install functions there. R Studio might have packages open to manage assets, whereas the core R console will start from scratch and typically not generate conflicts when installing. Any packages installed in the core R console will be accessible once you open R Studio again (they all use the same local library).

Let me know if this works!
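To check which packages still need installing without triggering an install during a knit, something like the following can be run in the console. It is only a sketch: missing.packages is a hypothetical helper name, and it relies on the base R behavior that system.file() returns an empty string for a package that is not installed.

```r
# report which packages are not yet installed, without installing anything
# (missing.packages is a hypothetical helper, not part of the lab template)
missing.packages <- function( pkgs )
{
  installed <- vapply( pkgs, function( p ) nzchar( system.file( package=p ) ), logical(1) )
  return( pkgs[ ! installed ] )
}

# the three packages used in this lab
missing.packages( c( "geojsonio", "sp", "rgdal" ) )
```

Anything the function returns can then be installed once from the console, while the RMD file itself only calls library().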

etbartell commented 5 years ago

@lecy @Jigarci3 That works, thank you both!

Taesian33 commented 5 years ago

library( sp ) is resulting in nothing.

Taesian33 commented 5 years ago

library( geojsonio )

Attaching package: ‘geojsonio’

The following object is masked from ‘package:base’:

pretty

library( sp )
library( rgdal )
rgdal: version: 1.4-4, (SVN revision 833)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
 Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
 GDAL binary built with GEOS: FALSE
 Loaded PROJ.4 runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
 Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
 Linking to sp version: 1.3-1

Is that normal, what everyone sees?

lecy commented 5 years ago

Those are package startup messages, not errors, and they are normal.

Taesian33 commented 5 years ago

Struggling with WHERE questions, such as 2 and 3. (3 formula is shown above but I do not understand how we came to that conclusion for the code.)

castower commented 5 years ago

Hello @Taesian33 , I understood the "where" questions to indicate a map. In other words, I answered these parts of the questions by configuring my maps to display the areas that matched the code that I typed for the first half of the questions, i.e. for question 3, I ran the code to find the ratio and then I mapped out the locations of the houses built after 1980.

castower commented 5 years ago

Hello @Taesian33 : I was able to access my .rmd file and review how I answered it and I think I misunderstood your question the first time. Although I did show the maps, I also interpreted the "WHERE" part of the question to mean which neighborhoods were the location sites for the properties. Thus, I added

dat$neighborhood

as a qualifier to my table to find the locations. Hope that helps!

lecy commented 5 years ago

The intent behind the WHERE questions was to provide a visual of the group that you created to help make the idea of a logical statement and selector vector a little more tangible. So create the vector, then map it. The map is sufficient to answer the question.
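The "create the vector, then map it" step can be seen in isolation with a made-up selector vector. The sketch assumes, as the lab template appears to, that the rows of dat are in the same order as the parcels in syr, so the i-th color applies to the i-th parcel.

```r
# hypothetical selector vector for five parcels (not the lab data)
these <- c( TRUE, FALSE, FALSE, TRUE, FALSE )

# one color per parcel: red for the group, gray for everything else
group.colors <- ifelse( these, "firebrick", "gray80" )
group.colors

# in the lab, plot( syr, border=NA, col=group.colors ) then draws
# each parcel in its assigned color
```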

Taesian33 commented 5 years ago

I always feel like I'm either skipping chapters or I'm not reading some chapters, because in both labs I'm coming in helpless and lost.

sunaynagoel commented 5 years ago

@Taesian33 For question 3 I figured out the proportion for the first half of the question, and for the "where" part I mapped the vector I created to store the properties built after 1980. Hope this helps.

Taesian33 commented 5 years ago

Everything helps in a way, but not really because I don't know what I am doing. I just type in codes and get errors or some crazy answer and I just yell at my laptop.

sunaynagoel commented 5 years ago

@Taesian33 We can try. I agree it's confusing and takes a long time. Please post here with your code if you need any help. I am not sure if I am right or wrong, but we can try to make sense of it.

Taesian33 commented 5 years ago

I only answered #1, and I got #3 thanks to this discussion, but I'm not sure how 3 got there. I would not have figured that out without reading it here, and I still do not see how it landed there.

Taesian33 commented 5 years ago

unique ( dat$land_use == "Commercial" )
[1] FALSE TRUE
sum ( dat$land_use == "Commercial" )
[1] 2601
mean ( dat$land_use == "Commercial" )
[1] 0.06267168
sum ( dat$land_use & dat$neighborhood)
Error in dat$land_use & dat$neighborhood :
  operations are possible only for numeric, logical or complex types
table (dat$amtdelinqu)
that <- dat$assessedva > 200000
sum (these + that)
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
sum(NewSingleFamily, na.rm = TRUE)/sum(dat$land_use == "Single Family")
[1] 0.03579042
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily, na.rm = TRUE)
[1] 0.02106914

Why?

mlgaona1717 commented 5 years ago

Hi guys,

Has anyone had an issue with the knit button not working? I can't save my file as an HTML file and I have restarted my computer and uninstalled the program as well.

sunaynagoel commented 5 years ago

@Taesian33 for this portion of your code:

NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
sum(NewSingleFamily, na.rm = TRUE)/sum(dat$land_use == "Single Family")
[1] 0.03579042
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily, na.rm = TRUE)
[1] 0.02106914

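Both numbers are proportions, but they use different denominators. That's why you are getting two different answers. The mean is giving you the share of all parcels that are single family homes built after 1980, while the first calculation divides by the number of single family homes only. I don't know if we need the mean here or not.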
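The two results quoted above can be reproduced in miniature. In this made-up example, not the lab data, the numerator is the same in both calculations and only the denominator changes.

```r
# hypothetical data: 3 single family homes among 5 parcels (not the lab data)
land.use   <- c( "Single Family", "Single Family", "Single Family", "Commercial", "Apartment" )
year.built <- c( 1995, 1950, 1960, 2001, 1985 )

new.sf <- land.use == "Single Family" & year.built > 1980

# share of ALL parcels that are single family homes built after 1980
mean( new.sf )                                        # 1 out of 5

# share of SINGLE FAMILY homes that were built after 1980
sum( new.sf ) / sum( land.use == "Single Family" )    # 1 out of 3
```

Which one answers the lab question depends on the group the question takes as the total.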

sunaynagoel commented 5 years ago

@mlgaona1717 I had the same issue. I thought it was not working because it did not appear to be doing anything. Then I noticed that it was just taking a very long time, and I had to walk away from the computer and let it do its own thing. It worked for me, but it sounds like that may not be the issue for you.

mlgaona1717 commented 5 years ago

Oh - that's a good point! I know it's a huge file. Thank you, I'll give that a shot!

jmacost5 commented 5 years ago

I still am getting an error when I try to run the map: this is problem 4

group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this

Error in plot(syr, border = NA, col = group.colors) : object 'syr' not found

Taesian33 commented 5 years ago

Lab 02 #3: The "new housing stock", how are we supposed to figure that vector out? Am I reading too much into that?

jmacost5 commented 5 years ago

table(dat$land_use == "Commercial" , dat$neighborhood)
these <- as.character(dat$land_use == "Commercial",dat$neighborhood)
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this

This is giving me an error.

jmacost5 commented 5 years ago

I can't do an html either.

Taesian33 commented 5 years ago

THINGS ARE CLICKING! Thank you to all! @sunaynagoel @castower

castower commented 5 years ago

table(dat$land_use == "Commercial" , dat$neighborhood)
these <- as.character(dat$land_use == "Commercial",dat$neighborhood)
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this

This is giving me an error.

You've currently got 'these' coded as the output of land_use and neighborhood in character format. Instead, we want 'these' to be coded as the properties that you want to display.

Thus, you should try to run:

these <- table ( dat$land_use == "Commercial", dat$neighborhood )
these

Once you've done this and identified the neighborhood that has the majority of commercial properties, then run following:

these <- dat$land_use == "Commercial" & dat$neighborhood == "BLANK"
group.colors <- ifelse( these , "firebrick", "gray80" )    
plot( syr,  border=NA, col=group.colors ) 

Where "BLANK" is the name of the neighborhood with the most commercial properties.
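Instead of reading the name off the table by eye, it can be pulled out with which.max(). This is a sketch on made-up parcels; the neighborhood names are placeholders, and which.max() returns only the first neighborhood if there is a tie.

```r
# hypothetical parcels with placeholder neighborhood names (not the lab data)
land.use     <- c( "Commercial", "Commercial", "Single Family", "Commercial", "Apartment" )
neighborhood <- c( "Downtown", "Downtown", "Eastside", "Westside", "Downtown" )

# count commercial parcels in each neighborhood
commercial.counts <- table( neighborhood[ land.use == "Commercial" ] )
commercial.counts

# name of the neighborhood with the largest count
top.hood <- names( which.max( commercial.counts ) )
top.hood

# selector vector for the map
these <- land.use == "Commercial" & neighborhood == top.hood
```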

castower commented 5 years ago

THINGS ARE CLICKING! Thank you to all! @sunaynagoel @castower

I'm glad! We're all learning together :)

castower commented 5 years ago

I can't do an html either.

Have you tried to install the packages separately in R (not RStudio)? If so, it took quite a while for my file to knit so you might want to try to download the map and run locally using Prof Lecy's code above.

jmacost5 commented 5 years ago

Hi guys,

Has anyone had an issue with the knit button not working? I can't save my file as an HTML file and I have restarted my computer and uninstalled the program as well.

I turned off my computer and the next second my RStudio was gone. I almost had a heart attack.