jmacost5 opened this issue 5 years ago
The lecture from this week introduces these "logical" operators (as opposed to the mathematical operators we discussed last week):
http://ds4ps.org/dp4ss-textbook/p-050-business-logic.html
I would start there. How do you operationalize "homes with values over $200k"?
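Here is a minimal sketch of that idea. A threshold such as "over $200k" becomes a comparison operator applied to the value column. The column name assessedva is borrowed from code later in this thread. The data frame and its values are made up for illustration.

```r
# Hypothetical toy data standing in for the lab's parcel data
dat <- data.frame(
  land_use   = c("Single Family", "Commercial", "Single Family", "Vacant Land"),
  assessedva = c(150000, 450000, 210000, 200000)
)

# Selector (logical) vector: TRUE where the value is strictly over $200k
these <- dat$assessedva > 200000
these          # FALSE TRUE TRUE FALSE

sum( these )   # number of parcels over $200k: 2
```

Note that > excludes a parcel valued at exactly $200,000. Use >= if the question means "$200k or more".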
I guess my question with number 3 is that I don't understand if I am supposed to find the ratio or where it is being built. I'm also not sure I am doing it right. This is my code:
sum( dat$land_use == "Single Family" )
table( dat$land_use == "Single Family", dat$yearbuilt >= 1980 )
884 / 24392   # 0.03624139
these <- dat$land_use == "Single Family" & dat$yearbuilt >= 1980
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors )
For number 4, I tried to make a group with all of the variables and I got an error:
these <- dat$land_use == "Two Family" & dat$land_use == "Apartment" & dat$land_use == "Three Family"
Sum( dat$land_use== these )
group.colors <- ifelse( these, "firebrick", "gray80" )
plot( syr, border=NA, col=group.colors )
The easiest approach is to recognize the relationship between an average and a proportion when data is binary: what is the average of 0, 0, 1, 1 ?
mean( c(0,0,1,1) )
0.5
mean( c(0,0,1,0) )
0.25
When our data consists of 0's and 1's then the mean is the proportion. This is helpful in calculating group membership. If we want to know what proportion of our data belongs to the group we have defined we can use the mean function on our selector vector:
these <- dat$variable == "criteria"
mean( these )
Otherwise a proportion is always count / total, or:
sum( these ) / length( these )
Where "these" is the logical vector we are working with.
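Here is a quick check on a made-up selector vector. Inside sum() and mean(), R counts TRUE as 1 and FALSE as 0, so both formulas give the same proportion.

```r
# Hypothetical selector vector: 1 of 4 cases meets the criteria
these <- c(FALSE, FALSE, TRUE, FALSE)

mean( these )                   # 0.25
sum( these ) / length( these )  # 0.25, the same proportion
```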
For questions 6 and 7, I just don't know if there is a specific way I should be typing in delinquent tax payments.
For number 4, I tried to make a group with all of the variables and I got an error:
these <- dat$land_use == "Two Family" & dat$land_use == "Apartment" & dat$land_use == "Three Family"
The AND operator is the intersection of two criteria, meaning both are true at the same time:
group == "treatment" & gender == "female"
The problem is that a house cannot be two things at the same time. For example, you cannot say:
animal == "dog" & animal == "cat"
Perhaps you want the OR operator?
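Here is the animal example applied to a made-up vector. The AND version can never be TRUE for a single value, while OR keeps cases that match either label. Base R's %in% operator gives the same result as the OR version and is more compact when there are several labels.

```r
animal <- c("dog", "cat", "bird", "dog")

animal == "dog" & animal == "cat"   # all FALSE: no single value is both
animal == "dog" | animal == "cat"   # TRUE TRUE FALSE TRUE: either label
animal %in% c("dog", "cat")         # same result as the OR version
```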
For questions 6 and 7, I just don't know if there is a specific way I should be typing in delinquent tax payments.
You need to define your group. Delinquent means there are unpaid taxes. So you need to create a statement that identifies all of those cases to create a new group.
Specifically, you have a quantitative variable, amtdelinqu, which you need to translate into a logical vector. What operators work for translating a quantitative measure into a selector vector?
I tried the OR operator for number 4:
these <- dat$land_use == "Two Family" | dat$land_use == "Apartment" | dat$land_use == "Three Family"
sum( dat$land_use == these )
group.colors <- ifelse( these, "firebrick", "gray80" )
plot( syr, border=NA, col=group.colors )
I am getting a zero, and I think I typed something in wrong, but I cannot figure out what.
After your logical statement that creates your selector vector "these", you count cases (TRUEs) using sum() directly:
these <- dat$land_use == "Two Family" |
dat$land_use == "Apartment" |
dat$land_use == "Three Family"
sum( these )
Not:
sum( dat$land_use == these )
Note that "these" will be a logical vector.
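Here is a minimal sketch of why sum( dat$land_use == these ) returns zero, using made-up land use labels. When a character vector is compared with a logical vector, R converts TRUE/FALSE to the strings "TRUE"/"FALSE", and those never match a land use label.

```r
land_use <- c("Two Family", "Apartment", "Single Family", "Three Family")

these <- land_use == "Two Family" |
         land_use == "Apartment" |
         land_use == "Three Family"

sum( these )              # 3: counts the TRUEs directly
sum( land_use == these )  # 0: labels compared to "TRUE"/"FALSE" strings
```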
I am not understanding what I am doing wrong with number 7:
these <- dat$land_use & dat$amtdelinqu == 0
table(these)
The table() function operates on two categorical variables.
table( f1, f2 )
Currently one of your variables is numeric, and one categorical. You need to convert both to categorical, THEN apply the table function.
# no criteria provided here
these <- dat$land_use (LAND USE WHAT???) & dat$amtdelinqu == 0
If you are trying to find delinquency for a specific type of land use, then you would use an AND statement. If you are exploring rates across all types of land use, you don't actually want to restrict the land use types. Try converting only the numeric variable to categorical, then use the new group together with the land use variable in the table() function.
these <- vector.numeric > criteria
table( f1, these )
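Here is that template filled in with a made-up data frame. The column names follow this thread, but the values are hypothetical. The numeric amount owed is converted to a logical group first, then cross-tabulated with land use.

```r
dat <- data.frame(
  land_use   = c("Single Family", "Commercial", "Single Family", "Apartment"),
  amtdelinqu = c(0, 1250.50, 300, 0)
)

# Step 1: numeric -> logical group (TRUE = owes back taxes)
these <- dat$amtdelinqu > 0

# Step 2: cross-tabulate land use against the new group
table( dat$land_use, these )
```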
Also, this statement measures parcels that are NOT delinquent in tax payments:
dat$amtdelinqu == 0 # currently owe NO back taxes
To measure those that are delinquent you would either need:
dat$amtdelinqu != 0 # all cases except those that owe nothing
Or:
dat$amtdelinqu > 0 # all cases that owe something
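A small check with made-up amounts shows how these statements split the same cases. The two delinquency versions agree here only because none of the made-up amounts is negative. The lab does not say whether the real variable can be negative.

```r
amtdelinqu <- c(0, 1250.50, 0, 300, 0)

sum( amtdelinqu == 0 )  # 3 parcels owe nothing
sum( amtdelinqu != 0 )  # 2 parcels owe something
sum( amtdelinqu > 0 )   # 2, identical here because no amount is negative
```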
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this
Error in plot(syr, border = NA, col = group.colors) : object 'syr' not found
This is the message when I try to run the map.
You need all of these chunks to load the packages and data for your lab to knit:
Are you using the LAB-01 template, or LAB-02 template?
Hello all, I'm having a similar problem with number 3. My code is as follows:
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily)
However, every time I run the mean, I get NA. Is there something I'm missing here?
Thanks!
I have found that
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily, na.rm = TRUE)
gives a value, but I'm not sure if I should be excluding the NA values.
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
sum(NewSingleFamily, na.rm = TRUE) / sum(dat$land_use == "Single Family")
This seems to produce the proportion of single family homes built after 1980 out of all single family homes, but I want to be sure this is what the question is asking. Did anyone else get clarification on this?
Thanks!
That looks correct, in terms of what the question is asking for.
single family homes since 1980 / single family homes = proportion built since 1980
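The NA reported earlier comes from how & handles missing values. FALSE & NA is FALSE, but TRUE & NA is NA. So a single family home with a missing yearbuilt produces NA, and mean() returns NA unless na.rm = TRUE is set. Here is a sketch with made-up values:

```r
# Made-up values; the second home has a missing year built
land_use  <- c("Single Family", "Single Family", "Commercial", "Single Family")
yearbuilt <- c(1995, NA, NA, 1950)

new_sf <- land_use == "Single Family" & yearbuilt > 1980
new_sf          # TRUE NA FALSE FALSE  (TRUE & NA is NA, FALSE & NA is FALSE)

mean( new_sf )  # NA: a single missing value makes the mean NA

# single family homes since 1980 / single family homes
sum( new_sf, na.rm = TRUE ) / sum( land_use == "Single Family" )  # 1/3
```

One design choice to be aware of: the denominator above still counts single family homes whose year built is missing. Adding & !is.na( yearbuilt ) inside the denominator would drop them. The lab does not say which version it expects.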
The chunk between lines 42-47 in the provided template is taking a long time to run; it's been more than 10 minutes, and this is my third attempt. Without this chunk I am not able to do the mapping. Has anyone else run into the same problem? This is the chunk I am talking about:
# load the map files
URL <- "https://raw.githubusercontent.com/DS4PS/Data-Science-Class/master/DATA/syr_parcels.geojson"
syr <- geojson_read( URL, method="local", what="sp" )
plot( syr, border=NA, col="gray80" )
Yes, I had the same problem. I found that clearing the environment: https://community.rstudio.com/t/how-to-clear-the-r-environment/14303 helped, but it just took a long time to run. Once it loaded once, then it was quick, but I'd suggest just letting it run while you type out the code to the other answers elsewhere (like in a notepad or word doc) and then come back when it's done.
I'm unable to knit my chunks because of some issue related to the packages. I didn't change any of the code that was provided, so the packages should have loaded. Here is the error message:
Installing package into 'C:/Users/Elliott/Documents/R/win-library/3.6'
(as 'lib' is unspecified)
Quitting from lines 34-38 (Lab-02-Bartell.Rmd)
Error in contrib.url(repos, "source") :
trying to use CRAN without setting a mirror
Calls:
Execution halted
@etbartell I did notice the following on the lab instructions:
"NOTE: do not include install package commands in your RMD chunks. Trying to install packages while knitting can cause errors."
Not sure if that is helpful at all.
Hi all - you might have trouble with that chunk of code because it is a decent-sized file (71 MB). Sometimes code is slow because of the complexity of the operation, but in this case, since it is just reading an external file, your internet speed will be the limiting factor.
If you add the following to your code chunk header it will store a local copy and make it easier to knit in the future:
{r, cache=TRUE}
If your connection is slow you can also download the file once, add it to the same folder as your RMD document, and read it locally.
Download from here:
https://github.com/DS4PS/Data-Science-Class/blob/master/DATA/syr_parcels.geojson
And change the chunk to:
# URL <- "https://raw.githubusercontent.com/DS4PS/Data-Science-Class/master/DATA/syr_parcels.geojson"
syr <- geojson_read( "syr_parcels.geojson", method="local", what="sp" )
plot( syr, border=NA, col="gray80" )
@etbartell Your classmate @Jigarci3 is correct: it can cause problems if you try to install packages during a knit operation. You only need to install them once, and afterwards you load packages with the library() function. You will note that I included the install commands in the instructions but not in the RMD template I provided for the lab.
Were you able to get these to install?
install.packages( "geojsonio" )
install.packages( "sp" )
install.packages( "rgdal" )
The most common issue you will encounter when trying to install packages is one of the component packages might currently be in use. The best way I have found to quickly fix the problem is to shut down R Studio, open a basic R console, and run the install functions there. R Studio might have packages open to manage assets, whereas the core R console will start from scratch and typically not generate conflicts when installing. Any packages installed in the core R console will be accessible once you open R Studio again (they all use the same local library).
Let me know if this works!
@lecy @Jigarci3 That works, thank you both!
library( sp ) is resulting in nothing.
library( geojsonio )
Attaching package: ‘geojsonio’
The following object is masked from ‘package:base’:
pretty
library( sp )
library( rgdal )
rgdal: version: 1.4-4, (SVN revision 833)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/gdal
GDAL binary built with GEOS: FALSE
Loaded PROJ.4 runtime: Rel. 5.2.0, September 15th, 2018, [PJ_VERSION: 520]
Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.6/Resources/library/rgdal/proj
Linking to sp version: 1.3-1
Is that normal, what everyone sees?
Those are package startup messages, not errors, and they are normal.
In reply to Taesian33 (Sep 5, 2019, 4:48 PM).
I'm struggling with the WHERE questions, such as 2 and 3. (The formula for 3 is shown above, but I do not understand how we arrived at that code.)
Hello @Taesian33, I understood the "where" questions to indicate a map. In other words, I answered these parts of the questions by configuring my maps to display the areas that matched the code I typed for the first half of each question. For question 3, I ran the code to find the ratio and then mapped the locations of the houses built after 1980.
Hello @Taesian33: I was able to access my .rmd file and review how I answered it, and I think I misunderstood your question the first time. Although I did show the maps, I also interpreted the "WHERE" part of the question to mean which neighborhoods were the location sites for the properties. Thus, I added dat$neighborhood as a qualifier to my table to find the locations. Hope that helps!
The intent behind the WHERE questions was to provide a visual of the group that you created to help make the idea of a logical statement and selector vector a little more tangible. So create the vector, then map it. The map is sufficient to answer the question.
I always feel like I'm either skipping chapters or not reading some chapters, because in both labs I'm coming in helpless and lost.
@Taesian33 For question 3, I figured out the proportion for the first half of the question, and for the "where" part I mapped the vector I created to store the properties built after 1980. Hope this helps.
Everything helps in a way, but not really because I don't know what I am doing. I just type in codes and get errors or some crazy answer and I just yell at my laptop.
@Taesian33 We can try; I agree it's confusing and takes a long time. Please post your code here if you need any help. I am not sure if I am right or wrong, but we can try to make sense of it together.
I only answered #1, and I got #3 thanks to these discussion posts, but I'm not sure how the answer to 3 was reached. I would not have figured it out without reading it here, and I still do not see how it got there.
unique ( dat$land_use == "Commercial" )
[1] FALSE TRUE
sum ( dat$land_use == "Commercial" )
[1] 2601
mean ( dat$land_use == "Commercial" )
[1] 0.06267168
sum ( dat$land_use & dat$neighborhood)
Error in dat$land_use & dat$neighborhood : operations are possible only for numeric, logical or complex types
table (dat$amtdelinqu)
that <- dat$assessedva > 200000
sum (these + that)
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
sum(NewSingleFamily, na.rm = TRUE)/sum(dat$land_use == "Single Family")
[1] 0.03579042
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily, na.rm = TRUE)
[1] 0.02106914
Why?
Hi guys,
Has anyone had an issue with the knit button not working? I can't save my file as an HTML file and I have restarted my computer and uninstalled the program as well.
@Taesian33, for this portion of your code:
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
sum(NewSingleFamily, na.rm = TRUE)/sum(dat$land_use == "Single Family")
[1] 0.03579042
NewSingleFamily <- dat$land_use == "Single Family" & dat$yearbuilt > 1980
mean(NewSingleFamily, na.rm = TRUE)
[1] 0.02106914
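The two commands do not measure the same proportion. That's why you are getting two different answers. The mean gives you the share of all parcels (excluding missing values) that are single family homes built after 1980, not the share of single family homes. I don't know if we need it here or not.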
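Here are some made-up values that make the difference concrete. Both commands divide the number of new single family homes by something, but the denominators differ. mean() divides by all parcels, while the second command divides by single family homes only.

```r
# Made-up values with no missing data, so na.rm is not needed here
land_use  <- c("Single Family", "Single Family", "Commercial",
               "Commercial", "Single Family")
yearbuilt <- c(1995, 1960, 1990, 2001, 1950)

new_sf <- land_use == "Single Family" & yearbuilt > 1980

# Denominator: all 5 parcels
mean( new_sf )                                      # 0.2

# Denominator: the 3 single family homes
sum( new_sf ) / sum( land_use == "Single Family" )  # 0.333...
```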
Hi guys,
Has anyone had an issue with the knit button not working? I can't save my file as an HTML file and I have restarted my computer and uninstalled the program as well.
@mlgaona1717 I had the same issue. I thought it wasn't working because it was not doing anything. Then I noticed that it was just taking longer than expected, so I walked away from the computer and let it do its thing. It worked for me, but it sounds like that may not be the issue for you.
Oh - that's a good point! I know it's a huge file. Thank you, I'll give that a shot!
I am still getting an error when I try to run the map (this is problem 4):
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this
Error in plot(syr, border = NA, col = group.colors) : object 'syr' not found
Lab 02 #3: for the "new housing stock," how are we supposed to figure out that vector? Am I reading too much into it?
table(dat$land_use == "Commercial" , dat$neighborhood)
these <- as.character(dat$land_use == "Commercial",dat$neighborhood)
group.colors <- ifelse( these, "firebrick", "gray80" ) # don't change this
plot( syr, border=NA, col=group.colors ) # don't change this
This is giving me an error.
I can't do an html either.
THINGS ARE CLICKING! Thank you to all! @sunaynagoel @castower
You've currently got 'these' coded as the output of land_use and neighborhood in character format. Instead, we want 'these' to be coded as the properties that you want to display.
Thus, you should try to run:
these <- table ( dat$land_use == "Commercial", dat$neighborhood )
these
Once you've done this and identified the neighborhood with the most commercial properties, run the following:
these <- dat$land_use == "Commercial" & dat$neighborhood == "BLANK"
group.colors <- ifelse( these , "firebrick", "gray80" )
plot( syr, border=NA, col=group.colors )
Where "BLANK" is the name of the neighborhood with the most commercial properties.
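Here is a sketch that asks R for the neighborhood with the most commercial parcels, instead of reading it off the printed table. The neighborhood names and values are made up. If there is a tie, which.max() returns the first neighborhood with the maximum count.

```r
# Hypothetical toy data
dat <- data.frame(
  land_use     = c("Commercial", "Commercial", "Single Family",
                   "Commercial", "Apartment"),
  neighborhood = c("Downtown", "Downtown", "Eastwood",
                   "Eastwood", "Downtown")
)

# Count commercial parcels in each neighborhood
counts <- table( dat$neighborhood[ dat$land_use == "Commercial" ] )
counts

# Name of the neighborhood with the most commercial parcels
top <- names( which.max( counts ) )
top   # "Downtown"

# Selector vector for the map, with top in place of "BLANK"
these <- dat$land_use == "Commercial" & dat$neighborhood == top
```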
THINGS ARE CLICKING! Thank you to all! @sunaynagoel @castower
I'm glad! We're all learning together :)
I can't do an html either.
Have you tried installing the packages separately in R (not RStudio)? If you have, note that it took quite a while for my file to knit, so you might want to download the map file and read it locally using Prof. Lecy's code above.
I turned off my computer, and the next second my RStudio was gone. I almost had a heart attack.
I want to make sure I am doing this right. I keep getting zeros or errors.