DS4PS / cpp-528-fall-2020

Course shell for CPP 528 Foundations of Data Science III - Project Management
http://ds4ps.org/cpp-528-fall-2020/

Lab 05 Tutorial issues #39

Closed TVK36692 closed 3 years ago

TVK36692 commented 3 years ago

In Lab 5, I am coding through the tutorial and getting failures in the first mutate step. Error: Must subset columns with a valid subscript vector. x Can't convert from <double> to <integer> due to loss of precision.

This is the section that is failing:

d <- select( d, 

             tractid, cbsa, cbsaname,            # ids / units of analysis

             mhv.00, mhv.10, mhv.change, mhv.growth,    # home value 

             hinc00, hu00, own00, rent00,        # ses
             hinc12, hu10, own10, rent10,

             empclf00, clf00, unemp00, prof00,   # employment 
             empclf12, clf12, unemp12, prof12,

             dpov00, npov00,                     # poverty
             dpov12, npov12,

             ag25up00, hs00, col00,              # education 
             ag25up12, hs12, col12,

             pop00.x, nhwht00, nhblk00, hisp00, asian00,   # race
             pop10, nhwht10, nhblk10, hisp10, asian10,

             num.nmtc, nmtc.total,              # tax policy data
             num.lihtc, lihtc.total             # aggregated by census tract

          ) # end select

I have been following step by step.

cenuno commented 3 years ago

@TVK36692 - this error happened to me in the past when I had a typo somewhere in my column names inside the select statement. However, it could also be the case that you might be calling a column that does not exist.

I knitted the file myself just now and did not run into this error, so I am unsure why you are seeing it, especially since you are following the tutorial step by step.

I am using dplyr_1.0.2 but I am not sure if this is a versioning issue you are experiencing. Would you mind re-installing to upgrade to the newest version and then re-running the code?

TVK36692 commented 3 years ago

I get this error:

package ‘dplyr’ successfully unpacked and MD5 sums checked
Warning in install.packages : cannot remove prior installation of package ‘dplyr’
Warning in install.packages : problem copying C:\Users\tkemper\Google Drive\Classes\CPP528\LABS\cpp-528-fall-2020-group-03\renv\library\R-4.0\x86_64-w64-mingw32\00LOCK\dplyr\libs\x64\dplyr.dll to C:\Users\tkemper\Google Drive\Classes\CPP528\LABS\cpp-528-fall-2020-group-03\renv\library\R-4.0\x86_64-w64-mingw32\dplyr\libs\x64\dplyr.dll: Permission denied
Warning in install.packages : restored ‘dplyr’

TVK36692 commented 3 years ago

I completely uninstalled and reinstalled the package and I still get that error: Note: Using an external vector in selections is ambiguous. i Use all_of(mhv.00) instead of mhv.00 to silence this message.

would this change the data I am getting?

cenuno commented 3 years ago

Is mhv.00 a column in d? It should be.

Right before the select statement, a few assignments add the external vectors to d as new columns. They need to be run before the select statement, so be sure to run those.
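One quick way to check this is to compare the names you plan to select against names(d). The sketch below uses a small made-up data frame standing in for d (the toy values and the `wanted` and `missing_cols` names are assumptions for illustration, not part of the lab):

```r
# Toy stand-in for the lab's data frame `d` (values are made up).
d <- data.frame(
  tractid  = c("t1", "t2", "t3"),
  mhmval00 = c(90000, 120000, 8000),
  mhmval12 = c(150000, 160000, 50000)
)

# A free-standing vector: it lives in the workspace, not in `d`.
mhv.00 <- d$mhmval00 * 1.28855

# Names we intend to pass to select().
wanted <- c("tractid", "mhv.00")

# Which of them are not columns of `d` yet?
missing_cols <- setdiff(wanted, names(d))
print(missing_cols)

# exists() only says the object is in the workspace;
# the %in% check says whether `d` has a column with that name.
print(exists("mhv.00"))
print("mhv.00" %in% names(d))
```

If `missing_cols` is not empty, those names still need to be added to d before the select statement will treat them as columns.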

— Cristian E. Nuno


cenuno commented 3 years ago

An interesting problem I did not anticipate.

As a workaround, store your column names in a character vector and subset using base R notation. Here’s an example using two column names:

relevant_cols <- c("tractid", "cbsaname")

df_subset <- d[relevant_cols]

See here for more context: https://stackoverflow.com/a/45109863
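A slightly fuller sketch of that workaround, using a made-up data frame in place of d (the toy values and the `not_found` check are assumptions added for illustration):

```r
# Toy stand-in for `d` (made-up values; the lab data has many more columns).
d <- data.frame(
  tractid  = c("t1", "t2"),
  cbsa     = c("c1", "c1"),
  cbsaname = c("Metro A", "Metro A"),
  stringsAsFactors = FALSE
)

relevant_cols <- c("tractid", "cbsaname")

# Fail early with a readable message if any name is not a column of `d`.
not_found <- setdiff(relevant_cols, names(d))
if (length(not_found) > 0) {
  stop("Columns not found in d: ", paste(not_found, collapse = ", "))
}

# Single-bracket subsetting with a character vector returns a data frame.
df_subset <- d[relevant_cols]
print(names(df_subset))
```

Because the names are quoted strings, R cannot confuse them with objects in the workspace. If you prefer to stay in dplyr, the same character vector can be passed as select(d, all_of(relevant_cols)), which is the form the note about ambiguous selections recommends.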

TVK36692 commented 3 years ago

Yes, this is run before in its own chunk and runs with no failures:

d <- d.full  # reset to the original dataset

# adjust 2000 home values for inflation 
mhv.00 <- d$mhmval00 * 1.28855  
mhv.10 <- d$mhmval12

# change in MHV in dollars
mhv.change <- mhv.10 - mhv.00

# drop low 2000 median home values
# to avoid unrealistic growth rates.
#
# tracts with homes that cost less than
# $10,000 are outliers
# approximately 200 out of 59,000 cases 
sum( mhv.00 < 10000 ) 

mhv.00[ mhv.00 < 10000 ] <- NA

# change in MHV in percent
mhv.growth <- 100 * ( mhv.change / mhv.00 )
summary( mhv.growth )

mhv.00[ mhv.00 < 10000 ] <- NA

# change in MHV in percent
mhv.growth <- 100 * ( mhv.change / mhv.00 )
summary( mhv.growth )

cenuno commented 3 years ago

None of that code is assigning external vectors as columns in your data frame.

If you look at https://ds4ps.org/cpp-528-fall-2020/labs/lab-05-instructions.html#create-new-variables you will notice the following code, written right after that summary() statement and before the select() statement:

d$mhv.00 <- mhv.00
d$mhv.10 <- mhv.10
d$mhv.change <- mhv.change
d$mhv.growth <- mhv.growth
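To see the whole sequence in one place, here is a minimal sketch on a made-up three-row data frame (the toy values and the `keep` and `d_small` names are assumptions; the calculations mirror the lab chunk). My understanding is that when a name such as mhv.00 is not a column, select() falls back to the workspace vector and tries to read its numbers as column positions, which would explain both the "external vector" note and the "loss of precision" error.

```r
# Toy stand-in for the lab data (made-up values).
d <- data.frame(
  tractid  = c("t1", "t2", "t3"),
  mhmval00 = c(90000, 120000, 5000),
  mhmval12 = c(150000, 160000, 50000)
)

# Same calculations as the lab chunk: these create free-standing vectors.
mhv.00 <- d$mhmval00 * 1.28855
mhv.10 <- d$mhmval12
mhv.change <- mhv.10 - mhv.00
mhv.00[ mhv.00 < 10000 ] <- NA
mhv.growth <- 100 * ( mhv.change / mhv.00 )

# The step that was skipped: attach the vectors to `d` as columns.
d$mhv.00 <- mhv.00
d$mhv.10 <- mhv.10
d$mhv.change <- mhv.change
d$mhv.growth <- mhv.growth

# Every name passed to select() should now be a real column of `d`.
keep <- c("tractid", "mhv.00", "mhv.10", "mhv.change", "mhv.growth")
print(all(keep %in% names(d)))

# Optional: run the select() itself only if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  d_small <- dplyr::select(d, tractid, mhv.00, mhv.10, mhv.change, mhv.growth)
  print(names(d_small))
}
```

Once the four columns exist in d, the select statement from the tutorial refers only to columns, so the original values are used unchanged.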

If you have more issues, we can talk about them during office hours tonight.

TVK36692 commented 3 years ago

That was it. I missed that code chunk for some reason!