Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Step 3 No Gender Data #71

Open millmeli42 opened 2 years ago

millmeli42 commented 2 years ago

Hello Professor-

I have successfully loaded the gender data into R -

Please see below:

Warning in untar2(tarfile, files, list, exdir, restore_times) :
  skipping pax global extended headers
* installing *source* package 'genderdata' ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package 'genderdata'
    finding HTML links ... done
    ipums_usa                               html  
    kantrowitz                              html  
    napp                                    html  
    ssa_national                            html  
    ssa_state                               html  
** building package indices
** testing if installed package can be loaded from temporary location
*** arch - i386
*** arch - x64
** testing if installed package can be loaded from final location
*** arch - i386
*** arch - x64
** testing if installed package keeps a record of temporary installation path
* DONE (genderdata)

However, in Step 3 I cannot populate any gender data. I have names in my first.names and in my unique names, but when I run the following I do not get the output shown in the lab. This is prior to merging the datasets.

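library(gender)
library(dplyr)   # provides the %>% pipe used below
library(pander)  # provides pander() for table output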
# need to save first names for the merge 
d$first.name <- get_first_name( name=d$Full.Name )
#head(d$first.name)

# create a list of unique first names in the data 
unique.first.names <- unique( d$first.name )
#head(unique.first.names)

# get gender data from US Social Security Admin (this is the step that is not working)
gen <- gender( unique.first.names )
head( gen ) %>% pander()

It shows 0 observations with 6 variables for gen.


name proportion_male proportion_female gender year_min year_max


Thanks!

lecy commented 2 years ago

Have you created this function already?

get_first_name( name=d$Full.Name )

millmeli42 commented 2 years ago

Yes I have.

head(d$first.names)

[1] " Mohammad"                   " Jose Maria Reynaldo Apollo"
[3] " Kelsea"                     " Enyah"
[5] " Precious"                   " James"
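
A note on the output above: each name appears to carry a leading space, and at least one entry contains several words. One possible explanation for the empty result (an assumption, not something confirmed in this thread) is that gender() cannot match names in that form. Below is a minimal sketch of cleaning the vector with base R before the lookup. The sample values are copied from the output above, and the names trimmed and first.word are illustrative only.

```r
# Sample values copied from the head() output in this thread
first.names <- c(" Mohammad", " Jose Maria Reynaldo Apollo", " Kelsea",
                 " Enyah", " Precious", " James")

# Strip leading and trailing whitespace (base R)
trimmed <- trimws(first.names)

# Keep only the first word of multi-word names
first.word <- sub(" .*$", "", trimmed)

# Unique cleaned names, ready to pass to a lookup
unique.first.names <- unique(first.word)
print(unique.first.names)

# Assumption: with the gender package installed, the cleaned vector
# could then be passed on as in the lab:
# gen <- gender(unique.first.names)
```

If the cleaned vector still returns zero rows, the names themselves are probably not the cause.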

millmeli42 commented 2 years ago

My d dataframe has 7 variables and 12,520 observations. My gen data frame has 0 observations and 6 variables. I know the problem is there.

My unique.first.names has 3,647 names

first.names has 12,520 observations

lecy commented 2 years ago

Do you get errors at this step then?

library( gender )
gen <- gender( unique.first.names )

millmeli42 commented 2 years ago

No error. It runs, but the gen data frame has 0 observations and 6 variables.

library(gender)
gen <- gender(unique.first.names)
head(gen) %>% pander

name proportion_male proportion_female gender year_min year_max
No data available in table

lecy commented 2 years ago

I can’t tell what’s happening from the code you provided because it’s unclear whether your unique.first.names is empty or the gender() function is not working.

If it’s the gender() function, I would close RStudio and try installing the gender packages fresh.
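
One way to tell the two cases apart is to inspect the input vector before calling gender(). The sketch below uses base R only. The helper check_names() and the example.names vector are hypothetical and are not part of the lab or of the gender package.

```r
# Hypothetical helper: summarise a name vector before passing it to gender()
check_names <- function(x) {
  list(
    n         = length(x),                          # total entries
    n_missing = sum(is.na(x)),                      # NA entries
    n_empty   = sum(!is.na(x) & trimws(x) == ""),   # blank strings
    n_padded  = sum(!is.na(x) & x != trimws(x))     # leading/trailing spaces
  )
}

# Illustrative input only, not the course data
example.names <- c(" Mohammad", "James", NA, "")
result <- check_names(example.names)
str(result)

# Assumption: if the gender and genderdata packages are installed,
# looking up one common name is a quick check that gender() itself works:
# library(gender)
# gender("james")
```

If the summary looks fine and a single known name still returns no rows, the problem is more likely the package installation than the data.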

kidistbetter105 commented 2 years ago

Do you get errors at this step then?

library( gender )
gen <- gender( unique.first.nam

I have the same issue, and I get an error when I run the code. I am stuck on Step 3.