Lab 03- Data cleaning - Githubissues

DS4PS / cpp-527-fall-2020

http://ds4ps.org/cpp-527-fall-2020/


Lab 03- Data cleaning #13

Open krbrick opened 3 years ago

krbrick commented 3 years ago

Hey data dudes - In trying to address the first issue with pander and data cleaning, I'm trying to clean up the titles. I'm using:

new.title <- gsub(pattern = "m[*]g", "", d$title)

to remove some of the tags such as:

markup--h3-strong

Even using

gsub("<strong class=\"markup--strong markup--h3-strong\">","", d$title)

doesn't appear to work for me when I cross-check the titles. Any pointers? Thanks,

lecy commented 3 years ago

You have quotes inside of quotes. You would need to use different quote marks to avoid conflicts:

"<strong class=\"markup--strong markup--h3-strong\">"  # current
"<strong class=\'markup--strong markup--h3-strong\'>"

krbrick commented 3 years ago

This also did not work for me, perhaps because of spaces, but I found another solution to this issue, using gsub("<strong....", "", d$titles)

lecy commented 3 years ago

That should be fine.

For a full match try this version (the double-quotes are hard-coded in the titles so put the single quotes on the outside of your pattern):

"<strong class=\"markup--strong markup--h3-strong\">"  # current
'<strong class=\"markup--strong markup--h3-strong\">'
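For anyone following along, here is a minimal sketch of the full-match approach in R. The two titles are made up for illustration and the real d$title values may differ. It also shows that both quoting styles yield the same pattern string, and it adds a general regular expression for stripping any tag, which is an alternative not mentioned above.

```r
# Hypothetical titles for illustration; the real d$title values may differ.
title <- c(
  "<strong class=\"markup--strong markup--h3-strong\">How to Clean Data</strong>",
  "Plain title with no markup"
)

# Escaped double quotes inside double quotes, and plain double quotes
# inside single quotes, produce the same pattern string in R.
p1 <- "<strong class=\"markup--strong markup--h3-strong\">"
p2 <- '<strong class="markup--strong markup--h3-strong">'
same.pattern <- identical(p1, p2)

# fixed = TRUE treats the pattern as literal text, not a regular expression.
# This removes only the opening tag, so the closing </strong> remains.
no.open <- gsub(p2, "", title, fixed = TRUE)

# Alternative (an assumption, not from the thread): a regular expression
# that drops any opening or closing tag.
new.title <- gsub("<[^>]+>", "", title)
new.title
```

The literal match is the safer choice when only one known tag should go. The general pattern is handy when the titles carry several different tags, but it would also remove any legitimate text sitting between angle brackets.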

JayCastro commented 3 years ago

Okay, I'm a bit confused about how to get the clap values so I can take the average after I find the titles using grep.

I have

How <- grep("How ", d$title, value = TRUE) 
How

and I get all the values that have "How " in the title, but how do I shrink d down to get only the rows I want?

lecy commented 3 years ago

You want to use the grepl() function, which returns a logical vector that you can use to define group structure.

You can review some of the previous labs that walk through the construction of groups using logical statements:

http://ds4ps.org/dp4ss-textbook/p-073-group-structure.html

https://ds4ps.org/cpp-526-sum-2020/labs/lab-02-instructions-v2.html
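As a rough sketch of the idea in R: the data frame below is invented, and the column name `clap` is an assumption based on the question, so adjust it to match the lab data. The point is that grepl() returns one TRUE/FALSE value per row, and that vector can subset another column directly.

```r
# Hypothetical data for illustration; `clap` is an assumed column name.
d <- data.frame(
  title = c("How to Learn R", "Why Data Matters", "How I Write", "A Guide to gsub"),
  clap  = c(100, 50, 300, 10),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: TRUE where the title contains "How "
is.how <- grepl("How ", d$title)

# Use the logical vector to keep only the matching rows of clap
mean.how <- mean(d$clap[is.how])

# Or compare the two groups side by side
group.means <- tapply(d$clap, is.how, mean)
group.means
```

By contrast, grep() with value = TRUE returns only the matching strings, so the link back to the rows of d is lost. That is why the logical vector is easier to work with here.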

lecy commented 3 years ago

@JayCastro Did grepl() work?

JayCastro commented 3 years ago

Yes, it did! Thank you. Sorry, I was working on other things, so I never responded. It was very simple after I read the group structure walkthrough.

lecy commented 3 years ago

Perfect, thanks.

I didn't want to give away the answer completely, but wanted to make sure my response was not too vague. Glad it came together.