Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Lab 3 - Q1- Comparing Title Styles #14

Open AhmedRashwanASU opened 2 years ago

AhmedRashwanASU commented 2 years ago

Using the data provided in Lab 3, loaded as below:

URL <- "https://raw.githubusercontent.com/DS4PS/cpp-527-fall-2020/master/labs/data/medium-data-utf8-v2.csv"
d <- read.csv( URL )
d 

The code below produces a logical vector that flags all the titles matching the given expression. How can the matching titles then be analyzed and compared for each expression?


category.members <- grepl("How to", d$title)

category.members

lecy commented 2 years ago

I'm not sure if I completely understand your question.

But you have constructed a new group. We can do lots with groups.

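library( dplyr )

group <- grepl( "How to", d$title )  # any logical statement works here
sum( group )   # number of members
mean( group )  # proportion of total

d$group <- group

# x is a placeholder: replace it with the numeric column you want to compare
d %>% 
  group_by( group ) %>% 
  summarize( ave=mean( x ) )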

Note that you will want to check your matches to make sure they don't include titles that don't begin with "How to". You might refine your expression a little more with regular expression anchors.
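
As a rough sketch of what anchoring changes, the example below compares an unanchored pattern with one that uses `^` to require the match at the start of the title. The titles are made up for illustration (in the lab they would come from `d$title`), and ignoring case is an assumption about what should count as a match.

```r
# Made-up titles for illustration only; not taken from the lab data
titles <- c( "How to Learn R in a Weekend",
             "Learn How to Code",
             "how to build a dashboard",
             "Show Me How to Plot" )

# Unanchored: matches "How to" anywhere in the title, case-sensitive
loose  <- grepl( "How to", titles )

# Anchored: ^ requires the match at the start; ignore.case is an assumption
strict <- grepl( "^how to", titles, ignore.case = TRUE )

loose    # TRUE  TRUE FALSE  TRUE
strict   # TRUE FALSE  TRUE FALSE

# Titles the unanchored pattern picks up even though they don't start with "How to"
titles[ loose & !strict ]
```

Printing the titles where the two versions disagree is a quick way to check whether the refined expression behaves as intended.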

AhmedRashwanASU commented 2 years ago

Yup, this is helpful, thanks Prof. One more question:

I was trying to find a pattern in the Power Lists titles that would help gather more accurate results. Any idea what the pattern could be? Could it be a specific word or expression?

Power Lists: “Six Recommendations for Aspiring Data Scientists”, “13 Essential Newsletters for Data Scientists: Remastered”, “7 Machine Learning lessons that stuck with me this year”

lecy commented 2 years ago

Power lists are usually short lists of the most important ideas or tools they recommend.

Always start by writing down pseudocode! What is the recipe here?

Titles that start with a small number and are then followed by text.

Anything else?

Then work to translate that to a regular expression search term.
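
One possible translation of that recipe, as a sketch only: the pattern below reads "small number" as one or two digits, which is an assumption, and the last two titles are made up to show what gets excluded.

```r
titles <- c( "13 Essential Newsletters for Data Scientists: Remastered",
             "7 Machine Learning lessons that stuck with me this year",
             "Six Recommendations for Aspiring Data Scientists",
             "2019 in Review",         # made-up title
             "Top 5 Python Tricks" )   # made-up title

# ^           start of the title
# [0-9]{1,2}  one or two digits (assumed meaning of a "small" number)
# " "         a single space after the number
# [A-Za-z]    text follows the number
starts.with.number <- grepl( "^[0-9]{1,2} [A-Za-z]", titles )

starts.with.number          # TRUE TRUE FALSE FALSE FALSE
titles[ starts.with.number ]
```

Note that this version only covers numerals: a title that spells the number out, such as "Six Recommendations", is not matched.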

AhmedRashwanASU commented 2 years ago

Hero!! Yes, that makes sense.

RoseMurphy237 commented 2 years ago

@lecy I'm not making the jump here. I'm able to get the Power List titles that have numerals in them, but I'm stuck on capturing both numbers spelled out as words and numerical values for the Power List.

lecy commented 2 years ago

This might help you out: https://github.com/Watts-College/cpp-527-fall-2021/issues/24#issuecomment-916539015

It might be easier to use multiple clean regular expressions, then combine the results using logical operators rather than trying to write one complicated expression.
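
A minimal sketch of that approach, assuming two simple patterns are enough: one for titles that start with numerals and one for titles that start with a spelled-out number. The word list (one through ten) is an illustrative assumption, as are the last two titles.

```r
titles <- c( "Six Recommendations for Aspiring Data Scientists",
             "13 Essential Newsletters for Data Scientists: Remastered",
             "7 Machine Learning lessons that stuck with me this year",
             "How to Learn R",             # made-up title
             "Sixty Years of Computing" )  # made-up title

# Pattern 1: title starts with one or more digits followed by a space
digits <- grepl( "^[0-9]+ ", titles )

# Pattern 2: title starts with a number word followed by a space
# (one through ten is an assumed cutoff; extend the list as needed)
words <- grepl( "^(one|two|three|four|five|six|seven|eight|nine|ten) ",
                titles, ignore.case = TRUE )

# Combine the two logical vectors with OR
power.list <- digits | words

power.list         # TRUE TRUE TRUE FALSE FALSE
sum( power.list )  # 3
```

Keeping each pattern separate makes it easy to inspect `titles[ digits ]` and `titles[ words ]` on their own before combining them.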