DS4PS / cpp-527-fall-2020

http://ds4ps.org/cpp-527-fall-2020/

Lab03-Q1/Categories #16

Open gzbib opened 4 years ago

gzbib commented 4 years ago

Hello Sir @lecy ,

I was trying to categorize the titles such as the ones that start with "How", the ones that end with "?", and the ones that have ":" and others. However, the problem is that sometimes 2 conditions satisfy the same question type, for example, a question that begins with "How" and ends with "?".

How can I separate these types?

Thanks in advance.

lecy commented 4 years ago

You are working with logical statements to create groups.

condition.01 <- grepl( ... )
condition.02 <- grepl( ... )

Then you construct groups by defining inclusion criteria:

condition.01 & condition.02  # intersection
condition.01 | condition.02   # union
condition.01 & ! condition.02  # set difference


http://ds4ps.org/dp4ss-textbook/p-050-business-logic.html
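As a minimal sketch of the three set operations above, here they are applied to a few made-up titles. The titles and the variable names (`starts.how`, `has.question`, and so on) are invented for illustration and do not come from the lab data.

```r
# Hypothetical example titles, invented for illustration
titles <- c( "How to learn R",
             "How do I learn R?",
             "Is R hard?",
             "R: a short guide" )

starts.how   <- grepl( "^How", titles )   # title begins with "How"
has.question <- grepl( "\\?$", titles )   # title ends with "?"

both     <- starts.how & has.question     # intersection: meets both conditions
either   <- starts.how | has.question     # union: meets at least one condition
how.only <- starts.how & ! has.question   # set difference: "How" but no "?"

titles[ both ]       # titles that meet both conditions
titles[ how.only ]   # "How" titles that are not questions
sum( either )        # number of titles that meet at least one condition
```

Each logical vector has one element per title, so it can be used to subset the titles or summed to count group members.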

lecy commented 4 years ago

However, the problem is that sometimes 2 conditions satisfy the same question type, for example, a question that begins with "How" and ends with "?". How can I separate these types?

Note that the criteria you are describing would need to be mutually exclusive in order to completely separate the groups.

Since one case can meet both, you need to operationalize your definitions (how are you identifying a question, for example) and then decide how to separate groups into sub-groups (regular question, question with a colon then an answer, etc.).
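One possible way to get mutually exclusive groups is to assign labels in a priority order, so that a later assignment overwrites an earlier one. This is only a sketch: the titles, the group labels, and the priority order are assumptions chosen for illustration, and your own definitions may differ.

```r
# Hypothetical example titles, invented for illustration
titles <- c( "How to learn R",
             "How do I learn R?",
             "Is R hard? Yes: here is why",
             "R: a short guide",
             "Learning R" )

is.question <- grepl( "\\?", titles )   # contains a question mark
has.colon   <- grepl( ":", titles )     # contains a colon
starts.how  <- grepl( "^How", titles )  # begins with "How"

# Start with a default label, then overwrite in order of increasing priority.
# Each title ends up with exactly one label, so the groups cannot overlap.
group <- rep( "other", length( titles ) )
group[ has.colon ]               <- "colon title"
group[ starts.how ]              <- "how title"
group[ is.question ]             <- "question"
group[ is.question & has.colon ] <- "question with answer"

table( group )
```

Because "question" is assigned after "how title", a title that starts with "How" and contains "?" is counted as a question. Changing the order of the assignments changes that decision.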

gzbib commented 4 years ago

Thank you Sir @lecy, but do we have to separate the types in this way, or do we just need to make sure each title meets one condition?

I am trying to negate a character (?) in a grepl expression but am not finding the right syntax.

lecy commented 4 years ago

It's very hard to write compound regular expressions. You are better off writing clear regular expressions, then combining criteria using logical vectors:

x <- c( "How to achieve goals.", "How can I achieve my goals?" )

 c1 <- grepl( "^How", x )
 c2 <- grepl( "\\?", x )

c1
[1] TRUE TRUE

c2
[1] FALSE  TRUE

c1 & ! c2
[1]  TRUE FALSE

gzbib commented 4 years ago

Hello Sir,

x<- vector of clean titles

c1 <- grepl( "^How", x )
c2 <- grepl( "\\?", x )

I think I understand the logic, but maybe I am not translating it into the right code. For example, this is what I was trying to do:

if (c1 & !c2){

print how-titles

}

else{

print how-titles with no questions

}

The results are not right, especially for the else branch of the if-else statement: I keep getting "How" titles that contain question marks.

lecy commented 4 years ago

You can use the logical vector directly as your group vector for some basic analysis:

d <-
structure(list(f = c("treat", "control", "treat", "control", 
"treat", "control", "treat", "control", "treat", "control"), 
    y = c(15, 8, 21, 9, 17, 9, 13, 11, 12, 8)), class = "data.frame", row.names = c(NA, 
-10L))

         f  y
1    treat 15
2  control  8
3    treat 21
4  control  9
5    treat 17
6  control  9
7    treat 13
8  control 11
9    treat 12
10 control  8

mean( d$y[ d$f == "treat" ] )     # mean of y within the treatment group
[1] 15.6
mean( d$y[ d$f == "control" ] )   # mean of y within the control group
[1] 9

# DPLYR VERSION
library( dplyr )
d %>% 
   group_by( f ) %>% 
   summarize( ave=mean(y) )

  f         ave
  <chr>   <dbl>
1 control   9  
2 treat    15.6
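The same idea works when the grouping vector is logical rather than character. The sketch below uses made-up titles and title length in characters as a stand-in outcome, both of which are assumptions for illustration only.

```r
# Hypothetical example titles, invented for illustration
x <- c( "How to achieve goals.",
        "How can I achieve my goals?",
        "Goal setting",
        "How to plan" )

# TRUE for "How" titles that do not contain a question mark
how.to <- grepl( "^How", x ) & ! grepl( "\\?", x )

# Count the titles in each group
table( how.to )

# Average title length (in characters) within each group
ave.length <- tapply( nchar( x ), how.to, mean )
ave.length
```

Here `tapply()` plays the role of `group_by()` plus `summarize()` in base R, with the logical vector `how.to` acting as the group variable.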

You can also construct a distinct group from the logical vector:

group <- ifelse( logical.vector, "how to title", "regular title" )
group <- ifelse( grepl( ... ), "how to title", "regular title" )
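A note on why the if-else attempt misbehaves: `if()` expects a single TRUE or FALSE, not a whole vector. Given a longer logical vector, older versions of R used only the first element and issued a warning, and R 4.2.0 and later raise an error. A small sketch of the vectorized alternative follows, where the third title is made up for illustration:

```r
x <- c( "How to achieve goals.",
        "How can I achieve my goals?",
        "Goal setting" )              # third title is a made-up example

c1 <- grepl( "^How", x )   # begins with "How"
c2 <- grepl( "\\?", x )    # contains a question mark

# Subset with the logical vectors instead of using if-else:
x[ c1 & ! c2 ]   # "How" titles without a question mark
x[ c1 & c2 ]     # "How" titles with a question mark

# ifelse() is vectorized, so it labels every title in one step
group <- ifelse( c1 & ! c2, "how to title", "regular title" )
table( group )
```

Subsetting returns the titles in each group, while `ifelse()` returns one label per title that can then be used with `table()` or as a grouping variable.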