Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Lab 04 - Part 2 #34

sandralili opened this issue 3 years ago

sandralili commented 3 years ago

Good Evening Dr. @lecy,

I get this message when I use the "stem" function:

'stem' is deprecated; use dfm_wordstem() instead
   provid      educ communiti     organ   support   mission     youth    promot   program    purpos 
    21103     17352     15061     13294     11233      8515      7468      7335      6774      6671 

It works, but it looks like the function is no longer the recommended one. I tried the new function "dfm_wordstem()", but somehow I am not using it correctly. Any idea what I am doing wrong?

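my.tokens <- tokens                     # copy the existing tokens object
my.tokens <- dfm( my.tokens )           # build a document-feature matrix
my.tokens <- dfm_wordstem( my.tokens )  # stem the dfm's features
head( my.tokens )                       # print the first documents and features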

docs    meet physic spiritu need god peopl abil expand help local
  text1    1      1       1    1   1     1    1      1    1     1
  text2    0      0       0    0   0     0    0      0    0     0
  text3    0      0       0    0   0     0    0      0    0     0
  text4    0      0       0    0   0     0    0      0    0     0
  text5    0      0       0    0   0     0    0      0    0     0
  text6    0      0       0    0   0     0    0      0    0     0
[ reached max_nfeat ... 24,986 more features ]

Thanks in advance!

lecy commented 3 years ago

> when I use the "stem" function. 'stem' is deprecated; use dfm_wordstem()

Can you please provide an example? It is working fine on my computer, so I would need more context.

mtwelker commented 3 years ago

I get the same warning message in the "R Markdown" window when I knit:

Output created: Lab-04-WelkerM.html
Warning messages:
1: package 'quanteda' was built under R version 4.1.1 
2: package 'quanteda.textmodels' was built under R version 4.1.1 
3: package 'quanteda.textstats' was built under R version 4.1.1 
4: package 'quanteda.textplots' was built under R version 4.1.1 
5: text_field argument is not used. 
6: 'stem' is deprecated; use dfm_wordstem() instead 

But it seems to work okay, so I just ignored it.

sandralili commented 3 years ago

Dr. @lecy, yes of course. These are the outputs:

Tabulate top word counts


tokens %>% dfm ( stem=F ) %>% topfeatures( )

'stem' is deprecated; use dfm_wordstem() instead
   provid      educ communiti     organ   support   mission     youth    promot   program    purpos 
    21104     17376     15078     13301     11240      8517      7493      7337      6777      6674 


Stemming


tokens %>% dfm ( stem=T ) %>% topfeatures( )

'stem' is deprecated; use dfm_wordstem() instead
   provid      educ communiti     organ   support   mission     youth    promot   program    purpos 
    21104     17376     15078     13301     11240      8517      7493      7337      6777      6674 


sandralili commented 3 years ago

Thanks @mtwelker, I don't get those warnings. How weird.

lecy commented 3 years ago

It looks like the original still works, but the quanteda developers are probably no longer supporting it.

If you look at the help file for dfm_wordstem(), you will see that there are different options:

https://quanteda.io/reference/tokens_wordstem.html

You are creating a "document-feature matrix" (dfm) here:

dfm_wordstem ( my.tokens )

docs    meet physic spiritu need god peopl abil expand help local
  text1    1      1       1    1   1     1    1      1    1     1
  text2    0      0       0    0   0     0    0      0    0     0
  text3    0      0       0    0   0     0    0      0    0     0
  text4    0      0       0    0   0     0    0      0    0     0
  text5    0      0       0    0   0     0    0      0    0     0
  text6    0      0       0    0   0     0    0      0    0     0
[ reached max_nfeat ... 24,986 more features ]

I suspect this version would get you the equivalent of what you have now:

tokens_wordstem( my.tokens )
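A minimal sketch of the two non-deprecated routes, assuming quanteda 3.x and two hypothetical lower-case toy texts standing in for the lab's tokens object: stem the tokens with tokens_wordstem() before calling dfm(), or stem the features with dfm_wordstem() after it. Either result can then go to topfeatures() in place of the deprecated dfm( stem=T ) call. With lower-case input, both routes should give the same stemmed counts.

```r
# Assumes quanteda >= 3.0; the two texts below are hypothetical toy data,
# not the lab's mission-statement corpus.
library(quanteda)

toks <- tokens(c(text1 = "providing education to communities",
                 text2 = "community education programs provide support"))

# Route 1: stem the tokens first, then build the document-feature matrix
dfm_a <- toks %>% tokens_wordstem() %>% dfm()

# Route 2: build the document-feature matrix first, then stem its features
dfm_b <- toks %>% dfm() %>% dfm_wordstem()

# Tabulate stemmed word counts (n = all features in each dfm)
top_a <- topfeatures(dfm_a, n = nfeat(dfm_a))
top_b <- topfeatures(dfm_b, n = nfeat(dfm_b))
print(top_a)
```

Here "providing"/"provide", "community"/"communities" and the two uses of "education" should each collapse to a single stem (provid, communiti, educ) with a count of 2.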

If you just want to silence those warnings and keep using the current function, you can add the arguments "warning=FALSE, message=FALSE" to the chunk header so they stay out of the knitted output:

```{r, warning=FALSE, message=FALSE}
tokens %>% dfm( stem=T ) %>% topfeatures( )
```
sandralili commented 3 years ago

That sounds good, Dr. @lecy. I will try it, thank you!