Have you considered incorporating an exploration of the words that get removed when you remove stop words?
It is similar to looking at the words in the stop word list (which you always should), but it is a more limited and reasonable approach, since you are only looking at the words that are actually affected.
library(tidyverse)
library(tidytext)
library(janeaustenr)
data <- tibble(text = emma) %>%
  unnest_tokens(word, text)

## This step would be added
right_join(data, stop_words, by = "word") %>%
  count(word, sort = TRUE)
#> # A tibble: 728 x 2
#>    word      n
#>    <chr> <int>
#>  1 to    15717
#>  2 the   15603
#>  3 and   14688
#>  4 of    12873
#>  5 i      9531
#>  6 a      9387
#>  7 it     7584
#>  8 her    7386
#>  9 was    7194
#> 10 she    7020
#> # ... with 718 more rows

anti_join(data, stop_words, by = "word")
#> # A tibble: 46,775 x 1
#>    word     
#>    <chr>    
#>  1 emma     
#>  2 jane     
#>  3 austen   
#>  4 volume   
#>  5 chapter  
#>  6 emma     
#>  7 woodhouse
#>  8 handsome 
#>  9 clever   
#> 10 rich     
#> # ... with 46,765 more rows
Created on 2018-09-26 by the reprex package (v0.2.1)
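
One caveat about the `right_join()` step above: `stop_words` has one row per word per lexicon, so a word that appears in several lexicons is matched several times and its count is multiplied accordingly. `right_join()` also keeps stop words that never occur in the text. If the goal is the actual number of tokens that would be removed, a variant along these lines might be closer. This is only a sketch; the names `tokens`, `stop_unique` and `removed` are made up for illustration.

```r
library(tidyverse)
library(tidytext)
library(janeaustenr)

tokens <- tibble(text = emma) %>%
  unnest_tokens(word, text)

# Collapse the lexicons so each stop word appears exactly once
stop_unique <- distinct(stop_words, word)

# Keep only the tokens that anti_join() would drop, then count them
removed <- tokens %>%
  inner_join(stop_unique, by = "word") %>%
  count(word, sort = TRUE)

removed
```

With the stop word list deduplicated, each token matches at most one row, so `sum(removed$n)` plus the number of rows left after `anti_join()` should add up to the total number of tokens.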