Open Sicheng2000 opened 7 months ago
Great job. Yes, the math behind these methods and algorithms can be daunting, especially in formula notation. The most important thing at this point, however, is to have a basic idea of their uses. If you continue to work with these methods, then a deeper dive into the inner workings will likely prove useful.
@francojc
count() automatically groups and ungroups variables as needed.
c. geom_smooth() adds a smoothed trend line. By default it fits a loess curve (or a GAM for larger datasets), so a linear trend line needs method = "lm".
d. The stopwords list excludes common words that carry little meaning and could otherwise skew the final results.
e. The lemmatize_words() function is effective at treating different forms of the same word as one, but it relies on a lookup table.
f. Dimensionality reduction condenses the features of a dataset into fewer dimensions. Principal Component Analysis (PCA) is the most common method.
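To make the point about count() concrete, here is a minimal sketch comparing it with the longer group/summarise/ungroup version. The `tokens` data and its column names are made up for illustration, and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data: one row per token, tagged with a part of speech.
tokens <- tibble(
  word = c("run", "dog", "run", "fast", "dog", "run"),
  pos  = c("verb", "noun", "verb", "adv", "noun", "verb")
)

# count() does the grouping, tallying, and ungrouping in one call ...
by_count <- count(tokens, pos)

# ... which is roughly the same as spelling out each step by hand.
by_hand <- tokens %>%
  group_by(pos) %>%
  summarise(n = n()) %>%
  ungroup()

by_count
```

Both results should hold the same tallies, which is why count() is usually the more convenient choice for simple frequency tables.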
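The stopword and lookup-table ideas can also be sketched in base R. This is not how lemmatize_words() is implemented; the stopword list and `lemma_lookup` table below are tiny, hand-made stand-ins, used only to show why a lookup table is needed.

```r
# Hypothetical, hand-made resources; real projects would use a much
# fuller stopword list and lemma dictionary.
stopwords <- c("the", "a", "is", "were", "and")
lemma_lookup <- c(running = "run", ran = "run", dogs = "dog")

tokens <- c("the", "dogs", "were", "running", "and", "a", "dog", "ran")

# 1. Drop stopwords.
content_tokens <- tokens[!tokens %in% stopwords]

# 2. Replace each token with its lemma when the lookup table has one;
#    tokens missing from the table are left unchanged.
in_table <- content_tokens %in% names(lemma_lookup)
lemmas <- content_tokens
lemmas[in_table] <- unname(lemma_lookup[content_tokens[in_table]])

table(lemmas)
```

A word that is absent from the table passes through untouched, so the quality of the result depends on how complete the table is.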
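For the PCA point, a small sketch using base R's prcomp() on the built-in iris data (chosen here only as a convenient example, not taken from the exercise) shows how four features can be reduced to two components.

```r
# Built-in iris data: four numeric measurements per flower.
measurements <- iris[, 1:4]

# Centre and scale each feature so that no single unit dominates.
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 2)

# Keep the first two components as the reduced representation.
reduced <- pca$x[, 1:2]
dim(reduced)
```

The variance proportions give a rough sense of how much information is kept when the later components are dropped.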