erinlynmclean / twitter-analytics

Where I'm storing all my twitter analysis files
Apache License 2.0

Sharing some figures! #1

Open kristenpeach opened 4 years ago

kristenpeach commented 4 years ago

This felt like the easiest way for me to share some wordclouds and plots. I did not want to save them as images until I can get your opinion on which ones you like and want me to tweak. I was also unsure which words you wanted me to remove. I just removed "center" and "dataoneorg".

[Five attached images: wordclouds and plots]

kristenpeach commented 4 years ago

https://github.com/kristenpeach/twitter-analytics/blob/master/twitter_wordclouds.Rmd

Here is a link to my Rmarkdown if you'd like to tweak things. wordcloud2() allows for a dark background and some fun additional features but it does not export to png or tiff as easily. Happy coding!

erinlynmclean commented 4 years ago

Wonderful! All I see on your markdown file, though, are the library calls...am I missing something?

erinlynmclean commented 4 years ago

These are great! One quick thing - when I run line 120 in your code / line 330 in my updated markdown:

docs <- tm_map(docs, removeWords, c("center", "amp", "arcticdatactr", "may", "this", "were", "one", "can", "the"))

it doesn't seem to really eliminate those words from the wordcloud. Like, I have "the" right in there but then it shows up on the wordcloud. Thoughts?

(Also, new to github - sorry if this isn't the best way to debug collaboratively!)

erinlynmclean commented 4 years ago

Second question - what's the difference between lines 106-123 and lines 219-278 in your markdown doc? As I see it, they're both doing the same set of functions - cleaning the original tweet text data. Is there an advantage to using one method over the other?

erinlynmclean commented 4 years ago

A third question lolol: what in the world does set.seed(1234) do?

kristenpeach commented 4 years ago

Hi Erin,

The word cloud randomly generates the position of words, so if I want you to see the exact same thing I see every time you run the code I need to use set.seed(). It's just a reproducibility thing. You don't need it unless you want the command to produce the exact same word cloud every time.
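To make that concrete, here is a minimal sketch in base R. It uses sample() as a stand-in for the random word placement (an assumption for illustration; the wordcloud call itself is not shown), and the variable names are hypothetical.

```r
# Without a seed, two random draws will generally differ.
draw_a <- sample(1:100, 5)
draw_b <- sample(1:100, 5)

# Resetting the same seed right before each draw makes them identical.
set.seed(1234)
seeded_a <- sample(1:100, 5)

set.seed(1234)
seeded_b <- sample(1:100, 5)

identical(seeded_a, seeded_b)  # TRUE
```

The same idea applies to the word cloud: calling set.seed(1234) immediately before the plotting command should give the same layout on every run.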

That is weird that those lines are not removing the words you want to remove. It was working for me. Maybe assign all of your stopwords to a vector and then put that in the command instead of c()?
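A rough sketch of the vector approach is below. The example tweets and the names my_stopwords, word_freqs, and remaining are made up for illustration. It also lowercases the text before removing words, on the assumption (not confirmed in this thread) that capitalized forms like "The" could be slipping past removeWords, which matches case exactly. Another thing worth checking is that the term document matrix is rebuilt after the removal step, since a matrix built earlier would still contain the old words.

```r
library(tm)

# Hypothetical example tweets; the real data lives in the project files.
tweets <- c("The Arctic Data Center may archive this dataset",
            "One can share the data &amp; code")

docs <- VCorpus(VectorSource(tweets))

# Lowercase and strip punctuation first so "The" becomes "the"
# and "&amp;" becomes "amp" before any words are removed.
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)

# Keep all custom stopwords in one vector so the list is easy to edit.
my_stopwords <- c("center", "amp", "arcticdatactr", "may",
                  "this", "were", "one", "can", "the")
docs <- tm_map(docs, removeWords, my_stopwords)

# Rebuild the term document matrix AFTER cleaning, then count words.
tdm <- TermDocumentMatrix(docs)
word_freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
remaining <- names(word_freqs)

# Check that none of the stopwords survived.
any(my_stopwords %in% remaining)
```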

The biggest difference between those two chunks of lines you mentioned is that one is creating a term document matrix and the other is creating a similar tidyverse object. You can do a lot of the same things with them; they just have some structural differences. You can probably achieve everything you want with the tidy tweets df, especially if you are having an easier time cleaning up the tidy data.

If that doesn't make sense I am happy to hop on a call anytime.
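Here is a small side-by-side sketch of the two routes, assuming the tidy version is built with the tidytext package (the Rmd may do it differently). The tweets, custom_stopwords, tidy_counts, and tdm_counts are illustrative names, not taken from the actual markdown file.

```r
library(dplyr)
library(tidytext)
library(tm)

# Hypothetical example tweets and custom stopwords.
tweets <- c("The Arctic Data Center may archive this dataset",
            "One can share the data &amp; code")
custom_stopwords <- c("center", "amp", "arcticdatactr", "may",
                      "this", "were", "one", "can", "the")

# Tidy route: a data frame with one row per word, then a word count.
# unnest_tokens() lowercases and drops punctuation by default.
tidy_counts <- tibble(text = tweets) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% custom_stopwords) %>%
  count(word, sort = TRUE)

# tm route: clean a corpus, then build a term document matrix
# (terms in rows, documents in columns) and sum across documents.
docs <- VCorpus(VectorSource(tweets))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, custom_stopwords)
tdm <- TermDocumentMatrix(docs)
tdm_counts <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
```

Either result is a set of words with frequencies, which is what a word cloud needs: tidy_counts keeps them in the word and n columns, while tdm_counts is a named numeric vector.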

Best.

Kristen Peach, PhD National Center for Ecological Analysis and Synthesis University of California 735 State Street, Suite 300 Santa Barbara, CA 93101-3351 kristenpeach1@gmail.com peach@nceas.ucsb.edu pronouns: she/her
