Closed by arogers24 8 months ago
My first step is to tackle the portions of the text with character interactions. We can apply NER to find all the names and clean up the data if necessary. Since many characters in Moby Dick are referred to by titles such as Captain or Mate, we will probably need to apply the pasting code we worked on in class last week. This seems like a reasonable first step for this week.
Next week, we can write something that captures the proximity of characters to one another in the text.
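One way to capture proximity is to count how often two names fall within a fixed number of tokens of each other. The sketch below is only an illustration in plain Python, not the course's R code: the function name `cooccurrences`, the 15-token default window, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrences(text, names, window=15):
    """Count pairs of distinct names that appear within `window` tokens of each other."""
    tokens = re.findall(r"[A-Za-z']+", text)
    # Positions of every tracked name in the token stream
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in names]
    counts = Counter()
    # Compare every pair of mentions; fine for a sketch, quadratic in the number of mentions
    for (i, a), (j, b) in combinations(hits, 2):
        if a != b and j - i <= window:
            counts[tuple(sorted((a, b)))] += 1
    return counts

# Made-up sample text, not a quotation from the novel
sample = "Ahab stood on the deck while Starbuck watched. Far below, Stubb laughed with Flask."
pairs = cooccurrences(sample, {"Ahab", "Starbuck", "Stubb", "Flask"}, window=8)
```

The resulting pair counts could serve as edge weights in a character network; the window size is a tuning choice and would need to be checked against the text.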
The cleanNLP package may be helpful here (as it will be for other students as well) in terms of improved named entity recognition: more on this on Tuesday.
In the interim, I would encourage you to proceed with some preliminary wrangling as you discussed (e.g., search for "Captain" or "Mates").
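As a rough idea of what that preliminary wrangling could look like, the sketch below finds a title followed by a capitalized name and pastes the two into a single token. It is written in plain Python for illustration and is not the pasting code from class; the `TITLES` list and the function names are hypothetical.

```python
import re

# Hypothetical list of titles; extend as needed for the text
TITLES = ("Captain", "Mate", "Mr")

def titled_names(text, titles=TITLES):
    """Return every 'Title Name' mention, e.g. 'Captain Ahab'."""
    pattern = r"\b(?:" + "|".join(titles) + r")\.?\s+[A-Z][a-z]+"
    return re.findall(pattern, text)

def paste_titles(text, titles=TITLES):
    """Join a title and the capitalized name after it into one token,
    e.g. 'Captain Ahab' -> 'Captain_Ahab'."""
    pattern = r"\b(" + "|".join(titles) + r")\.?\s+([A-Z][a-z]+)"
    return re.sub(pattern, r"\1_\2", text)

# Made-up sample sentence
line = "Captain Ahab spoke to Mr. Starbuck."
found = titled_names(line)
pasted = paste_titles(line)
```

Pasting before tokenizing keeps "Captain Ahab" from being split into two separate entities, at the cost of missing mentions that use the name alone.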
@arogers24 can you please share an update on where things stand, as well as let me know if there's anything that you'd like me to look at or comment on over the weekend? As always, please let me know if you run into any issues or need assistance. Thanks in advance, Nick
@arogers24 I'm not seeing any commits: are you all set with your next steps for your analysis and data visualizations? Please don't hesitate to reach out with questions.
I'm feeling a bit stuck. The things I'm looking to explore are NER and topic modeling (LDA). I took a glance at cleanNLP and am having trouble understanding how to do these. Should I try with the previous methods or wait for class on Tuesday?
@arogers24 sorry that you are feeling stuck. I'm hoping that today's class will help give you some ideas. Let's check in after the activity today or during my student hours at 4pm.
Any updates on this front? I saw that you added some graphical displays (https://github.com/STAT325-S24/MobyDick/commit/1297cf2ce7176264b5f511c6a569daa23e2fc972). Is there something that you'd like me to review?
Yes, I recently made some tables that count the number of different parts of speech by chapter number. I'm noticing that the chapters with the fewest verbs and proper nouns are the descriptive chapters I've been looking for. This seems like a viable direction for some analysis; I'm just not sure how robust it is. The analysis would be pulling the text by chapter, calculating the proportions of verbs/proper nouns, and showing the content of those chapters (which describe the blubber of the whale, shape of the mast, etc.)
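The proportion calculation described above can be sketched as follows. This is an illustration in plain Python, assuming the tagged tokens are already available as (chapter, tag) pairs with Universal POS tags such as `VERB` and `PROPN`; the function names and the toy data are hypothetical and are not taken from the repository.

```python
from collections import Counter

def pos_proportions(tagged, tags=("VERB", "PROPN")):
    """Return {chapter: share of tokens whose tag is in `tags`}."""
    totals = Counter()
    matches = Counter()
    for chapter, tag in tagged:
        totals[chapter] += 1
        if tag in tags:
            matches[chapter] += 1
    return {ch: matches[ch] / totals[ch] for ch in totals}

def most_descriptive(tagged, n=5):
    """Chapters with the lowest verb/proper-noun share, lowest first."""
    props = pos_proportions(tagged)
    return sorted(props, key=props.get)[:n]

# Toy data: (chapter number, Universal POS tag) pairs, invented for the example
toy = [(1, "PROPN"), (1, "VERB"), (1, "NOUN"), (1, "VERB"),
       (2, "NOUN"), (2, "ADJ"), (2, "NOUN"), (2, "VERB")]
props = pos_proportions(toy)
lowest = most_descriptive(toy, n=1)
```

Using proportions rather than raw counts keeps long chapters from dominating the ranking. One way to probe robustness would be to compare the ranking for verbs alone, proper nouns alone, and both combined.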
@arogers24 thanks for the update: this sounds like a promising approach. Let's check in today so that we can review in person how this might proceed.
I'm closing this in favor of #12
In class on Thursday I'll be asking everyone to briefly describe first steps and next steps for their analyses. Can you please:
Thanks in advance, Nick