STAT325-S24 / MobyDick

outline for future analysis #9

Closed arogers24 closed 8 months ago

nicholasjhorton commented 8 months ago

In class on Thursday I'll be asking everyone to briefly describe first steps and next steps for their analyses. Can you please:

  1. include your earlier info here and
  2. share any updates (commits) and
  3. describe what you plan to do over the coming week?

Thanks in advance, Nick

arogers24 commented 8 months ago

My first step is to tackle the portions of text with character interactions. We can apply named entity recognition (NER) to find all the names and clean up the data if necessary. Since characters in Moby Dick are often referred to by a title such as Captain or Mate, we will probably need to apply the pasting code we worked on in class last week so that a title and name are treated as one entity. This seems like a reasonable first step for this week.

Next week, we can write something that captures the proximity of characters to one another in the text.
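One rough way to capture proximity is to count how often two characters are named in the same sentence. The sketch below is in Python using only the standard library; the sample text, the name list, and the function name `cooccurrence_counts` are made up for illustration, and the sentence splitter is a simple regex rather than a proper NLP tokenizer.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence_counts(text, names):
    """Count how many sentences mention each pair of names together."""
    # Naive sentence split on ., !, or ? followed by whitespace (an assumption;
    # a real analysis would likely use an NLP tokenizer instead).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    pairs = Counter()
    for sentence in sentences:
        # Names present in this sentence, matched as whole words.
        present = sorted(
            n for n in names if re.search(rf"\b{re.escape(n)}\b", sentence)
        )
        # Every unordered pair of names in the sentence counts once.
        pairs.update(combinations(present, 2))
    return pairs

# Hypothetical sample text, not taken from the novel.
sample = "Ahab spoke to Starbuck. Stubb laughed. Starbuck and Stubb looked at Ahab."
pair_counts = cooccurrence_counts(sample, ["Ahab", "Starbuck", "Stubb"])
```

The resulting pair counts could later feed a network plot or a simple table of which characters appear together most often.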

nicholasjhorton commented 8 months ago

The cleanNLP package may be helpful here (as it will be for other students as well) in terms of improved named entity recognition: more on this on Tuesday.

In the interim, I would encourage you to proceed with some preliminary wrangling as you discussed (e.g., search for "Captain" or "Mates").
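As a minimal sketch of that kind of preliminary search, the Python snippet below looks for a title followed by a capitalized word and tallies the results. The sample text, the pattern, and the function name `count_titled_names` are assumptions for illustration only; the pattern is case-sensitive, so a lowercase "mate" is not matched.

```python
import re
from collections import Counter

# Hypothetical sample text; in practice this would be the full novel.
sample = (
    "Captain Ahab stood on the quarter-deck. "
    "Starbuck, the chief mate, watched Captain Ahab in silence. "
    "Captain Peleg and Captain Bildad had stayed ashore."
)

# A title followed by one capitalized word, e.g. "Captain Ahab".
TITLE_PATTERN = re.compile(r"\b(Captain|Mate)\s+([A-Z][a-z]+)")

def count_titled_names(text):
    """Count each 'Title Name' pair found in the text."""
    return Counter(
        f"{title} {name}" for title, name in TITLE_PATTERN.findall(text)
    )

titled_counts = count_titled_names(sample)
```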

nicholasjhorton commented 8 months ago

@arogers24 can you please share updates on where things stand, as well as let me know if there's anything that you'd like me to look at or comment on over the weekend? As always, please let me know if you run into any issues or need assistance. Thanks in advance, Nick

nicholasjhorton commented 8 months ago

@arogers24 I'm not seeing any commits: are you all set with your next steps for your analysis and data visualizations? Please don't hesitate to reach out with questions.

arogers24 commented 8 months ago

I'm feeling a bit stuck. The things I'm looking to explore are NER and topic modeling (LDA). I took a glance at cleanNLP and am having trouble understanding how to do these. Should I try with the previous methods or wait for class on Tuesday?

nicholasjhorton commented 8 months ago

@arogers24 sorry that you are feeling stuck. I'm hoping that today's class will help give you some ideas. Let's check in after the activity today or during my student hours at 4pm.

nicholasjhorton commented 8 months ago

Any updates on this front? I saw that you added some graphical displays (https://github.com/STAT325-S24/MobyDick/commit/1297cf2ce7176264b5f511c6a569daa23e2fc972). Is there something that you'd like me to review?

arogers24 commented 8 months ago

Yes, I recently made some tables that count the number of different parts of speech by chapter number. I'm noticing that the chapters with the fewest verbs and proper nouns are the descriptive chapters I've been looking for. This seems like a viable direction for some analysis; I'm just not sure how robust it is. The analysis would involve pulling the text by chapter, calculating the proportions of verbs and proper nouns, and showing the content of the chapters with the lowest proportions (which describe the blubber of the whale, the shape of the mast, etc.).
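The proportion calculation could look roughly like the Python sketch below. It assumes the tokens have already been annotated and are available as (chapter, part-of-speech tag) pairs; the tag names `VERB` and `PROPN`, the tiny data set, and the function name `pos_proportions` are all illustrative assumptions, not the actual tables.

```python
from collections import Counter

# Hypothetical annotated tokens: (chapter number, part-of-speech tag).
tokens = [
    (1, "PROPN"), (1, "VERB"), (1, "NOUN"), (1, "VERB"),
    (2, "NOUN"), (2, "ADJ"), (2, "NOUN"), (2, "VERB"),
]

def pos_proportions(tokens, tags=("VERB", "PROPN")):
    """Return, per chapter, the share of tokens whose tag is in `tags`."""
    totals = Counter()  # all tokens per chapter
    hits = Counter()    # tokens with a tag of interest per chapter
    for chapter, tag in tokens:
        totals[chapter] += 1
        if tag in tags:
            hits[chapter] += 1
    return {chapter: hits[chapter] / totals[chapter] for chapter in totals}

props = pos_proportions(tokens)
# Chapters ordered from lowest to highest proportion, so candidate
# descriptive chapters come first.
ranked = sorted(props, key=props.get)
```

Using proportions rather than raw counts keeps long and short chapters comparable, which may help with the robustness question.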

nicholasjhorton commented 8 months ago

@arogers24 thanks for the update: this sounds like a promising approach. Let's check in today so that we can review in person how this might proceed.

nicholasjhorton commented 8 months ago

I'm closing this in favor of #12