TheCodingCollective / Welcome


Finding patterns in text (literature, websites, etc.) #5

Open iamciera opened 10 years ago

iamciera commented 10 years ago

I have already started on this a bit, but I have a project that incorporates a lot of what I want to learn: web scraping, Python data handling, and visualization. The end goal is to find patterns in my favorite book, Infinite Jest, but I found that the tools I was building and using could be applied to any text. Another reason I want to use this book is that David Foster Wallace has an extensive math background and has alluded to a "fractal" structuring of the plot. He is meticulous and incredibly calculated, so I think there could be some interesting visualizations in this book. I am not alone: this book also has a history of maniac fans who try to do data analysis and visualization BY HAND! I want to learn web scraping to take all their hard work and suck it into my dataset.

Anyway, someone could help build these tools for their favorite book alongside me. We wouldn't be carving a brand new path either; this is an entire field with a lot of people whose shoulders we can stand on.

Python Functions

Here is a list of functions that I want to build or have built:

Split Book by

  1. Words ✓
  2. Sentences
  3. Chapters ✓
  4. Paragraphs
  5. Count occurrences of words (length = one word) ✓
  6. Track position of words ✓
  7. Count occurrences of phrases (length > one word) ✓
  8. Count occurrences of phrases
  9. Attach chronology information to chapters and position of occurrence
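
As a rough illustration of the splitting, counting, and position-tracking functions in the list above, here is a minimal sketch using only the standard library. The function names, the tokenization rule, and the blank-line paragraph convention are all assumptions made for the example, not the code already written for this project. Chapter splitting is left out because it depends on how chapters are marked in a given text file.

```python
import re
from collections import Counter


def split_words(text):
    """Lowercased word tokens; apostrophes inside a word are kept."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def split_paragraphs(text):
    """Assumes paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace.

    Abbreviations such as "Dr." will be split in the wrong place.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_words(words):
    """Occurrences of each single word."""
    return Counter(words)


def word_positions(words, target):
    """Zero-based token indices at which `target` occurs."""
    target = target.lower()
    return [i for i, w in enumerate(words) if w == target]


def count_phrase(words, phrase):
    """Occurrences of a phrase of one or more words in the token list."""
    target = split_words(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


if __name__ == "__main__":
    # Made-up sample text, not a quotation from any book.
    sample = "The ball went over the net. The ball came back!\n\nNobody saw the ball land."
    words = split_words(sample)
    print(count_words(words).most_common(3))
    print(word_positions(words, "ball"))
    print(count_phrase(words, "the ball"))
```

Positions are token indices rather than character offsets, which makes it easy to divide by `len(words)` and plot where in the book a word shows up.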

Web Scraping

I want to work on writing and understanding web scrapers so I can take the fans' hard work and incorporate it into my dataset, for instance by scraping the entire list of characters and places from this site.
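
A minimal sketch of what pulling a list of names out of a page might look like, using only Python's built-in `html.parser`. It assumes the names sit in flat (non-nested) `<li>` elements, which may not match the real site; the HTML snippet and the `ListItemParser` / `extract_list_items` names are hypothetical and only there to make the example runnable.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the visible text of every <li> element (nested lists not handled)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)
            self._in_item = False


def extract_list_items(html):
    """Return the text of each list item found in an HTML string."""
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    # Hypothetical markup standing in for a fan-made character page.
    page = '<ul><li><a href="/hal">Hal Incandenza</a></li><li> Don Gately </li></ul>'
    print(extract_list_items(page))
```

For a live page, the HTML string could come from `urllib.request.urlopen(url).read().decode()`. Libraries such as Beautiful Soup make this kind of extraction shorter, but the standard-library version shows the moving parts.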

Visualization

The last step would be visualization. I did a simple visualization of a small subset of my favorite characters in the book with ggplot, but I would like to map co-occurrence and things like that using D3. I need the dataset first, though. Elgh.

[Image: screen shot 2014-10-10 at 10 52 48 am]
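
One possible way to build a co-occurrence dataset for the D3 step is sketched below. It counts how many chapters each pair of character names shares, then writes the counts in a nodes-and-links JSON shape of the kind D3 force-layout examples commonly use. The function names, the output keys, and the tiny sample chapters are all assumptions for illustration, and the matching is a plain substring test, so a short name like "Hal" would also match inside "shall".

```python
import json
from collections import Counter
from itertools import combinations


def cooccurrence(chapters, names):
    """Count the chapters in which each pair of names appears together."""
    pair_counts = Counter()
    for chapter in chapters:
        lowered = chapter.lower()
        # Sorting keeps each pair in one canonical order, e.g. ("Hal", "Orin").
        present = sorted(n for n in names if n.lower() in lowered)
        pair_counts.update(combinations(present, 2))
    return pair_counts


def to_node_link(pair_counts, names):
    """Convert pair counts into a nodes/links dictionary ready for json.dumps."""
    nodes = [{"id": name} for name in names]
    links = [
        {"source": a, "target": b, "value": count}
        for (a, b), count in sorted(pair_counts.items())
    ]
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    # Placeholder "chapters", not text from the novel.
    chapters = ["Hal and Orin talk.", "Hal meets Gately.", "Orin calls Hal."]
    names = ["Hal", "Orin", "Gately"]
    graph = to_node_link(cooccurrence(chapters, names), names)
    print(json.dumps(graph, indent=2))
```

Counting at the chapter level is the coarsest choice; the same function works on paragraphs or fixed-size word windows if a finer-grained measure turns out to be more interesting.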

danfulop commented 10 years ago

Mind blown!! :-) ...you're a freak! ...in a good way ;-) It would be super cool if you revealed a fractal pattern in this novel through scraping and data analysis.