I already started this a bit, but I have a project that incorporates a lot of what I want to learn: web scraping, Python data handling, and visualization. The end goal is to find patterns in my favorite book, Infinite Jest, but I found that the tools I was building and using could be applied to any text. Another reason I want to use this book is that David Foster Wallace had an extensive math background and has alluded to a "fractal" structure in the plot. He is meticulous and incredibly calculated, so I think there could be some interesting visualizations in this book. I am not alone, either: this book has a history of maniac fans who try to do data analysis and visualization BY HAND! I want to learn web scraping so I can take all their hard work and suck it into my dataset.
Anyway, someone could build these tools for their own favorite book alongside me. We wouldn't be carving a brand-new path either; this is an entire field, with a lot of people whose shoulders we can stand on.
Python Functions
Here is a list of functions that I want to build or have built:
Split Book by
Words ✓
Sentences
Chapters ✓
Paragraphs
Count occurrences of words (length = one word) ✓
Track position of words ✓
Count occurrences of phrases (length > one word) ✓
Count occurrences of phrases
Attach chronological information to chapters and to the position of each occurrence
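The splitting, counting, and position-tracking items above can be sketched with Python's standard library alone. This is a minimal sketch under stated assumptions, not the code already written for the project. It assumes a plain-text copy of the book. `CHAPTER_RE` is a hypothetical placeholder that expects each chapter to begin with a line starting with "CHAPTER", so the book's real section breaks will need their own pattern. All function names are illustrative.

```python
import re
from collections import Counter, defaultdict

# Hypothetical chapter marker: assumes every chapter begins on its own line
# starting with "CHAPTER". Adjust this pattern to match your actual text file.
CHAPTER_RE = re.compile(r"^CHAPTER\b.*$", re.MULTILINE)


def split_chapters(text):
    """Split the full text at each chapter heading line (headings are dropped)."""
    parts = CHAPTER_RE.split(text)
    # Drop empty pieces, e.g. the empty string before the first heading.
    return [p.strip() for p in parts if p.strip()]


def split_paragraphs(text):
    """Split on blank lines (one or more empty or whitespace-only lines)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace.

    Abbreviations such as "Mr." or "Dr." will be split incorrectly."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_words(text):
    """Lowercase word tokens; straight apostrophes inside words are kept."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def word_positions(words):
    """Map each word to the list of token indices where it occurs."""
    positions = defaultdict(list)
    for i, word in enumerate(words):
        positions[word].append(i)
    return positions


def count_phrases(words, n):
    """Count every n-word phrase (n-gram) in a token list."""
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
```

Word counts then come from `Counter(split_words(chapter))`. A specific multi-word phrase can be looked up by its length, e.g. `count_phrases(words, 2)["hal won"]`. Keeping positions as token indices per chapter makes it straightforward to attach chapter-level chronology later.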
Web Scraping
So I want to work on writing and understanding web scrapers so I can take the fans' hard work and incorporate it into my dataset: for instance, scraping the entire list of characters and places from this site.
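A first scraper for a character or place list can be built with the standard library's `html.parser` and `urllib.request`. This is a minimal sketch that assumes the page marks up its list as `<li>` items. The real site's structure is unknown here, so a class or id filter will probably be needed. Third-party parsers such as BeautifulSoup are a common, more convenient alternative. The names `ListItemCollector`, `parse_list_items`, and `scrape_list` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ListItemCollector(HTMLParser):
    """Collect the visible text of every <li> element in a page.

    Assumption: the character/place list is marked up as <li> items.
    Nested lists are flattened, and parent text after a nested list is
    dropped; real pages often need a more specific filter (class or id)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False
        self._buf = []

    def _flush(self):
        # Save the current item (if any), collapsing runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if self._in_li and text:
            self.items.append(text)
        self._in_li = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._flush()  # HTML allows an omitted </li>, so close any open item
            self._in_li = True

    def handle_endtag(self, tag):
        if tag in ("li", "ul", "ol"):
            self._flush()

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)


def parse_list_items(html):
    """Return the text of each <li> in an HTML string."""
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def scrape_list(url):
    """Download a page (network access required) and parse its <li> items."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return parse_list_items(html)
```

Parsing is kept separate from downloading so the parser can be tested on saved HTML without hitting the site repeatedly.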
Visualization
The last step would be visualization. I did a simple visualization of a small subset of my favorite characters in ggplot, but I would like to map co-occurrence and similar relationships using D3. I need the dataset first, though. Elgh.
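Before anything can be drawn in D3, the co-occurrence data has to exist. A minimal sketch, assuming each chapter is already a list of lowercase tokens, counts how many chapters each pair of characters shares. It then writes the counts in the `nodes`/`links` shape commonly used by D3 force-directed graph examples. The `characters` alias mapping and the function names are hypothetical. Alias matching is plain token membership, so multi-word names are not matched.

```python
import json
from collections import Counter
from itertools import combinations


def co_occurrence(chapters, characters):
    """Count how many chapters each pair of characters appears in together.

    `chapters` is a list of token lists; `characters` maps a display name to
    lowercase single-word aliases, e.g. {"Gately": ["gately", "don"]}."""
    pair_counts = Counter()
    for words in chapters:
        tokens = set(words)
        present = sorted(name for name, aliases in characters.items()
                         if any(alias in tokens for alias in aliases))
        # Each unordered pair is stored once, in alphabetical order.
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def to_d3_graph(pair_counts, characters):
    """Shape pair counts as {"nodes": [...], "links": [...]} for D3."""
    nodes = [{"id": name} for name in sorted(characters)]
    links = [{"source": a, "target": b, "value": n}
             for (a, b), n in sorted(pair_counts.items())]
    return {"nodes": nodes, "links": links}


def save_graph(graph, path):
    """Write the graph as JSON so a D3 page can load it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(graph, f, indent=2)
```

Counting per chapter is the coarsest choice. The same loop works over paragraphs or sentences for a tighter notion of "appearing together".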
Mind blown!! :-) ...you're a freak! ...in a good way ;-) It would be super cool if you revealed a fractal pattern in this novel through scraping and data analysis.