CS196Illinois / Group35-FA22


Extract Text Data from COVID Massmails #9

Open RaghavSaini01 opened 1 year ago

RaghavSaini01 commented 1 year ago

One of our main goals for the final visualization is to show the text patterns in the COVID massmails sent out throughout the pandemic! To do that, we first need a way to extract their content as plain text so that we can work with it as needed.

A promising way to start would be to take the HTML of one massmail, inspect it, and see whether simple sanitization functions can get us just the text. Once this works for one massmail, we can write a script that does the same for the HTML content of any given file and/or URL.
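As a starting point, here is a minimal sketch of that idea using only Python's standard library. It assumes the massmail markup is ordinary HTML where the visible text lives outside `<script>`, `<style>`, and `<head>`. The names `TextExtractor`, `html_to_text`, and `load_html` are hypothetical and picked for illustration, so we would still need to check the output against a real massmail.

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping tags whose content is not prose."""

    # Assumption: these tags never hold text we want to visualize.
    SKIP_TAGS = {"script", "style", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. for us
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped tag.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        return re.sub(r"\s+", " ", " ".join(self._chunks)).strip()


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def load_html(source):
    """Read HTML from a URL (http/https) or from a local file path."""
    if source.startswith(("http://", "https://")):
        with urlopen(source) as response:
            return response.read().decode("utf-8", errors="replace")
    return Path(source).read_text(encoding="utf-8", errors="replace")
```

Usage would look like `html_to_text(load_html("massmail.html"))`, where `massmail.html` is a placeholder path. Because the pieces are joined with spaces, a word split by an inline tag (for example `<b>Hel</b>lo`) would come out as two tokens. If that turns out to matter for the massmails, a dedicated HTML library may be a better fit.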