Open cadebrown opened 2 years ago
Hi Cade,
That sounds like a great project, provided you've got the time to complete it. I had to Google "word choice entropy" to find out what it is, but maybe go with the average word entropy across all the words on the page? That might be a better measure than a sum over all the words, since a sum would favor longer pages.
I think you are definitely meeting/exceeding the requirements of the project, even with just doing something like a copy/paste of Wikipedia articles. I can't say I've done it myself, but if you want to use the Wikipedia API to make web requests/crawl etc., this may not be a bad place to start.
Overall sounds like a really cool idea. Maybe try splitting it up into a copy/paste approach, and a more complicated one if you've got the time.
I was a bit confused by the idea, so maybe it could be simplified to just Project Gutenberg sources.
But essentially, just take multiple texts and analyze which one has the wider vocabulary.
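One rough way to compare which text has the wider vocabulary is to count distinct words and divide by the total word count (the type-token ratio). The sketch below is only an illustration: the function name `vocabulary_stats`, the simple regex tokenizer, and the two sample strings are all made up for this example, and the ratio tends to drop as texts get longer, so it is fairest on texts of similar length.

```python
import re


def vocabulary_stats(text):
    """Return (total_words, distinct_words, type_token_ratio) for a text."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    distinct = set(words)
    ratio = len(distinct) / len(words) if words else 0.0
    return len(words), len(distinct), ratio


# Hypothetical sample texts, standing in for real sources.
texts = {
    "plain": "the cat sat on the mat and the cat sat on the mat",
    "varied": "a curious feline lounged upon an ornate woven rug",
}

# Pick the text with the highest share of distinct words.
widest = max(texts, key=lambda name: vocabulary_stats(texts[name])[2])
print(widest)
```

Here `widest` picks out the sample whose words repeat the least.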
Hey Thomas,
For the project idea, I was thinking of taking the word counts from multiple websites and ranking them by word choice entropy. That could be used to rank pages as "interesting" or "uninteresting", since I like expanding my vocabulary.
I could start with Wikipedia articles or something, but I was also considering using a Google search API and filtering the results, so that you get a better search built on top of it.
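As a minimal sketch of the ranking idea, assuming "word choice entropy" means the Shannon entropy of a page's word-frequency distribution: the entropy below is measured in bits per word, so it is already an average over the words on the page and not a sum. The names `word_entropy` and `rank_pages`, the regex tokenizer, and the sample pages are hypothetical; fetching real pages (from Wikipedia or a search API) is left out.

```python
import math
import re
from collections import Counter


def word_entropy(text):
    """Shannon entropy, in bits per word, of the word frequencies in text."""
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    # H = -sum(p * log2(p)) over the distinct words, with p = count / total.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_pages(pages):
    """Return page names sorted from highest to lowest word entropy."""
    return sorted(pages, key=lambda name: word_entropy(pages[name]), reverse=True)


# Hypothetical page contents keyed by a made-up page name.
pages = {
    "repetitive": "spam spam spam spam",
    "mixed": "red green red green",
    "diverse": "one two three four",
}

ranking = rank_pages(pages)
print(ranking)
```

A page that repeats one word scores 0 bits, and a page whose words are all different scores highest, so the top of `ranking` would be the "interesting" end under this assumption.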