Thomas-Neuefeind / MiniProject1


MiniProject1 Idea #1

Open cadebrown opened 2 years ago

cadebrown commented 2 years ago

Hey Thomas,

For the project idea, I was thinking of taking the word counts from multiple websites and ranking the pages by word choice entropy. That could be used to rank pages as "interesting" or "uninteresting", which appeals to me since I like expanding my vocabulary.
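A minimal sketch of that ranking, assuming "word choice entropy" means the Shannon entropy of a page's word-frequency distribution and that each page's text is already available as a string. The function names and the two sample pages are made up for illustration.

```python
import math
import re
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy (bits per word) of the word distribution in text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_entropy(pages: dict[str, str]) -> list[tuple[str, float]]:
    """Return (name, entropy) pairs, highest entropy first."""
    scores = {name: word_entropy(text) for name, text in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample pages, just for illustration.
pages = {
    "repetitive": "the cat the cat the cat the cat",
    "varied": "a quick brown fox jumps over lazy dogs",
}
for name, score in rank_by_entropy(pages):
    print(f"{name}: {score:.2f} bits/word")
```

Here the "varied" page scores 3.00 bits/word (eight equally likely words) and the "repetitive" page scores 1.00 (two equally likely words), so the varied page ranks first.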

I could start with Wikipedia articles or something, but I was also considering using a Google search API and filtering the results, so that you get a better search built on top of it.

Thomas-Neuefeind commented 2 years ago

Hi Cade,

That sounds like a great project, provided you've got the time to complete it. I had to give word choice entropy a quick Google to find out what it is, but maybe go with the average entropy per word on the page? That might be a better measure than a sum over all the words, for example.
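One way to see the difference between the two options, as a rough sketch: treat each word's contribution as its surprisal, -log2(p(word)), with p estimated from the page's own word counts (an assumption, since the thread does not pin down the definition). The sum then grows with page length, while the mean does not. The helper name is hypothetical.

```python
import math
from collections import Counter


def surprisal_sum_and_mean(words: list[str]) -> tuple[float, float]:
    """Return (sum, mean) of per-word surprisal -log2(p(word)) in bits."""
    if not words:
        return 0.0, 0.0
    counts = Counter(words)
    total = len(words)
    surprisals = [-math.log2(counts[w] / total) for w in words]
    return sum(surprisals), sum(surprisals) / total


short = "to be or not to be".split()
longer = short * 10  # same wording repeated, so no new vocabulary

print(surprisal_sum_and_mean(short))
print(surprisal_sum_and_mean(longer))
```

The repeated text has ten times the sum but the same mean, so the mean compares pages of different lengths more fairly. The mean computed this way is the same quantity as the Shannon entropy of the word distribution.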

I think you are definitely meeting/exceeding the requirements of the project, even with just doing something like a copy/paste of Wikipedia articles. I can't say I've done it myself, but if you want to use the Wikipedia API to make web requests/crawl etc., this may not be a bad place to start.

Overall sounds like a really cool idea. Maybe try splitting it up into a copy/paste approach, and a more complicated one if you've got the time.

cadebrown commented 2 years ago

I was a bit confused about the idea, so maybe it could be simplified to just Project Gutenberg sources.

But essentially, just take multiple texts and analyze which one has a wider vocabulary.
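A simplified sketch of that comparison, assuming the texts are already loaded as plain strings (for example, pasted from Project Gutenberg) and that "wider vocabulary" means more distinct words in an equal-sized sample. The sample cap is an assumption added here so that a longer text does not win just by being longer; the function names and the two short sample strings are placeholders.

```python
import re


def vocabulary_size(text: str, sample_words: int = 1000) -> int:
    """Count distinct words among the first sample_words words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words[:sample_words]))


def widest_vocabulary(texts: dict[str, str], sample_words: int = 1000) -> str:
    """Return the name of the text with the most distinct words in its sample."""
    return max(texts, key=lambda name: vocabulary_size(texts[name], sample_words))


# Placeholder inputs; real use would pass whole books.
texts = {
    "text_a": "it was the best of times it was the worst of times",
    "text_b": "call me ishmael some years ago never mind how long precisely",
}
print(widest_vocabulary(texts, sample_words=10))
```

With a 10-word sample, text_a has 7 distinct words and text_b has 10, so text_b is reported as having the wider vocabulary.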