Open don-han opened 8 years ago
How would Naive Bayes help us filter webpages? I'm having a hard time seeing how websites could be fit to a conditional probability model.
It's like building a spam filter with Naive Bayes. Since Naive Bayes is a text classifier, we can use it to classify pages as either "integration services" like Jenkins or normal webpages. My assumption is that since integration services have their own jargon such as "idle", "workers", and "builders", our classifier should easily distinguish integration services from normal pages.
Also, the reason I chose Naive Bayes over other classification methods is that Naive Bayes is super-fast, and given that we are processing hundreds of thousands of web pages, we can't afford to run slow algorithms. It definitely is not a perfect algorithm, but I'm thinking of adding redundancy to reduce the false positives and false negatives.
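To make the idea concrete, here's a rough sketch of what I mean, a tiny multinomial Naive Bayes over word counts with Laplace smoothing (the training examples and labels below are made up just for illustration; the real version would be trained on scraped pages):

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns per-class word counts,
    class document counts, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab

def classify(text, word_counts, class_counts, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            # Laplace (add-one) smoothing so unseen words don't zero things out
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training data -- purely hypothetical examples
docs = [
    ("jenkins idle workers builders build queue", "ci"),
    ("build executor status idle jenkins", "ci"),
    ("welcome to my cooking blog recipes", "normal"),
    ("latest news sports weather", "normal"),
]
wc, cc, vocab = train(docs)
print(classify("jenkins idle workers", wc, cc, vocab))  # -> ci
print(classify("cooking blog recipes", wc, cc, vocab))  # -> normal
```

In practice I'd use a library implementation (e.g. scikit-learn's `MultinomialNB`) rather than hand-rolling this, but the math is the same, and it shows why it's fast: training is one pass of counting, and classification is a handful of log-additions per word.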
Mm, all right. Thanks! I don't think I truly understand what Bayes is for yet, but hopefully taking 188 next semester will help. :P I'll just tag along with you as you code, and I'll try to contribute where I can.
Temporary measure by blacklisting Jenkins implemented: 28170f8b5d4c458fed9892928cd202f385999f18