AAAI / AINews

This is the NewsFinder software, designed to automatically crawl the web for news related to artificial intelligence, filter, categorize, and rank the news, and publish to a wiki, mailing list, and RSS feeds.
http://aaai.org/AITopics/AINews

Spam submissions filter? #2

Open joshuaeckroth opened 13 years ago

joshuaeckroth commented 13 years ago

Suppose somebody submits a news article via the website, and Bruce is emailed but opts not to upload the submission on the wiki. Does the AINews software respect this decision or does the software still process the submission regardless?

joshuaeckroth commented 13 years ago

From Bruce:

The articles that are submitted via the Submit Content button seem to fall into four categories: (a) true spam with no redeemable information content; (b) self-serving pointers to irrelevant articles or blogs; (c) occasional articles about AI; (d) stories I have found.

The first two are the main reasons we want to review submissions before putting them on the site. About half of the small number of case (c) submissions are missing information or have not been published in legitimate publications. When I ask for the information, I either get no response or (a few times a year) the information needed for me to add it to the site. With case (d), about half of the articles I submit are recent news stories; the rest are good expositions of an issue or a concept that I happen to find more than a week after their publication.

The case you hypothesize has never occurred, as you correctly suppose, probably because we only crawl legitimate sources and the scoring function is reasonably accurate.

So you may see the spam article in the AINews results. I doubt that this contingency has ever occurred, however.

We do have a mechanism for catching things like this, however. If one of us used the News Viewer every weekend to peruse the contents of the database of stories accumulated throughout the week prior to Monday a.m. publication, we would have a chance to mark the bad apples as irrelevant. This would keep them from being considered for publication (I believe) and would add a negative example to the training set for retraining the SVM.
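
A minimal sketch of the review step Bruce describes, to make the intended behavior concrete. It is not the actual AINews code: the `Story` and `ReviewQueue` classes, their fields, and the `-1` label for a negative example are all hypothetical names chosen for illustration, and the SVM retraining itself is left out. It only shows the two effects of marking a story irrelevant: the story drops out of the publication candidates, and a negative example is recorded for later retraining.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """One crawled or submitted story (hypothetical structure)."""
    url: str
    title: str
    score: float              # relevance score from the ranking step
    irrelevant: bool = False  # set by a human reviewer in the viewer


@dataclass
class ReviewQueue:
    """Stories accumulated during the week, awaiting Monday publication."""
    stories: dict = field(default_factory=dict)
    # (url, label) pairs; -1 is assumed here to mean "negative example"
    training_set: list = field(default_factory=list)

    def add(self, story):
        self.stories[story.url] = story

    def mark_irrelevant(self, url):
        """Flag a story as irrelevant and record it as a negative example."""
        story = self.stories[url]
        if not story.irrelevant:  # avoid duplicate training examples
            story.irrelevant = True
            self.training_set.append((url, -1))

    def publishable(self):
        """Stories still eligible for publication, best score first."""
        candidates = [s for s in self.stories.values() if not s.irrelevant]
        return sorted(candidates, key=lambda s: s.score, reverse=True)
```

Under these assumptions, a reviewer who calls `mark_irrelevant` on a spam submission before the Monday run removes it from `publishable()`, and the retraining job can later read `training_set` to pick up the new negative example.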