JonathanReeve / data-ethics-literature-review

An automated survey of literature and curricula surrounding ethics in data science. WIP.
http://data-ethics.tech
GNU General Public License v3.0

Find a way to separate high-quality and low-quality bibliographic entries #12

Open JonathanReeve opened 3 years ago

JonathanReeve commented 3 years ago

The tools I wrote extract lots of URLs from syllabi, but they don't do a good job of guessing which URLs point to papers, articles, and other useful things. As a result, we're getting lots of "readings" that are actually just links to something like facebook.com/like-button or equivalent.

The Zotero translator does an OK job of figuring out which ones are legit, but we still need a way to separate high-quality from low-quality bibliographic entries. This sounds like a job for a document categorizer?
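
As a starting point before training any real categorizer, here is a minimal rule-based sketch in Python. It is not the project's code and not a trained classifier. Everything in it is an assumption made for illustration: the host and path lists, the entry fields (`url`, `title`, `author`, `date`), the weights, the threshold, and the function names `score_entry` and `is_high_quality`.

```python
from urllib.parse import urlparse

# Hypothetical hint lists: illustrative assumptions, not taken from the project.
LOW_QUALITY_HOSTS = {"facebook.com", "twitter.com", "linkedin.com"}
LOW_QUALITY_PATH_HINTS = ("like-button", "share", "login", "signup")
SCHOLARLY_HINTS = ("doi.org", "arxiv.org", "jstor.org", ".pdf")
METADATA_FIELDS = ("title", "author", "date")


def score_entry(entry):
    """Return a rough quality score for a bibliographic entry (a dict)."""
    url = entry.get("url", "").strip().lower()
    # urlparse only fills in the host when a scheme is present.
    if url and "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    host = parsed.netloc
    if host.startswith("www."):
        host = host[4:]

    score = 0
    if host in LOW_QUALITY_HOSTS:
        score -= 2  # social-media widgets are rarely readings
    if any(hint in parsed.path for hint in LOW_QUALITY_PATH_HINTS):
        score -= 2  # share buttons, login pages, and the like
    if any(hint in url for hint in SCHOLARLY_HINTS):
        score += 2  # looks like a paper or a DOI link
    # One point per metadata field that is present and non-empty.
    score += sum(1 for field in METADATA_FIELDS if entry.get(field))
    return score


def is_high_quality(entry, threshold=1):
    """Classify an entry; the threshold is an arbitrary assumption."""
    return score_entry(entry) >= threshold


# Made-up entries; the DOI below is a placeholder, not a real reference.
junk = {"url": "facebook.com/like-button"}
paper = {
    "url": "https://doi.org/10.0000/placeholder",
    "title": "A Placeholder Title",
    "author": "A. Author",
    "date": "2019",
}
print(is_high_quality(junk), is_high_quality(paper))
```

Scores like these could also serve as weak labels for bootstrapping a trained document categorizer later, if the rules turn out to be too brittle.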