JonathanReeve / data-ethics-literature-review

An automated survey of literature and curricula surrounding ethics in data science. WIP.
http://data-ethics.tech
GNU General Public License v3.0

Read ordinary (non-URL) citations from syllabi #11

Open JonathanReeve opened 3 years ago

JonathanReeve commented 3 years ago

At the moment I have a mechanism to extract all links from syllabi. But it'd be even nicer to scrape any kind of citation. For example:

Tim Wu, Machine Speech, 161 University of Pennsylvania Law Review 1495 (2013).
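
As a rough illustration of what a hand-rolled approach would look like for that one example, here is a minimal sketch. It assumes the citation follows the law-review pattern "Author, Title, Volume Journal Page (Year)" and sits on a single line. The names `CITATION_RE` and `find_citations` are hypothetical and not part of this repository, and a regex like this will miss most other citation styles found in syllabi.

```python
import re

# Heuristic pattern (an assumption, not a general citation grammar) for
# citations shaped like: Author, Title, Volume Journal Page (Year)
CITATION_RE = re.compile(
    r"(?P<author>[^,\n]+),\s+"          # everything up to the first comma
    r"(?P<title>.+?),\s+"               # shortest run up to ", <volume>"
    r"(?P<volume>\d+)\s+"               # volume number
    r"(?P<journal>[A-Z][^\d\n]*?)\s+"   # journal name, no digits allowed
    r"(?P<page>\d+)\s+"                 # first page
    r"\((?P<year>\d{4})\)"              # four-digit year in parentheses
)


def find_citations(text):
    """Return one dict of named fields per citation-like match in text."""
    return [match.groupdict() for match in CITATION_RE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "Tim Wu, Machine Speech, 161 University of Pennsylvania "
        "Law Review 1495 (2013)."
    )
    for citation in find_citations(sample):
        print(citation)
```

Every field comes back as a string, and any line that does not fit the pattern is silently skipped, which is the main reason an existing parser is preferable to maintaining patterns like this one.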

Some tools that might be able to help are:

JonathanReeve commented 3 years ago

Ideally we won't have to reinvent the wheel on this, but it might be a job for machine learning or AI. I've just posted a question to StackExchange.

JonathanReeve commented 3 years ago

So far, Refextract is the best one I've found.

JonathanReeve commented 3 years ago

Unfortunately, Refextract doesn't seem to work well for syllabi. It must have some logic specific to high-energy physics papers, or just to published papers in general.

However, anystyle-cli seems to be working.