JonathanReeve / data-ethics-literature-review

An automated survey of literature and curricula surrounding ethics in data science. WIP.
http://data-ethics.tech
GNU General Public License v3.0

Retrieve the full text of each article / paper / book #14

Open JonathanReeve opened 3 years ago

JonathanReeve commented 3 years ago

To the extent possible, let's try to retrieve the full text of each article, paper, or book. This will let us run various kinds of text analysis on the full text.

We should make sure that the files are named consistently (probably according to their Zotero IDs, as seen in the readings/ directory).

We shouldn't check the full text of everything into GitHub, though, so let's figure out a way to share it using Google Drive or equivalent.

xiaoshuaicui commented 3 years ago

I will try to work on this issue.

JonathanReeve commented 3 years ago

Great! I imagine this could be a separate script, like enhanceGraph.py, that goes through the graph looking for texts, and then:

For articles:

  1. If it already has an arXiv ID, go to arXiv and download a copy there.
  2. If it doesn't have an arXiv ID but might be on arXiv anyway, query arXiv for similar-looking articles.
  3. If it can't be found on arXiv, maybe try to get it by querying Google Scholar.
  4. Maybe try the same with SciHub?
  5. Once the full text is retrieved, rename it according to the ID we have in the bibliography, and save it to papers/.
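
Steps 1, 2, and 5 could be sketched roughly as below. This is only a minimal sketch under several assumptions: each entry is already pulled out of the graph as a plain dict, and the keys `zotero_id`, `arxiv_id`, and `title` are hypothetical names, not ones the repo is known to use. The function names are made up for illustration too. The Google Scholar and SciHub fallbacks (steps 3 and 4) are left out.

```python
import re
import urllib.parse
import urllib.request
from pathlib import Path

ARXIV_PDF = "https://arxiv.org/pdf/{}"
ARXIV_API = "http://export.arxiv.org/api/query?"


def arxiv_pdf_url(arxiv_id):
    """Step 1: build the PDF URL for a known arXiv ID."""
    return ARXIV_PDF.format(arxiv_id)


def arxiv_search_url(title, max_results=1):
    """Step 2: build an arXiv API query URL that searches by title."""
    params = {"search_query": 'ti:"{}"'.format(title), "max_results": max_results}
    return ARXIV_API + urllib.parse.urlencode(params)


def find_arxiv_id(title, fetch):
    """Return the ID of the first title match on arXiv, or None.

    The first hit is only a similar-looking article; a real script
    should compare titles/authors before trusting it.
    """
    feed = fetch(arxiv_search_url(title)).decode("utf-8", errors="replace")
    match = re.search(r"<id>https?://arxiv\.org/abs/([^<]+)</id>", feed)
    return match.group(1) if match else None


def http_get(url):
    """Download a URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def retrieve(entry, out_dir="papers", fetch=http_get):
    """Try to save the full text of one entry as <out_dir>/<zotero_id>.pdf.

    `entry` is a dict with hypothetical keys: zotero_id, title, and
    optionally arxiv_id. Returns the saved path, or None if not found.
    """
    arxiv_id = entry.get("arxiv_id") or find_arxiv_id(entry["title"], fetch)
    if arxiv_id is None:
        return None  # steps 3 and 4 (other sources) are not sketched here
    pdf = fetch(arxiv_pdf_url(arxiv_id))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Step 5: name the file after the bibliography (Zotero) ID.
    target = out_dir / "{}.pdf".format(entry["zotero_id"])
    target.write_bytes(pdf)
    return target
```

Something like `retrieve({"zotero_id": "ABCD1234", "title": "Some paper title"})` would then look the paper up on arXiv and, if a match is found, write `papers/ABCD1234.pdf`. Passing `fetch` as a parameter keeps the network call swappable, which makes it easier to test the script offline or add rate limiting later.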

JonathanReeve commented 3 years ago

@xiaoshuaicui, I know you've left the project, but did you make any progress on this issue over those two weeks? No problem if not, but if you did, it'd be great if you could add your code to the repo.