JonathanReeve / data-ethics-literature-review

An automated survey of literature and curricula surrounding ethics in data science. WIP.
http://data-ethics.tech
GNU General Public License v3.0

Manually extract text citations from syllabi #23

Open JonathanReeve opened 3 years ago

JonathanReeve commented 3 years ago

Let's break up the task of extracting text citations from syllabi.

Once we've manually extracted plain text citations from the syllabi, I can probably use AnyStyle to generate BibTeX, and then RDF. But first we need to produce those plain text citations.

JonathanReeve commented 3 years ago

I can do the first 100.

Amber: 100-175.

Serena: 175-250+.

JonathanReeve commented 3 years ago

Let's create plain text files, {id}.texts.txt where {id} is the course ID.

sy2657 commented 3 years ago

Hi Jonathan, can you give me more details about completing this task? Or will you go over it at the next meeting on Thursday?

Thank you, Serena

JonathanReeve commented 3 years ago

Sure. Let's do this:

  1. Start with the ID listed above, so in your case, deCourse:175.
  2. If it has a syllabus (in HTML, PDF, .docx, etc.), retrieve it.
  3. Open the syllabus, and look for a section with assigned readings.
  4. Copy the assigned readings, and paste them into a new plain text document, called {id}.texts.txt, where {id} is the course ID. So if it's deCourse:175, it would be called 175.texts.txt.
  5. Make sure that it's a plain text document in which each line is one reading (citation). Here's an example line: O'Neil, Cathy. Weapons of math destruction: How big data increases inequality and threatens democracy. Crown, 2016.
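
A minimal sketch of steps 4 and 5, in case it helps to tidy pasted readings before saving them. The function names, the `out_dir` parameter, and the bullet-stripping rule are illustrative assumptions rather than anything that exists in the repository, and the script does not rejoin citations that wrap across several lines in the source, so those still need a manual check.

```python
import re
from pathlib import Path


def texts_filename(course_id: str) -> str:
    """Map a course ID such as 'deCourse:175' to '175.texts.txt'."""
    number = course_id.split(":")[-1].strip()
    if not number.isdigit():
        raise ValueError(f"unexpected course ID: {course_id!r}")
    return f"{number}.texts.txt"


def clean_readings(pasted: str) -> list[str]:
    """Return one citation per entry, dropping blank lines and list markers."""
    citations = []
    for line in pasted.splitlines():
        # Strip a leading bullet or short numbering such as '-', '*', '1.' or '2)'.
        # Only one or two digits are accepted, so a leading year is left alone.
        line = re.sub(r"^\s*(?:[-*•]|\d{1,2}[.)])\s+", "", line)
        # Collapse runs of whitespace left over from copying out of PDF or HTML.
        line = " ".join(line.split())
        if line:
            citations.append(line)
    return citations


def write_texts_file(course_id: str, pasted: str, out_dir: str = ".") -> Path:
    """Write the cleaned readings to {id}.texts.txt, one citation per line."""
    path = Path(out_dir) / texts_filename(course_id)
    path.write_text("\n".join(clean_readings(pasted)) + "\n", encoding="utf-8")
    return path
```

For example, `write_texts_file("deCourse:175", pasted_text)` would create `175.texts.txt` in the current directory, where `pasted_text` is whatever you copied from the readings section.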

You can do this quasi-automatically, if you like, with a little scripting. For instance:

  1. Write a SPARQL query that finds courses with hasSyllabus
  2. Try to get the syllabus HTML or PDF
  3. Verify that it's not a 404 page
  4. If it's a real syllabus, open it so that you can manually identify the readings section
  5. Copy and paste as above

But it might be faster to do it manually.
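
If you do want to script it, here is a rough sketch of the first three steps using only the Python standard library. The prefix IRI in the query is a placeholder, and the list of "not found" phrases is a guess, so treat both as assumptions to adjust against the project's actual ontology and the pages you run into. The query is only shown as a string here; running it against the data is left out.

```python
import urllib.request

# Hypothetical query: the prefix IRI below is a placeholder, not the
# project's real namespace. Only the hasSyllabus name comes from this thread.
SYLLABUS_QUERY = """
PREFIX de: <http://example.org/data-ethics#>
SELECT ?course ?syllabus WHERE {
  ?course de:hasSyllabus ?syllabus .
}
"""

# Phrases that often appear on "soft 404" pages served with status 200.
# This list is an assumption and will miss some error pages.
NOT_FOUND_HINTS = ("page not found", "404 not found", "no longer available")


def looks_like_syllabus(status: int, body: bytes) -> bool:
    """Heuristic check that a response is a real document, not an error page."""
    if status != 200 or not body:
        return False
    if body.startswith(b"%PDF"):
        # PDF files begin with this marker, so accept them without text checks.
        return True
    text = body[:5000].decode("utf-8", errors="replace").lower()
    return not any(hint in text for hint in NOT_FOUND_HINTS)


def fetch_syllabus(url: str, timeout: float = 15.0):
    """Return the syllabus bytes, or None if the URL is dead or a 404 page."""
    request = urllib.request.Request(url, headers={"User-Agent": "syllabus-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            status = response.status
    except OSError:
        # Covers HTTP errors (including real 404s), DNS failures, and timeouts.
        return None
    return body if looks_like_syllabus(status, body) else None
```

Anything that `fetch_syllabus` returns still has to be opened and read by hand to find the readings section, so this only saves the clicking on dead links.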

JonathanReeve commented 3 years ago

@Zhuohan-Amber and @sy2657, could you submit pull request(s) with these changes, when you're done? And in the pull request text, just say "fixes #23," which will mark this issue as completed. Thanks in advance!

JonathanReeve commented 3 years ago

@Zhuohan-Amber and @sy2657, let me know if you need any help with submitting pull requests on this one. It'd be nice to close the issue when that's done.

sy2657 commented 3 years ago

I thought I uploaded the texts when I submitted pull requests for Serena-branch on June 16-17, but I did not see the texts that I worked on in the data folder...

JonathanReeve commented 3 years ago

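I don't think you submitted a pull request. If you had, it would appear in this list of pull requests: https://github.com/JonathanReeve/data-ethics-literature-review/pulls?q=is%3Apr+is%3Aclosed. Maybe review some tutorials (https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) and try again?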

sy2657 commented 3 years ago

Ok
