Princeton-CDH / geniza

version 4.x of the Princeton Geniza Project
https://geniza.princeton.edu
Apache License 2.0

As a user of the public site, I would like to download a csv of the PGP data in order to do research #951

Open kseniaryzhova opened 2 years ago

kseniaryzhova commented 2 years ago

Is your feature request related to a problem? Please describe.

As we decided in the original charter, we want the public to have access to the PGP material in a csv.

Describe the solution you'd like

The public csv should mirror the admin interface csv, EXCLUDING the following fields: notes, needs review, URL admin, and status (suppressed documents should automatically be unavailable to the public), and INCLUDING the following: whether or not each document has a transcription and a translation (Y/N), as well as editor information for these fields.

Describe alternatives you've considered

While we would be happy to have the editor information in the normal admin csv, we understand this would slow down the download too much.
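To make the requested field rules concrete, here is a minimal sketch of how admin export rows might be reduced to public rows. It is not the project's implementation. All column names (`notes`, `needs_review`, `url_admin`, `status`, `transcription_editors`, `translation_editors`), the `"Public"` status value, and both helper functions are hypothetical. The sketch also assumes that a non-empty editor column means the transcription or translation exists.

```python
import csv
import io

# Hypothetical column names; the real admin export may label these differently.
ADMIN_ONLY_FIELDS = {"notes", "needs_review", "url_admin", "status"}
# Assumed status value marking a document as visible to the public.
PUBLIC_STATUS = "Public"


def public_rows(admin_rows):
    """Yield public-safe dicts from an iterable of admin-export dicts."""
    for row in admin_rows:
        # Suppressed documents are skipped entirely, so status is not exported.
        if row.get("status") != PUBLIC_STATUS:
            continue
        public = {k: v for k, v in row.items() if k not in ADMIN_ONLY_FIELDS}
        # Assumption: a non-empty editor column implies the text exists.
        public["has_transcription"] = "Y" if row.get("transcription_editors") else "N"
        public["has_translation"] = "Y" if row.get("translation_editors") else "N"
        yield public


def write_public_csv(admin_rows, out):
    """Write the public rows to a file-like object as CSV."""
    rows = list(public_rows(admin_rows))
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```

Dropping suppressed rows before removing the status column is what lets the public file omit status without exposing suppressed documents.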

kseniaryzhova commented 6 months ago

@blms @rlskoeser some use cases we came up with:

  1. Researchers who do not know us but would like to use or explore our data. Examples: researchers who want to know what's in the documentary Geniza and chart the documents (dated documents, language distribution, or genre distribution), or who just want the leisure to read through all the descriptions without staring at a website; linguists who may want to look at the digitized transcriptions available through the footnote csv.

  2. Students in Princeton and non-Princeton courses who just want to explore the data and don't have time for the full admin interface training. Example: students in MR's Mamluk studies seminar last summer looking for docs from the Mamluk period to edit.

  3. People who can’t get the search to do quite what they want with the descriptions but can use the csv to search. Also, people who want to run stats on who has edited what, or on how many documents have been edited or translated.
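As an illustration of the "run stats" use cases above, a downloaded csv could be tallied with nothing but the Python standard library. This is only a sketch: the `column_counts` helper and the column names `languages` and `has_translation` are hypothetical, and the sample data is invented for the example.

```python
import csv
import io
from collections import Counter


def column_counts(csv_file, column):
    """Count how often each value appears in one column of a csv file object."""
    reader = csv.DictReader(csv_file)
    # Missing or empty cells are grouped under a "(blank)" label.
    return Counter((row.get(column) or "").strip() or "(blank)" for row in reader)


# Invented sample standing in for a downloaded public csv.
sample = (
    "pgpid,languages,has_translation\n"
    "1,Judaeo-Arabic,Y\n"
    "2,Hebrew,N\n"
    "3,Judaeo-Arabic,N\n"
)

language_counts = column_counts(io.StringIO(sample), "languages")
translated_counts = column_counts(io.StringIO(sample), "has_translation")
```

With a real download, `io.StringIO(sample)` would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`; the resulting counts could then feed a chart or a summary table.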