Princeton-CDH / derrida-django

Derrida's Margins - Python/Django web application
https://derridas-margins.princeton.edu
Apache License 2.0

As a researcher, I want a book dataset that is based on the content and doesn't include irrelevant fields from zotero so that I can focus on the data specific to this project. #262

Closed by kmcelwee 3 years ago

kmcelwee commented 3 years ago

Dev Notes

export should include the following fields (preliminary list):

books have a work/instance structure to handle multiple different editions and translations of the same work (including at least one case where there are multiple copies of the exact same edition, for which we use the copy field to distinguish)

item type can be book, book section, or journal; a book section should belong to a book (instance) in the database, and some publication metadata should be pulled from that book record (see the zotero data for an example)

I would be open to two exports, one for works and one for instances (editions? copies? books) if you think that would simplify things any and not be too much trouble for people to work with.

Questions from Zotero code

Nope

This may have been because Zotero didn't handle journal articles

do str(creator.person), not str(creator)

Gerhardt, T. Editor Die philosophische Schriften (1890)
de Gandillac, Maurice Translator Encyclopédie (1966)
Barande, Ilse Translator Oeuvres complètes de Karl Abraham (1966)
Couturat, Louis Editor Opuscules et fragments inédits de Leibniz (1903)
Manheim, Ralph Translator Philosophie der Symbolischen Formen (1953)
Vaughan, Charles Edwyn Editor The Political Writings of Jean-Jacques Rousseau (1915)
Chaix-Ruy, Jules Translator Oeuvres choisies de Vico (1946)
Macquarrie, John Translator Being and Time (1962);Robinson, Edward Translator Being and Time (1962)
Gagnebin, Bernard Editor Dialogues (n.d.);Raymond, Marcel Editor Dialogues (n.d.)
David, Maxime Translator Dialogues sur la religion naturelle (1964)
Gibelin, Jean Translator Encyclopédie (1952)
Ruwet, Nicolas Translator Essais de linguistique générale (1963)
Kahn, Gilbert Translator Introduction à la métaphysique (1958)
Robert, Marthe Translator Journal (n.d.)
Camille, Georgette Translator L'Écriture chinoise considérée comme art poétique (1937)
Derrida, Jacques Translator L'origine de la géométrie (1962 A)
Derrida, Jacques Translator L'origine de la géométrie (1962 B)
Bianquis, Geneviève Translator La Naissance de la tragédie (1949)
Hyppolite, Jean Translator La Phénoménologie de l'esprit (1947)
Emile Chambry Translator La République (1956)
Hildenbrand, Hans Translator Le jeu comme symbole du monde (1960);Lindenberg, Alex Translator Le jeu comme symbole du monde (1960)
Gibelin, Jean Translator Leçons sur la philosophie de la religion (1832)
Bachelard, Suzanne Translator Logique formelle et logique transcendantale (1957 A)
Emile Chambry Translator Phèdre (1938)
Robin, Léon Translator Phèdre (1961)
Marc-Antoine Léonard de Malpeines Translator The Divine legation of Moses (1744)
kmcelwee commented 3 years ago

Questions

finding aid url: these almost certainly don't resolve anymore! can we get help from PUL to get catalog links for the same items?

adapt or refer to logic in as_zotero_item method on Instance since the metadata we want to export should be similar

rlskoeser commented 3 years ago

I'm relieved the finding aid urls resolve. However, those are redirects and we should put the new urls into the dataset. They may not maintain the redirects indefinitely, and we want the data publication to be as durable as possible since we're not planning to touch this. It looks like we can just do a regex to get the new version, but let's make sure they all resolve if we do that.
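A minimal sketch of the "regex, then make sure they all resolve" idea. The old and new URL shapes below are placeholders on example.org, not the real finding aid patterns, and `rewrite_url`, `http_status`, and `check_rewrites` are hypothetical names rather than anything in the codebase.

```python
import re
import urllib.error
import urllib.request

# ASSUMPTION: placeholder URL shapes; substitute the real old/new patterns.
OLD_URL = re.compile(r"^https?://old\.example\.org/findingaid/(?P<id>[\w.-]+)$")
NEW_URL = "https://new.example.org/catalog/{id}"


def rewrite_url(url):
    """Return the new-style URL, or None if the old pattern does not match."""
    match = OLD_URL.match(url)
    return NEW_URL.format(**match.groupdict()) if match else None


def http_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


def check_rewrites(urls, get_status=http_status):
    """Split urls into {old: new} rewrites that resolve and a list of problems."""
    rewritten, problems = {}, []
    for url in urls:
        new_url = rewrite_url(url)
        if new_url is None or get_status(new_url) != 200:
            problems.append(url)
        else:
            rewritten[url] = new_url
    return rewritten, problems
```

Anything that lands in `problems` (no pattern match, or a rewritten URL that does not return 200) would need to be looked at by hand before the dataset is published.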

I don't remember about dimensions, I guess it was never used! Agree we should drop it.

That filename doesn't sound ideal/obvious. What filename did we use for the zotero export? Can we reuse or adapt that?

Print date will never have unknown year. Please use YYYY, YYYY-MM, and YYYY-MM-DD. Sorry there isn't an existing method for this already!
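One possible shape for that missing method, as a sketch only. It assumes the print date is stored as a full `datetime.date` plus flags saying whether the month and day are actually known; `format_print_date` and its parameter names are hypothetical, not existing code.

```python
from datetime import date


def format_print_date(value, month_known=True, day_known=True):
    """Format a print date as YYYY, YYYY-MM, or YYYY-MM-DD.

    ASSUMPTION: the year is always known; month_known and day_known are
    hypothetical flags describing how much of the stored date is real.
    """
    if value is None:
        return ""
    if not month_known:
        # without a month, a day is meaningless, so output the year only
        return f"{value.year:04d}"
    if not day_known:
        return f"{value.year:04d}-{value.month:02d}"
    return value.isoformat()
```

For example, `format_print_date(date(1962, 3, 1), day_known=False)` gives `"1962-03"`.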

We should be consistent in our empty variable filtering. Can we put it in the base data export class somewhere and use the same logic everywhere? I'd prefer to omit empty strings.
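A rough sketch of what shared filtering in a base class could look like. `BaseExport`, `get_object_data`, `is_empty`, and `remove_empty` are illustrative names and may not match the actual export classes; the point is only that every exporter goes through the same filter.

```python
class BaseExport:
    """Hypothetical base class; subclasses build one dict per record."""

    def get_object_data(self, obj):
        raise NotImplementedError

    @staticmethod
    def is_empty(value):
        """True for None and empty strings or collections; 0 and False are kept."""
        if value is None:
            return True
        return isinstance(value, (str, list, tuple, dict)) and len(value) == 0

    @classmethod
    def remove_empty(cls, data):
        """Return a copy of data without empty values."""
        return {key: val for key, val in data.items() if not cls.is_empty(val)}

    def export(self, objects):
        """Apply the same empty-value filtering to every exported record."""
        return [self.remove_empty(self.get_object_data(obj)) for obj in objects]
```

Keeping `0` and `False` matters for numeric and boolean fields, where a falsy value is still real data.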

We don't have to use the zotero method, but I think it would be good to compare it with your logic. I suspect we may be missing some things — it looks like there could be instance creators other than work authors (I wondered about that but it wasn't obvious when I glanced at the code); I think there are probably some others.

rlskoeser commented 3 years ago

Book data export looks good to me.

Checked both json and csv, and looked at a variety of record types — books, book sections, journal articles; also looked for variants that were marked as translations, have insertions, etc. Also checked records with multiple authors, contributors.