Closed kmcelwee closed 3 years ago
finding aid url: these almost certainly don't resolve anymore! can we get help from PUL to get catalog links for the same items?
Double checked, and get 200s for all finding aid links. These look fine to me? Here are a few:
http://findingaids.princeton.edu/collections/RBD1/c10455
http://findingaids.princeton.edu/collections/RBD1/c10455
http://findingaids.princeton.edu/collections/RBD1/c10433
http://findingaids.princeton.edu/collections/RBD1/c9157
http://findingaids.princeton.edu/collections/RBD1/c9354
http://findingaids.princeton.edu/collections/RBD1/c8361
Dimensions column is empty, work_uri only has one value. Were these never really used?
I've used the filename derrida-instance-data.csv/json; should I call it book-data? Something else?
Proposed unknown date format: create a function that puts the datetime into YYYY-MM-DD format, substituting question marks for any part of the date that is unknown (e.g. ????-09-23, 1945-??-??).
I'm filtering by not in [None, '', []] to remove nulls and empty strings from the JSON. In the PR we merged, I just did a boolean check (if ref[field]). There were no booleans in that CSV, but should I go back and fix that? Do we want empty strings removed as well or just nulls?
adapt or refer to logic in as_zotero_item method on Instance since the metadata we want to export should be similar
I'm relieved the finding aid urls resolve. However, those are redirects and we should put the new urls into the dataset. They may not maintain the redirects indefinitely, and we want the data publication to be as durable as possible since we're not planning to touch this. It looks like we can just do a regex to get the new version, but let's make sure they all resolve if we do that.
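One way to make sure the rewritten links resolve is to follow each redirect and record the address that is finally served. The sketch below assumes nothing about the new URL pattern; the function names and the injectable `opener` argument are hypothetical, added so the check can be exercised without network access.

```python
import urllib.request
from typing import Callable, Dict, Iterable


def final_url(url: str, opener: Callable = urllib.request.urlopen) -> str:
    """Follow redirects for url and return the address that was finally served.

    urllib follows redirects by default and raises an error on a 4xx/5xx
    response, so a broken link surfaces as an exception.
    """
    with opener(url) as response:
        return response.geturl()


def resolve_all(
    urls: Iterable[str], opener: Callable = urllib.request.urlopen
) -> Dict[str, str]:
    """Map each distinct original URL to its post-redirect URL."""
    # dict.fromkeys drops repeated URLs while keeping first-seen order
    return {url: final_url(url, opener) for url in dict.fromkeys(urls)}
```

The resulting mapping can be compared against whatever the regex produces before the dataset is updated.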
I don't remember about dimensions, I guess it was never used! Agree we should drop it.
That filename doesn't sound ideal/obvious. What filename did we use for the zotero export? Can we reuse or adapt that?
Print date will never have unknown year. Please use YYYY, YYYY-MM, and YYYY-MM-DD. Sorry there isn't an existing method for this already!
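A minimal sketch of that formatting, assuming the caller passes None for an unknown month or day. The function name and signature are hypothetical; how the Instance model actually records which parts of the print date are known is not shown here.

```python
from typing import Optional


def format_partial_date(
    year: int, month: Optional[int] = None, day: Optional[int] = None
) -> str:
    """Return YYYY, YYYY-MM, or YYYY-MM-DD depending on which parts are known."""
    if month is None:
        # without a month, a day value is not usable; output the year only
        return f"{year:04d}"
    if day is None:
        return f"{year:04d}-{month:02d}"
    return f"{year:04d}-{month:02d}-{day:02d}"
```

For example, `format_partial_date(1945, 9)` gives `1945-09`.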
We should be consistent in our empty variable filtering. Can we put it in the base data export class somewhere and use the same logic everywhere? I'd prefer to omit empty strings.
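A shared helper along these lines could live on the base export class so every export filters the same way. This is only a sketch: the names `EMPTY_VALUES` and `remove_empty` are hypothetical. It compares against an explicit list of empty values rather than testing truthiness, so False and 0 are kept.

```python
from typing import Any, Dict

# values treated as "empty" and omitted from exported records
EMPTY_VALUES = (None, "", [])


def remove_empty(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of data without keys whose value is None, '' or []."""
    return {key: value for key, value in data.items() if value not in EMPTY_VALUES}
```

A plain `if value` check would also drop False and 0, which is the difference that matters once an export includes boolean fields.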
We don't have to use the zotero method, but I think it would be good to compare it with your logic. I suspect we may be missing some things — it looks like there could be instance creators other than work authors (I wondered about that but it wasn't obvious when I glanced at the code); I think there are probably some others.
Book data export looks good to me.
Checked both json and csv, and looked at a variety of record types — books, book sections, journal articles; also looked for variants that were marked as translations, have insertions, etc. Also checked records with multiple authors, contributors.
Dev Notes
reference_data in books and intervention_data in interventions); should extend the reference data command for reuse, similar to the way the intervention data export does
instance model, since that is what the reference data links to; should export every instance that is cited in a derrida work (i.e., Instance.objects.filter(cited_in__isnull=False), same as the existing zotero export), but we'll need to check this filter at some point, because according to #246 this doesn't include all the books in the reference data export
adapt or refer to logic in as_zotero_item method on Instance since the metadata we want to export should be similar; remove that method when we're done (or rename/reuse any code that's helpful)
str(creator.person), not str(creator)
3054e0d8d793783449748dcffa48e083634c4f24
export should include the following fields (preliminary list):
books have a work/instance structure to handle multiple different editions and translations of the same work (including at least one case where there are multiple copies of the exact same edition, for which we use the copy field to distinguish)
item type can be book, book section, or journal; a book section should belong to a book (instance) in the database, and some publication metadata should be pulled from that book record (see the zotero data for an example)
I would be open to two exports, one for works and one for instances (editions? copies? books) if you think that would simplify things any and not be too much trouble for people to work with.
Questions from Zotero code
alternate_title and primary_title field?
contributors to handle all non-author creators. Is that appropriate?
authorized_name (the default for __str__) for these creators is odd, examples below. Should I just use lastname_first instead? Or would we want to preserve this information?