Closed by baskaufs.
Cliff says it would be good to periodically check with Olivia K-F to let her know what we are doing. They are going to adopt a public database system that might be better than a publicly available Zotero website. But it would be relatively easy for us to turn on the Zotero dataset. Andy can ping Olivia to let her know what's going on.
From 2022-09-18 email from Charlotte:
The properties we should include in the Wikidata item for a journal article are listed below.
· instance of
· title / language of work or name: Add a language tag to the title after the title is entered. This property is correlated with language of work or name. If the article is written in multiple languages, list all of them under language of work or name.
· author / author name string: Use the author's QID in the author property. If there is no QID, create a new item for the author or use author name string. Include series ordinal as a qualifier if the article is co-authored.
· published in
· publication date, volume, issue, page(s), and number of pages
In the Identifiers section of the item, list the DOI or JSTOR article ID if available.
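The statement list above corresponds to standard Wikidata properties. Here is a minimal sketch of turning one article record into (property, value, qualifiers) tuples. It assumes a hypothetical per-article dictionary built from the Zotero data, with keys such as `title_lang`, `authors`, and `journal_qid`, and a hypothetical function name `article_statements`. The property IDs are Wikidata's own. Using scholarly article (Q13442814) as the instance of value is an assumption, because the email does not name a class.

```python
# Standard Wikidata property IDs for the statements in the list above.
PROPERTIES = {
    "instance of": "P31",
    "title": "P1476",
    "language of work or name": "P407",
    "author": "P50",
    "author name string": "P2093",
    "series ordinal": "P1545",
    "published in": "P1433",
    "publication date": "P577",
    "volume": "P478",
    "issue": "P433",
    "page(s)": "P304",
    "number of pages": "P1104",
    "DOI": "P356",
    "JSTOR article ID": "P888",
}

# Assumed class for "instance of": Wikidata's "scholarly article".
SCHOLARLY_ARTICLE = "Q13442814"

# Hypothetical record keys -> property labels for simple one-value statements.
SIMPLE_FIELDS = [
    ("journal_qid", "published in"), ("date", "publication date"),
    ("volume", "volume"), ("issue", "issue"), ("pages", "page(s)"),
    ("num_pages", "number of pages"), ("doi", "DOI"), ("jstor", "JSTOR article ID"),
]


def article_statements(record):
    """Return (property ID, value, qualifiers) tuples for one article record.

    `record` is a hypothetical dict; only the PIDs come from Wikidata.
    """
    statements = [(PROPERTIES["instance of"], SCHOLARLY_ARTICLE, {})]
    # The title is a monolingual string: the text plus its language tag.
    statements.append((PROPERTIES["title"], (record["title"], record["title_lang"]), {}))
    for qid in record.get("language_qids", []):
        statements.append((PROPERTIES["language of work or name"], qid, {}))

    authors = record.get("authors", [])
    co_authored = len(authors) > 1
    for position, author in enumerate(authors, start=1):
        # Prefer the author item; fall back to a plain name string.
        if author.get("qid"):
            pid, value = PROPERTIES["author"], author["qid"]
        else:
            pid, value = PROPERTIES["author name string"], author["name"]
        # Series ordinal qualifier only when there is more than one author.
        qualifiers = {PROPERTIES["series ordinal"]: str(position)} if co_authored else {}
        statements.append((pid, value, qualifiers))

    for key, label in SIMPLE_FIELDS:
        if record.get(key):
            statements.append((PROPERTIES[label], record[key], {}))
    return statements
```

Author order is preserved as the series ordinal, so a mix of author items and name strings still records each person's position in the byline.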
Most of the articles in Zotero lack data for language of work or name and for number of pages. Could you write a script that goes through the titles to determine the language of each work? Could you also write a script that calculates the number of pages from the page range (given in both Roman and Arabic numerals)?
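For the page-count request, here is a minimal sketch. It assumes the pages field holds comma-separated single pages or inclusive ranges, such as `45-67` or `xi-xv, 1-20`, with a hyphen, en dash, or em dash as the range separator. The function names are hypothetical.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'xiv' to an integer.

    Handles subtractive notation (iv, xc, cm) but does not reject
    non-canonical forms such as 'iiii'.
    """
    total = 0
    previous = 0
    # Scan right to left: a value smaller than the one after it is subtracted.
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


def page_value(token):
    """Interpret one page token as an Arabic or Roman numeral."""
    token = token.strip()
    if token.isdigit():
        return int(token)
    if re.fullmatch(r"[ivxlcdm]+", token, re.IGNORECASE):
        return roman_to_int(token)
    raise ValueError(f"unrecognized page number: {token!r}")


def count_pages(pages):
    """Count the pages in a string such as '45-67' or 'xi-xv, 1-20'."""
    total = 0
    for part in pages.split(","):
        part = part.strip()
        if not part:
            continue
        # Split on hyphen, en dash, or em dash.
        bounds = re.split(r"\s*[-\u2013\u2014]\s*", part)
        if len(bounds) == 1:
            page_value(bounds[0])  # validate the single page
            total += 1
        elif len(bounds) == 2:
            start, end = page_value(bounds[0]), page_value(bounds[1])
            if end < start:
                # Abbreviated ranges like '123-45' land here and need review.
                raise ValueError(f"range ends before it starts: {part!r}")
            total += end - start + 1
        else:
            raise ValueError(f"unrecognized page range: {part!r}")
    return total
```

Raising an error on abbreviated or malformed ranges, rather than guessing, keeps a wrong number of pages value out of Wikidata. Those records can be reviewed by hand.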
From Chris Benda 2022-09-27:
For journal articles, is the journal title listed or is the ISSN also available?
In the small sample I looked at, which includes records from EBSCOhost, JSTOR, and ProQuest, both are listed. I would have said, without looking at actual records, that the ISSN may or may not be there while the journal title almost always is, but things seem even better than that.
In the output from Zotero, I’m assuming the journal and the containing book would have different field names. (You said they would probably be two different dumps, right?)
The attached csv file, which includes books, book chapters, and journal articles, shows the following title mappings:
Book records: book title -> Title field
Book section records: section title -> Title field; book title -> Publication Title field
Journal article records: article title -> Title field; journal title -> Publication Title field
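The title mappings above can be applied row by row when reading the export. This is a minimal sketch that assumes the CSV has an `Item Type` column containing Zotero's internal type names (`book`, `bookSection`, `journalArticle`) alongside the `Title` and `Publication Title` columns. The function names are hypothetical.

```python
import csv
import io


def split_titles(row):
    """Return (own title, container title, container kind) for one row.

    'Title' is the item's own title. For book sections and journal
    articles, 'Publication Title' is the book or journal title.
    """
    item_type = row["Item Type"]
    title = (row.get("Title") or "").strip()
    container = (row.get("Publication Title") or "").strip() or None
    if item_type == "book":
        return title, None, None
    if item_type == "bookSection":
        return title, container, "book"
    if item_type == "journalArticle":
        return title, container, "journal"
    raise ValueError(f"unhandled item type: {item_type!r}")


def read_export(text):
    """Parse CSV text exported from Zotero and split the titles in each row."""
    return [split_titles(row) for row in csv.DictReader(io.StringIO(text))]
```

When reading an exported file directly, opening it with `encoding="utf-8-sig"` strips a leading byte-order mark if one is present. This ensures the first header is read as `Key` rather than `\ufeffKey`.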
Additional info from Charlotte 2022-09-27:
Two years ago, when I created new Wikidata items for the journals in which Div. publications appeared, I tried to include the ISSN or E-ISSN. I can't guarantee that every journal has an ISSN, but most do.
After examining the sample spreadsheets, it looks like we can assume that journal articles have an ISSN, since most of them do. As a pre-processing step, we could look up and add the ISSN for records that lack it. Many of the records missing an ISSN also lack a journal title. For those, we should probably either not create the item at all or create it without the journal (published_in) statement.
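The pre-processing step described above can start by sorting records into buckets. This is a minimal sketch that assumes the `ISSN` and `Publication Title` column names from the Zotero export and an ISSN field that may contain several ISSNs separated by spaces. The bucket names and function names are hypothetical. The check-digit test follows the ISSN standard (ISO 3297): weights 8 down to 2 are applied to the first seven digits.

```python
import re

ISSN_RE = re.compile(r"^(\d{4})-?(\d{3}[\dX])$")


def issn_is_valid(issn):
    """Return True if `issn` is well formed and its check digit is correct."""
    match = ISSN_RE.match(issn.strip().upper())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weighted sum of the first seven digits (weights 8..2); the check digit
    # is (11 - sum mod 11) mod 11, with 10 written as "X".
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def triage(row):
    """Sort one journal-article row into a hypothetical processing bucket.

    "ready":       at least one valid ISSN is present.
    "lookup_issn": no valid ISSN, but a journal title to look it up from.
    "no_journal":  neither; skip the item or omit the published_in statement.
    """
    issn_field = (row.get("ISSN") or "").strip()
    candidates = [t for t in re.split(r"[\s,;]+", issn_field) if t]
    if any(issn_is_valid(t) for t in candidates):
        return "ready"
    if (row.get("Publication Title") or "").strip():
        return "lookup_issn"
    return "no_journal"
```

Treating an ISSN with an invalid check digit the same as a missing one routes typos into the lookup bucket instead of linking the article to the wrong journal.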
Remaining issues:
Experiment with extracting book metadata from book chapter records and see if there's enough info for creating items.
Place of publication values need QIDs.
Make sure to save the "key" value in the VanderBot table as an ignored column.
Decided to skip part of the series: too complicated and too few cases. Handle it manually or as a follow-up processing step.
Decided to not yet try to automatically add books for book chapters that can't be linked to a containing book.
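The book-chapter items above could be explored along these lines. This is a minimal sketch that links each chapter to a book record in the same export, first by ISBN and then by exact normalized title. It also extracts the containing book's metadata from any chapter that cannot be linked, so the chapter can be set aside for manual handling instead of generating a new book item. The column names (`Key`, `Item Type`, `ISBN`, `Publisher`, `Place`, `Publication Year`) are assumed from the Zotero CSV export. The function names are hypothetical, and so is the "enough metadata" threshold.

```python
import re


def isbn_set(field):
    """Normalize an ISBN field that may hold several ISBNs separated by spaces."""
    tokens = re.split(r"[\s,;]+", (field or "").upper())
    return {re.sub(r"[^0-9X]", "", t) for t in tokens} - {""}


def normalize_title(title):
    """Case-fold and collapse whitespace so trivially different titles match."""
    return " ".join((title or "").casefold().split())


def book_metadata_from_chapter(row):
    """Pull the containing book's fields out of a Zotero bookSection row."""
    return {
        "title": (row.get("Publication Title") or "").strip(),
        "isbns": isbn_set(row.get("ISBN")),
        "publisher": (row.get("Publisher") or "").strip(),
        "place": (row.get("Place") or "").strip(),
        "year": (row.get("Publication Year") or "").strip(),
    }


def has_enough_metadata(meta):
    """Hypothetical threshold: a title, a year, and an ISBN or a publisher."""
    return bool(meta["title"] and meta["year"] and (meta["isbns"] or meta["publisher"]))


def link_chapters(rows):
    """Link bookSection rows to book rows, first by ISBN, then by exact title.

    Returns (links, unlinked). `links` maps a chapter's Key to its book's Key;
    `unlinked` holds (chapter Key, extracted book metadata, enough?) tuples
    left for manual handling rather than turned into new book items.
    """
    by_isbn, by_title = {}, {}
    for row in rows:
        if row["Item Type"] == "book":
            for isbn in isbn_set(row.get("ISBN")):
                by_isbn[isbn] = row["Key"]
            title = normalize_title(row.get("Title"))
            if title:
                by_title[title] = row["Key"]

    links, unlinked = {}, []
    for row in rows:
        if row["Item Type"] != "bookSection":
            continue
        meta = book_metadata_from_chapter(row)
        # ISBN match first (sorted for a deterministic choice), then title.
        book_key = next((by_isbn[i] for i in sorted(meta["isbns"]) if i in by_isbn), None)
        if book_key is None and meta["title"]:
            book_key = by_title.get(normalize_title(meta["title"]))
        if book_key is None:
            unlinked.append((row["Key"], meta, has_enough_metadata(meta)))
        else:
            links[row["Key"]] = book_key
    return links, unlinked
```

Keeping the Zotero `Key` in both outputs makes it possible to trace each decision back to the source record.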
In 2022-05-02 meeting: