This repository contains code to generate a variety of bibliographic metadata formats for <tei:div>s and <tei:biblStruct>s. Everything is built upon <tei:biblStruct> as an intermediate format and XPath functions. The XSLT is split into basic stylesheets for functions (file name: ...-functions.xsl) and stylesheets applying these functions. Note that the functions make use of the oape namespace, which is mapped to xmlns:oape="https://openarabicpe.github.io/ns".
<tei:biblStruct>
<tei:biblStruct> for a <tei:div> using information from the TEI file's <sourceDesc> and the <tei:div> itself as oape:bibliography-tei-div-to-biblstruct().
<tei:div type="item">
<tei:biblStruct> to:
<xsl:include href="https://tillgrallert.github.io/xslt-calendar-conversion/functions/date-functions.xsl"/>.
oape:bibliography-tei-to-mods() to convert a single <tei:biblStruct> to <mods:mods>
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-mods() to generate one MODS XML file per <tei:div> as input.
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-mods() to generate one MODS XML file per TEI XML file as input with <mods:mods> children for each <tei:div>.
oape:bibliography-tei-to-bibtex() to convert a single <tei:biblStruct> to BibTeX
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-bibtex() to generate one BibTeX file (.bib) per <tei:div> as input.
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-bibtex() to generate one BibTeX file (.bib) per TEI XML file as input with one BibTeX child for each <tei:div>.
oape:bibliography-tei-to-yaml() to convert a single <tei:biblStruct> to YAML
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-yaml() to generate one YAML file (.yml) per TEI XML file as input with one YAML child for each <tei:div>.
oape:bibliography-tei-to-zotero-rdf() to convert a single <tei:biblStruct> to <bib:{reference-type}>
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-zotero-rdf() to generate one Zotero RDF file per <tei:div> as input.
oape:bibliography-tei-div-to-biblstruct() and oape:bibliography-tei-to-zotero-rdf() to generate one Zotero RDF file per TEI XML file as input with <bib:{reference-type}> children for each <tei:div>.
<tei:biblStruct>: intermediary / exchange format
The intermediary/exchange format between all the supported serialisations of bibliographic metadata is a TEI <biblStruct> element.
<biblStruct>
<analytic>
<title level="a" xml:lang="ar">حكم وخواطر</title>
<author>
<persName ref="viaf:73935498 jaraid:pers:1690 oape:pers:242 wiki:Q2474371" xml:lang="ar">
<forename xml:lang="ar">شكيب</forename>
<surname xml:lang="ar">ارسلان</surname>
</persName>
</author>
<idno type="url">https://github.com/OpenArabicPE/journal_al-muqtabas/blob/master/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249</idno>
<idno type="url">https://OpenArabicPE.github.io/journal_al-muqtabas/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249</idno>
<idno type="BibTeX">oclc_4770057679-i_14-div_6.d1e1249</idno>
</analytic>
<monogr>
<title level="j" xml:lang="ar">المقتبس</title>
<title level="j" type="sub" xml:lang="ar">مجلة أدبية علمية اجتماعية تصدر ب
<placeName xml:lang="ar">القاهرة</placeName>
في غرة كل شهر عربي</title>
<title level="j" xml:lang="ar-Latn-x-ijmes">al-Muqtabas</title>
<title level="j" type="sub" xml:lang="ar-Latn-x-ijmes">Majalla adabiyya ʿilmiyya ijtimāʿiyya tuṣadir bi-l-Qāhira fī gharrat kull shahr ʿarabī</title>
<title level="j" xml:lang="fr">Al-Moktabas</title>
<title level="j" type="sub" xml:lang="fr">Revue mensuelle, littéraire, scientifique &amp; Sociologique</title>
<idno type="OCLC" xml:lang="en">4770057679</idno>
<idno type="OCLC" xml:lang="en">79440195</idno>
<idno type="aucr" xml:lang="en">07201136864</idno>
<idno type="shamela" xml:lang="en">26523</idno>
<idno type="zenodo" xml:lang="en">45922152</idno>
<idno type="URI">oclc_4770057679-i_14</idno>
<textLang mainLang="ar"/>
<editor ref="viaf:32272677" xml:lang="en">
<persName ref="viaf:32272677 oape:pers:878 wiki:Q3123742" xml:lang="ar">
<forename xml:lang="ar">محمد</forename>
<surname xml:lang="ar">كرد علي</surname>
</persName>
<persName ref="viaf:32272677 oape:pers:878 wiki:Q3123742" xml:lang="ar-Latn-x-ijmes">
<forename xml:lang="ar-Latn-x-ijmes">Muḥammad</forename>
<surname xml:lang="ar-Latn-x-ijmes">Kurd ʿAlī</surname>
</persName>
</editor>
<imprint xml:lang="en">
<publisher xml:lang="en">
<orgName xml:lang="ar">مطبعة الظاهر</orgName>
<orgName xml:lang="ar-Latn-x-ijmes">Maṭbaʿa al-Ẓāhir</orgName>
</publisher>
<publisher xml:lang="en">
<orgName xml:lang="ar">المطبعة العمومية</orgName>
<orgName xml:lang="ar-Latn-x-ijmes">al-Maṭbaʿa al-ʿUmūmiyya</orgName>
</publisher>
<pubPlace xml:lang="en">
<placeName ref="oape:place:226 geon:360630" xml:lang="ar">القاهرة</placeName>
<placeName ref="oape:place:226 geon:360630" xml:lang="ar-Latn-x-ijmes">al-Qāhira</placeName>
<placeName ref="oape:place:226 geon:360630" xml:lang="fr">Caire</placeName>
<placeName ref="oape:place:226 geon:360630" xml:lang="en">Cairo</placeName>
</pubPlace>
<date calendar="#cal_gregorian" datingMethod="#cal_gregorian" type="official" when="1907-03-16" xml:lang="ar-Latn-x-ijmes">16 March 1907</date>
<date calendar="#cal_islamic" datingMethod="#cal_islamic" type="computed" when="1907-03-16" when-custom="1325-02-01" xml:lang="ar-Latn-x-ijmes">1 Ṣafār 1325</date>
</imprint>
<biblScope from="2" to="2" unit="volume" xml:lang="en"/>
<biblScope from="2" to="2" unit="issue" xml:lang="en"/>
<biblScope from="78" to="82" unit="page">78-82</biblScope>
</monogr>
</biblStruct>
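
To illustrate how downstream code might read this intermediate format, the following Python sketch pulls a handful of values out of an abridged copy of the record above, using only the standard library. It is not part of this repository's XSLT. The names `first_title` and `summarise` are made up for this illustration, and the sample (like the example above) omits the TEI namespace that real files declare.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <biblStruct> above, without a namespace declaration.
SAMPLE = """<biblStruct>
  <analytic>
    <title level="a" xml:lang="ar">حكم وخواطر</title>
    <author><persName xml:lang="ar">
      <forename xml:lang="ar">شكيب</forename>
      <surname xml:lang="ar">ارسلان</surname>
    </persName></author>
    <idno type="BibTeX">oclc_4770057679-i_14-div_6.d1e1249</idno>
  </analytic>
  <monogr>
    <title level="j" xml:lang="ar">المقتبس</title>
    <title level="j" xml:lang="ar-Latn-x-ijmes">al-Muqtabas</title>
    <imprint>
      <date type="official" when="1907-03-16">16 March 1907</date>
    </imprint>
    <biblScope from="2" to="2" unit="volume"/>
    <biblScope from="2" to="2" unit="issue"/>
    <biblScope from="78" to="82" unit="page">78-82</biblScope>
  </monogr>
</biblStruct>"""

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def first_title(parent, level, lang):
    """Return the first <title> child with the given @level and @xml:lang."""
    for title in parent.findall("title"):
        if title.get("level") == level and title.get(XML_LANG) == lang:
            return "".join(title.itertext()).strip()
    return None


def summarise(bibl, lang="ar"):
    """Collect the fields most target formats need into a plain dict."""
    analytic = bibl.find("analytic")
    monogr = bibl.find("monogr")
    person = analytic.find("author/persName")
    # Map each @unit to its (from, to) pair, e.g. "page" -> ("78", "82").
    scopes = {s.get("unit"): (s.get("from"), s.get("to"))
              for s in monogr.findall("biblScope")}
    return {
        "id": analytic.findtext("idno[@type='BibTeX']"),
        "title": first_title(analytic, "a", lang),
        "journal": first_title(monogr, "j", lang),
        "author": (person.findtext("surname"), person.findtext("forename")),
        "date": monogr.find("imprint/date[@type='official']").get("when"),
        "volume": scopes["volume"][0],
        "issue": scopes["issue"][0],
        "pages": scopes["page"],
    }


record = summarise(ET.fromstring(SAMPLE))
print(record)
```

Calling `summarise(..., lang="ar-Latn-x-ijmes")` would select the transliterated journal title instead, which is one reason the intermediate format keeps every language variant side by side.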
The MODS (Metadata Object Description Schema) standard is expressed in XML and maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users. Compared to BibTeX, MODS has the advantage of being properly standardised, human- and machine-readable, and much better suited to holding all the needed bibliographic information.
<mods>
<titleInfo>
<title xml:lang="ar">حكم وخواطر</title>
</titleInfo>
<typeOfResource>text</typeOfResource>
<genre authority="local" xml:lang="en">journalArticle</genre>
<genre authority="marcgt" xml:lang="en">article</genre>
<name type="personal" xml:lang="ar" valueURI="https://viaf.org/viaf/73935498">
<namePart type="family" xml:lang="ar">ارسلان</namePart>
<namePart type="given" xml:lang="ar">شكيب</namePart>
<role>
<roleTerm authority="marcrelator" type="code">aut</roleTerm>
</role>
</name>
<relatedItem type="host">
<titleInfo>
<title xml:lang="ar">المقتبس</title>
<subTitle xml:lang="ar">مجلة أدبية علمية اجتماعية تصدر بالقاهرة في غرة كل شهر عربي</subTitle>
</titleInfo>
<genre authority="marcgt">journal</genre>
<name type="personal" xml:lang="ar" valueURI="https://viaf.org/viaf/32272677">
<namePart type="family" xml:lang="ar">كرد علي</namePart>
<namePart type="given" xml:lang="ar">محمد</namePart>
<role>
<roleTerm authority="marcrelator" type="code">edt</roleTerm>
</role>
</name>
<originInfo>
<place>
<placeTerm type="text" xml:lang="ar" valueURI="https://www.geonames.org/360630">القاهرة</placeTerm>
</place>
<publisher xml:lang="ar">مطبعة الظاهر</publisher>
<publisher xml:lang="ar-Latn-x-ijmes">Maṭbaʿa al-Ẓāhir</publisher>
<publisher xml:lang="ar">المطبعة العمومية</publisher>
<publisher xml:lang="ar-Latn-x-ijmes">al-Maṭbaʿa al-ʿUmūmiyya</publisher>
<dateIssued encoding="w3cdtf">1907-03-16</dateIssued>
<dateOther calendar="islamic">1325-02-01</dateOther>
<dateOther>1325-02-01 [1907-03-16]</dateOther>
<issuance>continuing</issuance>
</originInfo>
<part>
<detail type="volume">
<number>2</number>
</detail>
<detail type="issue">
<number>2</number>
</detail>
<extent unit="pages">
<start>78</start>
<end>82</end>
</extent>
</part>
<identifier type="BibTeX">oclc_4770057679-i_14-div_6.d1e1249</identifier>
<identifier type="OCLC">4770057679</identifier>
<identifier type="OCLC">79440195</identifier>
<identifier type="aucr">07201136864</identifier>
<identifier type="shamela">26523</identifier>
<identifier type="zenodo">45922152</identifier>
<identifier type="URI">oclc_4770057679-i_14</identifier>
</relatedItem>
<accessCondition>http://creativecommons.org/licenses/by-sa/4.0/</accessCondition>
<location>
<url usage="primary display">https://github.com/OpenArabicPE/journal_al-muqtabas/blob/master/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249</url>
<url usage="primary display">https://OpenArabicPE.github.io/journal_al-muqtabas/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249</url>
</location>
<language>
<languageTerm type="code" authorityURI="http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry">ar</languageTerm>
</language>
</mods>
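
The nesting of the MODS record above (article-level fields at the top, journal-level fields inside a host `<relatedItem>`) can be sketched in a few lines of Python. This is only an illustration of the target structure, not the repository's XSLT implementation. The helpers `sub` and `build_mods` and the shape of the `record` dict are assumptions made for the example, and the MODS namespace is omitted as in the sample above.

```python
import xml.etree.ElementTree as ET


def sub(parent, tag, text=None, **attrib):
    """Append a child element with optional text and attributes."""
    element = ET.SubElement(parent, tag, attrib)
    element.text = text
    return element


def build_mods(record):
    """Build a minimal MODS tree for one journal article."""
    mods = ET.Element("mods")
    sub(sub(mods, "titleInfo"), "title", record["title"])
    name = sub(mods, "name", type="personal")
    sub(name, "namePart", record["author"][0], type="family")
    sub(name, "namePart", record["author"][1], type="given")
    sub(sub(name, "role"), "roleTerm", "aut",
        authority="marcrelator", type="code")
    # Everything describing the journal issue lives in the host item.
    host = sub(mods, "relatedItem", type="host")
    sub(sub(host, "titleInfo"), "title", record["journal"])
    sub(sub(host, "originInfo"), "dateIssued", record["date"],
        encoding="w3cdtf")
    part = sub(host, "part")
    for unit in ("volume", "issue"):
        sub(sub(part, "detail", type=unit), "number", record[unit])
    extent = sub(part, "extent", unit="pages")
    sub(extent, "start", record["pages"][0])
    sub(extent, "end", record["pages"][1])
    return mods


record = {
    "title": "حكم وخواطر",
    "author": ("ارسلان", "شكيب"),
    "journal": "المقتبس",
    "date": "1907-03-16",
    "volume": "2",
    "issue": "2",
    "pages": ("78", "82"),
}
mods = build_mods(record)
print(ET.tostring(mods, encoding="unicode"))
```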
MODS also serves as the intermediary format for the free bibutils suite of conversions between bibliographic metadata formats (including BibTeX), which is under constant development and released under the GNU General Public License (GPL). Tei2Mods-issues.xsl and bibutils provide a means to automatically generate a large number of bibliographic formats to suit the reference manager one is working with; e.g.:
$ xml2end MODS.xml > output_file.end
$ xml2bib MODS.xml > output_file.bib
Zotero has solid support for MODS import and export. However, there are a number of caveats one should be aware of:
Zotero has a limited number of "Item Types" with different fields (documentation)
Item Type | Volume | Issue | Place | contributorType: editor |
---|---|---|---|---|
Journal Article | y | y | n | y |
Magazine Article | y | y | n | n |
Newspaper Article | n | n | y | n |
<genre authority="local">journal</genre><genre authority="marcgt">journal</genre> is mapped to "Journal Article" and the journal title will end up as article title with the journal title empty.
Given multiple titles such as <title xml:lang="ar">الجنان</title><title xml:lang="ar-Latn-x-ijmes">al-Jinān</title>, Zotero will always pick the first entry.
BibTeX is a plain text format which has been around for more than 30 years and which is widely supported by reference managers. Thus it seems to be a safe bet to preserve and exchange minimal bibliographic data.
There are, however, a number of problems with the format:
[^1]: Wikipedia has a better description than the official website.
[^2]: Preferably validating against the OpenArabicPE schema. All conversion functions work with any <tei:div> as input, but the concrete implementation of conversions depends on @type attribute values.
@ARTICLE{oclc_4770057679-i_14-div_6.d1e1249,
author = {ارسلان, شكيب},
editor = {كرد علي, محمد},
title = {حكم وخواطر},
journal = {المقتبس: مجلة أدبية علمية اجتماعية تصدر بالقاهرة في غرة كل شهر عربي},
volume = {2},
number = {2},
pages = {78-82},
publisher = {مطبعة الظاهر},
publisher = {المطبعة العمومية},
address = {القاهرة},
language = {ar},
day = {16},
month = {3},
year = {1907},
url = {https://github.com/OpenArabicPE/journal_al-muqtabas/blob/master/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249},
annote = {digital TEI edition, 2021},
}
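
The flat field = {value} layout of the entry above is simple enough to produce from any key/value mapping. The following Python sketch shows one way to do so; it is an illustration rather than the stylesheet's actual logic, and the function name `to_bibtex` is an assumption made for this example. It does no escaping of braces or other special characters inside values.

```python
def to_bibtex(entry_type, cite_key, fields):
    """Serialise a mapping of field names to values as one BibTeX entry."""
    lines = ["@%s{%s," % (entry_type.upper(), cite_key)]
    for name, value in fields.items():
        if value:  # leave out fields without a value
            lines.append("%s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


entry = to_bibtex(
    "article",
    "oclc_4770057679-i_14-div_6.d1e1249",
    {
        "author": "ارسلان, شكيب",
        "title": "حكم وخواطر",
        "journal": "المقتبس",
        "volume": "2",
        "number": "2",
        "pages": "78-82",
        "year": "1907",
        "note": "",  # empty, therefore skipped
    },
)
print(entry)
```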
A basic conversion to YAML was built by mapping the <tei:biblStruct> input to fields using this example, which basically mirrors CSL JSON and should work with Pandoc using the pandoc-citeproc filter.
- id: 'oclc_4770057679-i_14-div_6.d1e1249'
  title: ' حكم وخواطر'
  container-title: 'المقتبس: مجلة أدبية علمية اجتماعية تصدر بالقاهرةفي غرة كل شهر عربي'
  volume: '2'
  issue: '2'
  page: '78-82'
  URL:
  - 'https://github.com/OpenArabicPE/journal_al-muqtabas/blob/master/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249'
  - 'https://OpenArabicPE.github.io/journal_al-muqtabas/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249'
  OCLC:
  - '4770057679'
  - '79440195'
  author:
  - family: 'ارسلان'
    given: 'شكيب'
  editor:
  - family: 'كرد علي'
    given: 'محمد'
  language: ar
  type:
  issued: '1907-03-16'
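
Because this YAML mirrors CSL JSON, the same record can also be written as CSL JSON with a few lines of Python. The sketch below is an illustration under stated assumptions and not output of this repository: the `type` value `article-journal` is assumed (the YAML above leaves it empty), the helper `iso_to_date_parts` is a made-up name, and the date is converted to the `date-parts` form that CSL JSON conventionally uses.

```python
import json


def iso_to_date_parts(iso_date):
    """Turn 'YYYY-MM-DD' into the CSL JSON date-parts structure."""
    return {"date-parts": [[int(part) for part in iso_date.split("-")]]}


item = {
    "id": "oclc_4770057679-i_14-div_6.d1e1249",
    "type": "article-journal",  # assumed, see lead-in
    "title": "حكم وخواطر",
    "container-title": "المقتبس",
    "volume": "2",
    "issue": "2",
    "page": "78-82",
    "author": [{"family": "ارسلان", "given": "شكيب"}],
    "editor": [{"family": "كرد علي", "given": "محمد"}],
    "language": "ar",
    "issued": iso_to_date_parts("1907-03-16"),
}

# ensure_ascii=False keeps the Arabic readable instead of \u escapes.
csl_json = json.dumps([item], ensure_ascii=False, indent=2)
print(csl_json)
```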
<bib:Article rdf:about="#oclc_4770057679-i_14-div_6.d1e1249">
<z:itemType>magazineArticle</z:itemType>
<dcterms:isPartOf>
<bib:Periodical>
<prism:volume>2</prism:volume>
<prism:number>2</prism:number>
<dc:title>المقتبس: مجلة أدبية علمية اجتماعية تصدر بالقاهرة في غرة كل شهر عربي</dc:title>
</bib:Periodical>
</dcterms:isPartOf>
<dc:title>حكم وخواطر</dc:title>
<z:shortTitle>حكم وخواطر</z:shortTitle>
<bib:authors>
<rdf:Seq>
<rdf:li>
<foaf:Person>
<foaf:surname>ارسلان</foaf:surname>
<foaf:givenName>شكيب</foaf:givenName>
</foaf:Person>
</rdf:li>
</rdf:Seq>
</bib:authors>
<bib:editors>
<rdf:Seq>
<rdf:li>
<foaf:Person>
<foaf:surname>كرد علي</foaf:surname>
<foaf:givenName>محمد</foaf:givenName>
</foaf:Person>
</rdf:li>
</rdf:Seq>
</bib:editors>
<dc:publisher>
<foaf:Organization>
<vcard:adr>
<vcard:Address>
<vcard:locality>القاهرة</vcard:locality>
</vcard:Address>
</vcard:adr>
<foaf:name>مطبعة الظاهر المطبعة العمومية</foaf:name>
</foaf:Organization>
</dc:publisher>
<dc:identifier>
<dcterms:URI>
<rdf:value>https://github.com/OpenArabicPE/journal_al-muqtabas/blob/master/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249</rdf:value>
</dcterms:URI>
</dc:identifier>
<dc:identifier>
<dcterms:URI>
<rdf:value>https://OpenArabicPE.github.io/journal_al-muqtabas/tei/oclc_4770057679-i_14.TEIP5.xml#div_6.d1e1249</rdf:value>
</dcterms:URI>
</dc:identifier>
<bib:pages>78-82</bib:pages>
<dc:date>1907-03-16</dc:date>
<dc:date>1907-03-16</dc:date>
<dc:description>Citation Key: oclc_4770057679-i_14-div_6.d1e1249
BibTeX Cite Key: oclc_4770057679-i_14-div_6.d1e1249
date_hijri: 1325-02-01
oclc: 4770057679
zenodo: 45922152
place: القاهرة
publisher: مطبعة الظاهر
publisher: المطبعة العمومية
</dc:description>
<z:language>ar</z:language>
</bib:Article>
The proprietary JSON to directly communicate with the Zotero database / servers through an API has a number of advantages:
direct writing access to all fields
full text can be written to notes to provide a simple full-text search
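
As a rough sketch of the second point, the following Python code prepares a write request that attaches a note holding full text to an existing item. It targets the Zotero Web API (version 3) as I understand it and should be checked against the official documentation before use. The function name `build_note_request` and the API key placeholder are assumptions; the group id and item key are those of this document's sample record. The request is only built, not sent, so the sketch runs offline.

```python
import html
import json
import urllib.request

API_ROOT = "https://api.zotero.org"


def build_note_request(group_id, parent_key, full_text, api_key):
    """Prepare (but do not send) a request adding a child note to an item."""
    note = {
        "itemType": "note",
        "parentItem": parent_key,
        # Notes are stored as HTML, so wrap each line in a paragraph.
        "note": "".join("<p>%s</p>" % html.escape(line)
                        for line in full_text.splitlines()),
        "tags": [],
    }
    # Write requests take a JSON array of item objects.
    body = json.dumps([note], ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        "%s/groups/%d/items" % (API_ROOT, group_id),
        data=body,
        method="POST",
        headers={
            "Zotero-API-Key": api_key,  # placeholder, supply your own key
            "Zotero-API-Version": "3",
            "Content-Type": "application/json",
        },
    )


request = build_note_request(904125, "ABTFWQ5G",
                             "first line\nsecond line", "YOUR-API-KEY")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would actually submit the note.
```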
procedure
XPath 3.1 provides json-to-xml() and xml-to-json()
oXygen has a built-in toolchain: JSON to XML and XML to JSON
{
"data": {
"DOI": "",
"ISSN": "",
"abstractNote": "",
"accessDate": "",
"archive": "",
"archiveLocation": "",
"callNumber": "",
"collections": "9FLQJQ88",
"creators": [
{
"creatorType": "editor",
"firstName": "أنستاس ماري",
"lastName": "الكرملي"
},
{
"creatorType": "editor",
"firstName": "كاظم",
"lastName": "الدجيلي"
}
],
"date": "1913-11-01",
"dateAdded": "2019-11-28T10:26:57Z",
"dateModified": "2019-11-28T10:26:57Z",
"extra": "",
"issue": 4,
"itemType": "journalArticle",
"journalAbbreviation": "",
"key": "ABTFWQ5G",
"language": "",
"libraryCatalog": "",
"pages": "216-219",
"publicationTitle": "لغة العرب: مجلة شهرية ادبية علمية تاريخية",
"relations": "",
"rights": "",
"series": "",
"seriesText": "",
"seriesTitle": "",
"shortTitle": "",
"title": "باب المشارفة والانتقاد: ٧ - تاريخ الصحافة العربية",
"url": "https://openarabicpe.github.io/journal_lughat-al-arab/tei/oclc_472450345-i_28.TEIP5.xml#div_11.d2e3751",
"version": 3238,
"volume": 3
},
"key": "ABTFWQ5G",
"library": {
"id": 904125,
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/openarabicpe",
"type": "text/html"
}
},
"name": "OpenArabicPE",
"type": "group"
},
"links": {
"alternate": {
"href": "https://www.zotero.org/groups/openarabicpe/items/ABTFWQ5G",
"type": "text/html"
},
"self": {
"href": "https://api.zotero.org/groups/904125/items/ABTFWQ5G",
"type": "application/json"
}
},
"meta": {
"createdByUser": {
"id": 2028652,
"links": {
"alternate": {
"href": "https://www.zotero.org/till.grallert",
"type": "text/html"
}
},
"name": "Till Grallert",
"username": "till.grallert"
},
"creatorSummary": "الكرملي and الدجيلي",
"numChildren": 0,
"parsedDate": "1913-11-01"
},
"version": 3238
}
<array>
<key>ABTFWQ5G</key>
<version>3238</version>
<library>
<type>group</type>
<id>904125</id>
<name>OpenArabicPE</name>
<links>
<alternate>
<href>https://www.zotero.org/groups/openarabicpe</href>
<type>text/html</type>
</alternate>
</links>
</library>
<links>
<self>
<href>https://api.zotero.org/groups/904125/items/ABTFWQ5G</href>
<type>application/json</type>
</self>
<alternate>
<href>https://www.zotero.org/groups/openarabicpe/items/ABTFWQ5G</href>
<type>text/html</type>
</alternate>
</links>
<meta>
<createdByUser>
<id>2028652</id>
<username>till.grallert</username>
<name>Till Grallert</name>
<links>
<alternate>
<href>https://www.zotero.org/till.grallert</href>
<type>text/html</type>
</alternate>
</links>
</createdByUser>
<creatorSummary>الكرملي and الدجيلي</creatorSummary>
<parsedDate>1913-11-01</parsedDate>
<numChildren>0</numChildren>
</meta>
<data>
<key>ABTFWQ5G</key>
<version>3238</version>
<itemType>journalArticle</itemType>
<title>باب المشارفة والانتقاد: ٧ - تاريخ الصحافة العربية</title>
<creators>
<creatorType>editor</creatorType>
<firstName>أنستاس ماري</firstName>
<lastName>الكرملي</lastName>
</creators>
<creators>
<creatorType>editor</creatorType>
<firstName>كاظم</firstName>
<lastName>الدجيلي</lastName>
</creators>
<abstractNote></abstractNote>
<publicationTitle>لغة العرب: مجلة شهرية ادبية علمية تاريخية</publicationTitle>
<volume>3</volume>
<issue>4</issue>
<pages>216-219</pages>
<date>1913-11-01</date>
<series></series>
<seriesTitle></seriesTitle>
<seriesText></seriesText>
<journalAbbreviation></journalAbbreviation>
<language></language>
<DOI></DOI>
<ISSN></ISSN>
<shortTitle></shortTitle>
<url>https://openarabicpe.github.io/journal_lughat-al-arab/tei/oclc_472450345-i_28.TEIP5.xml#div_11.d2e3751</url>
<accessDate></accessDate>
<archive></archive>
<archiveLocation></archiveLocation>
<libraryCatalog></libraryCatalog>
<callNumber></callNumber>
<rights></rights>
<extra></extra>
<collections>9FLQJQ88</collections>
<relations></relations>
<dateAdded>2019-11-28T10:26:57Z</dateAdded>
<dateModified>2019-11-28T10:26:57Z</dateModified>
</data>
</array>
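
The XML above follows a simple convention: every JSON key becomes an element of the same name, and each member of a JSON array becomes a repeated element (see the two <creators> elements). A minimal Python sketch of that convention follows. It is not the tool that produced the XML above; `json_to_elements` is a made-up name, and the sketch assumes every JSON key is also a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def json_to_elements(name, value):
    """Return the list of elements called `name` that represent `value`."""
    if isinstance(value, list):
        # One element per array member, all sharing the key's name.
        return [element
                for member in value
                for element in json_to_elements(name, member)]
    element = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            element.extend(json_to_elements(key, child))
    elif isinstance(value, str):
        element.text = value
    elif value is not None:
        # Numbers and booleans keep their JSON spelling (3, true, false).
        element.text = json.dumps(value)
    return [element]


document = json.loads("""{
  "key": "ABTFWQ5G",
  "data": {
    "itemType": "journalArticle",
    "volume": 3,
    "DOI": "",
    "creators": [
      {"creatorType": "editor", "lastName": "الكرملي"},
      {"creatorType": "editor", "lastName": "الدجيلي"}
    ]
  }
}""")

root = json_to_elements("array", document)[0]
print(ET.tostring(root, encoding="unicode"))
```

The reverse direction is lossy without extra information, since a single <creators> element cannot be told apart from a one-member array.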
My primary interest is in moving reference data from Sente to Zotero. For this, we need a custom transformation from Sente XML to something Zotero can import. While I am most familiar with MODS, it seems that CSL JSON is the more complete format (quite a few fields are missing from the MODS import and export).