TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Guidelines need a 'Cite This' or similar button on every page. #2350

Open jamescummings opened 1 year ago

jamescummings commented 1 year ago

Every generated page of the TEI Guidelines needs a 'Cite This' button, so that readers can cite not only the Guidelines as a whole but also the specific page (a Spec page, for example) they happen to be looking at, using its generated URL. There are a variety of JS plugins that produce modal popups offering a selection of standard citation formats. They tend to convert from formats like BibTeX, so it would be fairly straightforward to produce a standard citation with the current version/release number, add the URL of the current page, and embed that in the page with the JavaScript linked.
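As a rough illustration of the BibTeX idea, here is a minimal sketch. The function name `buildBibtex`, the `@incollection` entry type, the citation key, and the choice of fields are all assumptions made for the example, not anything the TEI-C has specified.

```javascript
// Hypothetical helper: builds a BibTeX entry for a single Guidelines page.
// Entry type, key format, and field selection are assumptions.
function buildBibtex({ pageTitle, version, year, url }) {
    const key = `tei-p5-${version}`;
    return [
        `@incollection{${key},`,
        `  title = {${pageTitle}},`,
        `  booktitle = {TEI P5: Guidelines for Electronic Text Encoding and Interchange},`,
        `  editor = {{TEI Consortium}},`,
        `  edition = {${version}},`,
        `  year = {${year}},`,
        `  url = {${url}}`,
        `}`,
    ].join("\n");
}

// Example call; in a browser the url would come from window.location.href.
console.log(buildBibtex({
    pageTitle: "<person>",
    version: "4.7.0",
    year: "2023",
    url: "https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-person.html",
}));
```

A citation plugin could then take this string as its input and offer the other citation styles from it.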

sydb commented 1 year ago

NB #2336 and #2137.

joeytakeda commented 1 month ago

FWIW, I'm still in favour of having a "Cite this page" button on the Guidelines pages, but I also noticed that the GLs don't play nicely with Zotero at the moment (due mostly to the lack of metadata in the headers of the GL pages).

So I ended up writing a quick Zotero translator for the Guidelines pages: https://github.com/joeytakeda/zotero-translators/blob/teiguidelines/TEI%20Guidelines.js. With it, anyone using the Zotero Connector in Chrome, Firefox, etc. can add any Guidelines page to their Zotero library and, at the very least, get a reasonable set of metadata for that page (where "page" means a single HTML page, not a subsection of one).

I relied on the Licensing and Citation page, but still made a few guesses so suggestions welcome!

  1. At the moment, this translator treats a guidelines page as a "book chapter" (and the GLs as a book) since those seemed the best item types for accommodating the bibliographic metadata the TEI-C recommends using in a citation.
  2. Only software allows a "version number" (but I don't think of the Guidelines as "software"), so the version of the Guidelines becomes the "edition" of the "container," which seems right to me. (Edit: the Licensing page corroborates this, since it exemplifies the use of bibl/edition to tag a TEI version number.)
  3. The URL for the record is whatever URL one is saving (e.g. https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-person.html); however, if one is saving the current release, the translator also attaches the Vault URL for that page (e.g. https://www.tei-c.org/Vault/P5/4.7.0/doc/tei-p5-doc/en/html/ref-person.html). That seemed more honest (rather than forcing the Vault link as the URL)
  4. The title for the record is the main heading for the page (e.g. "12 Critical Apparatus"; "<person>"). The Licensing and Citation page does not give any examples for elements or specification pages, but I wasn't sure if the ref docs ought to use glosses when they can (e.g. "<lb> (line beginning)" or something)
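The release-to-Vault mapping in point 3 can be checked on its own, outside Zotero, with a small sketch like this. `toVaultUrl` is a hypothetical helper name used only for illustration; the URL pattern is the one from the example in point 3.

```javascript
// Hypothetical helper (not part of the translator): maps a current-release
// Guidelines URL to the matching versioned Vault URL.
function toVaultUrl(url, version) {
    // Only URLs under /release/ point at "whatever is current"; anything
    // else (including a URL that is already in the Vault) yields null.
    if (!url.includes("/release/")) {
        return null;
    }
    return url.replace("/release/", `/Vault/P5/${version}/`);
}

console.log(toVaultUrl(
    "https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-person.html",
    "4.7.0"
));
// -> https://www.tei-c.org/Vault/P5/4.7.0/doc/tei-p5-doc/en/html/ref-person.html
```

Returning null for non-release URLs keeps the saved URL untouched, which matches the intent of not forcing the Vault link on the record.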

If it's useful, here's what the JS looks like (it shows the properties and how they're defined):


async function doWeb(doc, url) {
    const type = detectWeb(doc, url);
    const GL_TITLE = "TEI P5: Guidelines for Electronic Text Encoding and Interchange";
    if (!(type == "bookSection" || type == "book")) {
        return null;
    }
    const item = parseMetadata(new Z.Item(type), doc, url);
    if (type == "book") {
        item.title = GL_TITLE;
    }
    if (type == "bookSection") {
        item.title = text(doc, '.main-content > h2') || text(doc, '.main-content h3.oddSpec');
        item.bookTitle = GL_TITLE;
    }
    await item.complete();
    return item;
}

function parseMetadata(item, doc, url) {
    const footer = doc.querySelector('.stdfooter');
    // Version info is linked but not readily identifiable,
    // so we need to find it
    const versionLink = [...footer.querySelectorAll('address a')].find(
        (a) => /^\d+\.\d+\.\d+$/.test(ZU.trim(a.innerText))
    );
    const version = ZU.trim(versionLink.innerText);
    // Convert date format: strip ordinal suffixes ("16th" -> "16") before parsing
    const date = ZU.strToISO(text(footer, "address > span.date").replace(/(\d)(st|nd|rd|th)\s/gi, "$1 "));
    // The declared root lang is unreliable; we need to use the meta
    const language = attr(doc, "meta[name='DC.Language']", "content").split(/\s+/).pop();

    item.publisher = "TEI Consortium";
    item.date = date;
    item.edition = version;
    item.language = language;
    item.accessDate = new Date().toLocaleDateString();
    item.creators.push({
        lastName: "TEI Consortium",
        creatorType: "editor",
        fieldMode: 1,
    });
    item.attachments.push({
        title: "Snapshot",
        document: doc
    });
    if (url.includes('/release/')) {
        const vaultURL = url.replace('/release/', `/Vault/P5/${version}/`);
        item.extra = vaultURL;
        item.attachments.push({
            title: `TEI Guidelines version ${version} (Vault)`,
            url: vaultURL,
            mimeType: "text/html",
            snapshot: false
        });
    }
    return item;
}
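
The two regular-expression steps in parseMetadata (spotting the version link and cleaning up the footer date) can be exercised without Zotero. The sketch below is a standalone approximation: `isVersionString` and `stripOrdinals` are hypothetical names, the sample date strings are assumed rather than copied from a real footer, and the ordinal pattern covers all four English suffixes (st, nd, rd, th).

```javascript
// Hypothetical helper: true when the text is exactly a three-part
// version number such as "4.7.0" (surrounding whitespace ignored).
function isVersionString(s) {
    return /^\d+\.\d+\.\d+$/.test(s.trim());
}

// Hypothetical helper: removes an English ordinal suffix that directly
// follows a digit, so "16th November 2023" becomes "16 November 2023".
function stripOrdinals(dateText) {
    return dateText.replace(/(\d)(st|nd|rd|th)\b/gi, "$1");
}

console.log(isVersionString(" 4.7.0 "));          // true
console.log(isVersionString("TEI Consortium"));   // false
console.log(stripOrdinals("16th November 2023")); // 16 November 2023
console.log(stripOrdinals("23rd March 2021"));    // 23 March 2021
```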

There are also test cases at the bottom of the translator showing what the data structure looks like for an item; one is appended here for reference:

{
  "itemType": "bookSection",
  "title": "13 Names, Dates, People, and Places",
  "creators": [
    {
      "lastName": "TEI Consortium",
      "creatorType": "editor",
      "fieldMode": 1
    }
  ],
  "date": "2023-11-16",
  "bookTitle": "TEI P5: Guidelines for Electronic Text Encoding and Interchange",
  "edition": "4.7.0",
  "extra": "https://tei-c.org/Vault/P5/4.7.0/doc/tei-p5-doc/en/html/ND.html",
  "language": "en",
  "libraryCatalog": "TEI Guidelines",
  "publisher": "TEI Consortium",
  "attachments": [
    {
      "title": "Snapshot",
      "mimeType": "text/html"
    },
    {
      "title": "TEI Guidelines version 4.7.0 (Vault)",
      "mimeType": "text/html",
      "snapshot": false
    }
  ],
  "tags": [],
  "notes": [],
  "seeAlso": []
}
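
To tie this back to the original request, a record like the one above carries enough to generate a "Cite this page" string. This is only a sketch: `formatCitation` is a hypothetical function, and the layout of the citation is an assumption for illustration, not the wording recommended on the Licensing and Citation page.

```javascript
// Hypothetical formatter: turns an item record shaped like the test case
// into a single citation string. The citation layout is an assumption.
function formatCitation(item) {
    const editors = item.creators
        .filter((c) => c.creatorType === "editor")
        .map((c) => c.lastName)
        .join(", ");
    return `${editors}, eds. "${item.title}." ${item.bookTitle}. ` +
        `Version ${item.edition}. ${item.date}. ${item.publisher}. ${item.extra}`;
}

// Trimmed copy of the record above, keeping only the fields the sketch reads.
const sampleItem = {
    title: "13 Names, Dates, People, and Places",
    creators: [{ lastName: "TEI Consortium", creatorType: "editor", fieldMode: 1 }],
    date: "2023-11-16",
    bookTitle: "TEI P5: Guidelines for Electronic Text Encoding and Interchange",
    edition: "4.7.0",
    extra: "https://tei-c.org/Vault/P5/4.7.0/doc/tei-p5-doc/en/html/ND.html",
    publisher: "TEI Consortium",
};

console.log(formatCitation(sampleItem));
```

Using the Vault URL from "extra" means the citation points at the exact version cited, not at whatever the current release happens to be.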