Use CETEIcean to extract and transform only a subelement of a TEI document

umj95 commented 3 years ago

I am currently learning about CETEIcean and I am really impressed by how easy it makes my life in general, even though this is my first real contact with JavaScript. So thank you so much for the great tool, and my apologies in advance if this is a stupid question, but here goes: I was wondering whether I could use CETEIcean to get a subpart, say a paragraph, from a TEI document.

My use case is that I have a text in two languages in two separate TEI documents. One of them is being rendered (easily!) by CETEIcean, but I would like to be able to load excerpts from the translation and display them side by side. My instinct was to create a new CETEI object, apply all my custom behaviors with .addBehaviors, and then put the result of .getHTML5(mypath) into a variable that I could descend into to get the element by ID. I haven't succeeded yet, mainly because I am having a hard time getting to grips with JS promises, but I didn't mean this to be a tech support question.

Rather, I wanted to ask more generally what CETEIcean's approach is to fetching subelements of an XML document. Is there a smart way to go about such a thing, or does it run philosophically counter to transforming a full, valid XML document? Again, sorry if this is a stupid or inappropriate question; I just really didn't know where else to ask.

hcayless commented 3 years ago

It's a fine question. I don't think you'd need to create a new CETEI object to do this. You should be able to call getHTML5 repeatedly on different files. To get to your question, CETEIcean itself doesn't have the ability to partially retrieve external files. It's got to get the whole thing, and the Promises come in because fetching remote files is one of those things you'd typically prefer not to wait on while other stuff is happening, so you tell it what to do when the file is finished loading and the browser can get on with other tasks in the meantime. If you want to just work on parts of files, some options come to mind:

1. Do something on the remote end that lets you ask for, say, just the paragraphs you want. An XML database would let you do this, and there are other, maybe rather technical, ways to do it.
2. Fetch your files, but only give CETEIcean parts of them to process. You can sidestep getHTML5 by calling makeHTML5, which takes a DOM object as input and gives you a DOM object as a result, which you can then append to your page. So you could fetch the files you want (https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) and parse them into an XML DOM (https://developer.mozilla.org/en-US/docs/Web/API/DOMParser). You can also use https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest, which is old school. Anyway, you get an XML document, get the chunk(s) you want by querying it, put them through makeHTML5 and append the results to your page where you want them.
3. Do what you're trying to do, which is fetch and convert the whole file but only append bits of it to your page. This should work. What you probably want to do is something like:

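const c = new CETEI(); // or reuse the CETEI instance you already have
c.getHTML5('https://example.com/myXML.xml', function(data) {
  const foo = document.getElementById("IdOfElementToAppendParasTo");
  // Array.from takes a static copy: getElementsByTagName returns a live collection,
  // which would shrink (and skip paragraphs) as each one is moved out of data.
  for (const p of Array.from(data.getElementsByTagName("tei-p"))) {
    document.adoptNode(p); // harmless; appendChild would also adopt the node itself
    foo.appendChild(p);
  }
});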

The second parameter to getHTML5 is a callback that gets executed after all the fetching and processing, so you don't have issues with trying to work with stuff that hasn't loaded yet.
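Since Promises came up in the question, here is a minimal, library-independent sketch of the callback style and the async/await style side by side. `loadFile` is a hypothetical stand-in for a network request (it just waits 10 ms and returns a string); it is not part of CETEIcean.

```javascript
// Hypothetical stand-in for fetching a file: a Promise that resolves with
// the "file contents" after a short delay, like a network request would.
function loadFile(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`<contents of ${url}>`), 10);
  });
}

// Callback style (the shape of getHTML5(url, callback)): the code that needs
// the data goes inside the function you pass in, which runs once loading ends.
function loadWithCallback(url, callback) {
  loadFile(url).then(callback);
}

// async/await style: this function pauses at `await` without blocking the
// rest of the page, then resumes with the loaded data.
async function loadWithAwait(url) {
  const data = await loadFile(url);
  return `processed ${data}`;
}

const log = [];
loadWithCallback("a.xml", (data) => log.push(`callback got ${data}`));
// This line runs first: the file is still "loading" when we get here.
log.push("runs before the file has loaded");
```

Either way, anything that touches the converted document has to happen inside the callback (or after the `await`); code written after the call itself runs before the data exists.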

Hope this helps!

umj95 commented 3 years ago

Thank you so much for your reply! Great that I can just call makeHTML5 on a subpart of the text. I initially wanted to write back once I had it running, but since that will still take some time, I won't wait any longer. I am currently trying out your second proposed solution and I'm sure I'll get there eventually. At the moment I am still struggling to get my element out of the document by its xml:id, but that has nothing to do with CETEIcean, so I'll close the issue. Thank you again for the quick help!
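On the xml:id problem: in the raw XML (before CETEIcean has converted anything, as in the second approach above), `xml:id` is an attribute in the XML namespace, so a namespace-aware lookup is one dependable way to find it. This is a minimal sketch assuming the file uses the TEI namespace; `findByXmlId`, the URL and the id in the usage comment are hypothetical.

```javascript
// Namespace URIs: TEI elements live in the TEI namespace, and the xml:
// prefix is always bound to the XML namespace.
const TEI_NS = "http://www.tei-c.org/ns/1.0";
const XML_NS = "http://www.w3.org/XML/1998/namespace";

// Return the first TEI element named `localName` whose xml:id equals `xmlId`,
// or null if there is none. `xmlDoc` is an XML document, e.g. from DOMParser.
function findByXmlId(xmlDoc, localName, xmlId) {
  for (const el of Array.from(xmlDoc.getElementsByTagNameNS(TEI_NS, localName))) {
    if (el.getAttributeNS(XML_NS, "id") === xmlId) {
      return el;
    }
  }
  return null;
}

// Browser usage (hypothetical URL and id):
// fetch("translation.xml")
//   .then((response) => response.text())
//   .then((text) => {
//     const xmlDoc = new DOMParser().parseFromString(text, "application/xml");
//     const para = findByXmlId(xmlDoc, "p", "p12");
//     // `para` is still plain TEI XML at this point.
//   });
```

The element it returns has not been converted yet, so it would still go through CETEIcean (the reply above mentions makeHTML5 for this) before being appended to the page.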

umj95 commented 3 years ago

I should add: it worked right after I wrote my last comment, when I tried your third solution again. Here's the code I ended up using:

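// pathToData, currentChapter and secondaryLanguage are defined elsewhere in my page.
const c = new CETEI();

function fetchParagraph(paraID, noteBodyID) {
  const path = pathToData + currentChapter + secondaryLanguage + ".xml";
  c.getHTML5(path, function(data) {
    const noteBody = document.getElementById(noteBodyID);
    // The converted elements carry the xml:id as an id attribute, so compare against "id".
    for (const p of Array.from(data.getElementsByTagName("tei-p"))) {
      if (p.getAttribute("id") === paraID) {
        document.adoptNode(p);
        noteBody.appendChild(p);
        break; // ids are unique, so stop at the first match
      }
    }
  });
}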
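Because CETEIcean has to fetch the whole file, each call to `fetchParagraph` above downloads and converts the entire translation again. If several paragraphs come from the same file, one option is to keep the converted result around. A minimal sketch, assuming a small hypothetical helper `makeCachedLoader` (not part of CETEIcean) that remembers one Promise per URL:

```javascript
// Hypothetical helper: wraps any loader that returns a Promise and remembers
// one Promise per URL, so each file is fetched and converted only once.
function makeCachedLoader(load) {
  const cache = new Map();
  return function cachedLoad(url) {
    if (!cache.has(url)) {
      const promise = Promise.resolve(load(url));
      cache.set(url, promise);
      // Forget failed loads so that a later call can try again.
      promise.catch(() => {
        if (cache.get(url) === promise) cache.delete(url);
      });
    }
    return cache.get(url);
  };
}

// Browser usage (assumes `c`, pathToData, currentChapter and
// secondaryLanguage from the code above; getHTML5's callback is wrapped
// in a Promise):
// const loadTranslation = makeCachedLoader(
//   (path) => new Promise((resolve) => c.getHTML5(path, resolve))
// );
//
// function fetchParagraph(paraID, noteBodyID) {
//   const path = pathToData + currentChapter + secondaryLanguage + ".xml";
//   loadTranslation(path).then((data) => {
//     const p = Array.from(data.getElementsByTagName("tei-p"))
//       .find((el) => el.getAttribute("id") === paraID);
//     // Append a copy, so the cached tree still has this paragraph next time.
//     if (p) document.getElementById(noteBodyID).appendChild(p.cloneNode(true));
//   });
// }
```

Appending a clone matters once the tree is shared: moving the node would remove it from the cached copy. `cloneNode(true)` copies the markup but not any event listeners a behavior may have attached. Also, if getHTML5 does not call its callback when a fetch fails, the wrapped Promise would simply stay pending.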

Thanks again!

TEIC / CETEIcean, issue #46