Open TinaRussell opened 3 years ago
@TinaRussell Hi, I know you've been in touch with James Tauber on related issues but I didn't want to leave this unanswered. I don't know of any converters or other tools for this—we don't host any at Perseus.
The original abbreviations should still be in the data but we don't have a mapping tool for these. The abbreviations in LSJ are fraught with irregularities, though, so this can be a challenge. An early project of mine was cleaning up these links and correcting invalid references; oftentimes the data itself was either incorrectly entered or inconsistently presented.
I am not aware of a single master list of all of these URNs, particularly the base URNs (such as urn:cts:greekLit:tlg0033.tlg001), but the underlying data is cataloged, such as here:
There may be tools or scripts others have created to better address this and James would be the best place to start with that.
FYI, Giuseppe Celano has a Unicode version of the data: https://github.com/gcelano/LSJ_GreekUnicode
Yeah, for my project I tried to make something that would expand the abbreviations, and I was able to come up with a one-to-one mapping for the author abbreviations, but for abbreviations of works, some are unique, some vary in meaning depending on the author given, and some I think you’re just supposed to figure out from context. It’s a headache. But, since every reference/citation in the LSJ has a URN attached, I realized I ought to take advantage of that, as it means somebody before me had to figure out what each citation means (man, what a Herculean task).
Thank you for pointing out the Perseus Catalog! I suppose the makeshift solution would be to plug each URN into the catalog’s URL scheme, scrape information from the resulting page, and cache the information somewhere. But, there’s gotta be something more elegant/aboveboard than that.
I’ve asked James about how to use the URNs, but haven’t heard back from him on it, yet.
My project is here, by the way: https://github.com/TinaRussell/hermeneus
I may have found the answer: http://sites.tufts.edu/perseuscatalog/?page_id=93 “…to specifically request the ATOM feed of the data, you append /atom to the URIs.” So by using the canonical URL plus /atom, I should be able to get something more machine-readable.
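For anyone following along, the "/atom" rule quoted above could be sketched like this in Python. The helper name `atom_url` and the example host are placeholders I made up, not the catalog's real URL scheme, so substitute whatever canonical URI the catalog actually gives you.

```python
def atom_url(catalog_uri: str) -> str:
    """Return the ATOM-feed form of a catalog URI by appending /atom.

    Only the "append /atom" rule comes from the catalog documentation;
    the helper name and any URI passed in are the caller's assumptions.
    """
    # Drop a trailing slash first so we never produce ".../​/atom".
    return catalog_uri.rstrip("/") + "/atom"


if __name__ == "__main__":
    # Placeholder URI, NOT a real Perseus Catalog address.
    print(atom_url("https://catalog.example.org/some/record/"))
```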
You could probably also use the ScaifeDL CTS API's GetCapabilities request:
https://scaife-cts.perseus.org/api/cts?request=GetCapabilities
That gives you the author/work/edition/translation metadata for every URN.
Thank you for that! BTW, I’ve tried making other requests using that URL format, following the specification here https://github.com/cite-architecture/cts_spec/blob/master/md/specification.md and it doesn’t seem to work. For example, http://scaife-cts.perseus.org/api/cts?request=GetLabel&urn=urn:cts:greekLit:tlg0020.tlg001.perseus-grc1:195 gets me an “UnknownCollection” error. Is there something I’m doing wrong, or is the functionality simply unfinished? Thanks!
I would guess that it's probably just unfinished, but @jtauber would be a better person to answer that.
@TinaRussell
I think you want to use something like http://scaife-cts.perseus.org/api/cts?request=GetLabel&urn=urn:cts:greekLit:tlg0020.tlg001.perseus-grc2 without the passage for that particular call.
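A rough Python sketch of that suggestion, in case it helps: drop the passage component from the URN, then build the GetLabel URL. The endpoint and the `request`/`urn` parameters are the ones used in this thread; the function names are mine, and I am assuming the passage is everything after the fourth colon-separated field.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this thread.
CTS_ENDPOINT = "http://scaife-cts.perseus.org/api/cts"


def strip_passage(urn: str) -> str:
    """Return a CTS URN without its passage component.

    Assumes the layout urn:cts:<namespace>:<work>[:<passage>], where the
    passage is everything after the fourth field.
    """
    parts = urn.split(":", 4)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    return ":".join(parts[:4])


def get_label_url(urn: str) -> str:
    """Build a GetLabel request URL for the passage-free form of `urn`."""
    # safe=":" keeps the colons in the URN readable instead of %3A.
    query = urlencode({"request": "GetLabel", "urn": strip_passage(urn)}, safe=":")
    return f"{CTS_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(get_label_url("urn:cts:greekLit:tlg0020.tlg001.perseus-grc1:195"))
```

This only builds the URL. Whether the server has a label for a given edition is a separate question.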
A few points to add.
So, I managed to pull together a list of all the unique URNs cited in the LSJ. If you’re curious, it’s here: https://pastebin.com/aBDUBU07 They’re shortened to the work part of the work component (e.g. “urn:cts:greekLit:tlg0020.tlg001”), given what you said @lcerrato and because I figured Liddell and Scott weren’t terribly concerned with differing digital editions. Then I tried using the API to get the title for each one, and I found that about half of the URNs in that form work, and about half return an error. E.g. the first one, to the Odyssey, works: http://scaife-cts.perseus.org/api/cts?request=GetLabel&urn=urn:cts:greekLit:tlg0012.tlg002 but, the second one returns an error: http://scaife-cts.perseus.org/api/cts?request=GetLabel&urn=urn:cts:greekLit:tlg4083.tlg001 Again, is this unfinished functionality? Are URNs shortened like that supposed to work? Or, is this a better question for @jtauber?
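The shortening step described above might look roughly like this in Python. It is only a sketch with function names of my own choosing, not the script that produced the pastebin list: keep the text group and work identifiers, drop the version and passage, and deduplicate while preserving first-seen order.

```python
from typing import Iterable, List


def work_level(urn: str) -> str:
    """Shorten a CTS URN to urn:cts:<namespace>:<textgroup>.<work>."""
    parts = urn.split(":", 4)
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    # The work component is dot-separated: textgroup.work[.version...].
    work = ".".join(parts[3].split(".")[:2])
    return ":".join(parts[:3] + [work])


def unique_work_urns(urns: Iterable[str]) -> List[str]:
    """Return work-level URNs, deduplicated, in first-seen order."""
    # dict.fromkeys keeps insertion order and discards repeats.
    return list(dict.fromkeys(work_level(u) for u in urns))


if __name__ == "__main__":
    citations = [
        "urn:cts:greekLit:tlg0012.tlg002.perseus-grc1:1:1",
        "urn:cts:greekLit:tlg0033.tlg001.perseus-grc1:6:35",
        "urn:cts:greekLit:tlg0012.tlg002.perseus-grc1:9:20",
    ]
    for urn in unique_work_urns(citations):
        print(urn)
```

The sample citations are made up for illustration, apart from the tlg0033 one quoted elsewhere in this thread.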
@TinaRussell tlg4083 is not in the Scaife Viewer, so I wouldn't expect it to work. It's also not identified in the catalog, although I see an issue that indirectly refers to this. I see it is the Eustathius Commentary on the Iliad. I also see this on an old survey of IDs for which no results were returned — which would make sense.
hi @TinaRussell , Peter Heslin has incorporated the URNs in his Diogenes application, whose code you can download at https://github.com/pjheslin/diogenes . To accommodate this use in Diogenes, I've done fairly extensive work on the references in LSJ and Lewis & Short (hunting down and repairing where Il. 2.349, 458 becomes Homer-Iliad-2-349, Homer-Iliad-458, or the like). Maybe his code will be helpful? He allows people to type in authors and select works by title, and nobody is confronted with URNs directly, but perhaps you can make use of his code to go in the other direction.
@helmadik Thanks! I ended up writing a script to take the base URN of every work cited in the LSJ, try to see if it gets a result via the CTS API, and if so, record the URN and the work’s title in text form, as key-value pairs in a hash table, as seen here: https://github.com/TinaRussell/hermeneus/blob/fca545966fc358c7d3e574bc7c7443e8fc28fa05/hrm-abbr.el#L389 The program uses the resulting hash table (instead of calling the API directly) to figure out which work has what title. It only works for about half the works cited, though (for the others, the abbreviated title shown in the LSJ stays as it is), so it’s quite possible that Peter has figured out a better way.
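The same idea as a minimal Python sketch (the real script is the Emacs Lisp linked above, and this is not a translation of it). The lookup function is passed in as `fetch_label`, a hypothetical callable that returns a title or `None`, because I don't want to assume the exact shape of the API response here. The stub below stands in for the network call.

```python
from typing import Callable, Dict, Iterable, Optional


def build_title_table(
    urns: Iterable[str],
    fetch_label: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each URN to a title, skipping URNs the lookup cannot resolve."""
    table: Dict[str, str] = {}
    for urn in urns:
        if urn in table:
            continue  # already resolved, no need to ask again
        title = fetch_label(urn)
        if title:
            table[urn] = title
    return table


def display_title(urn: str, abbreviation: str, table: Dict[str, str]) -> str:
    """Return the full title if known, else the LSJ abbreviation as-is."""
    return table.get(urn, abbreviation)


if __name__ == "__main__":
    # Stub lookup standing in for a real API call; the title is illustrative.
    known = {"urn:cts:greekLit:tlg0012.tlg002": "Odyssey"}
    table = build_title_table(
        ["urn:cts:greekLit:tlg0012.tlg002", "urn:cts:greekLit:tlg4083.tlg001"],
        known.get,
    )
    print(display_title("urn:cts:greekLit:tlg0012.tlg002", "Od.", table))
    print(display_title("urn:cts:greekLit:tlg4083.tlg001", "Eust.", table))
```

Building the table once and shipping it with the program means the lexicon display never has to wait on the network, at the cost of the table going stale if the upstream data changes.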
I’m curious if there is a standardized way to resolve the URNs found in the lexicon, e.g. “urn:cts:greekLit:tlg0033.tlg001.perseus-grc1:6:35”, to something human-readable (showing author, work, etc., in a less abbreviated form than it appears in the LSJ), short of writing something to parse the data over at https://github.com/PerseusDL/canonical-greekLit myself.