erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

encoding URLs #255

Closed arlogriffiths closed 5 months ago

arlogriffiths commented 9 months ago

Dear Amandine, Dan and Michaël,

I see at https://dharman.in/display/DHARMA_INSBengalCharters00050 that the following

<note>http://museumsofindia.gov.in/repository/record/im_kol-A20050-9085-18. Accessed in May 2018.</note>

is not nicely displayed:

[Screenshot, 2024-01-09 09:15:49]

Indeed, the URL has not been furnished with any XML markup.

@danbalogh: I can't find any instruction in EGD for how to encode external URLs. How should it be done? Can we add such an instruction? @michaelnmmeyer: can you furnish a list of all URLs that need to be properly encoded?

danbalogh commented 9 months ago

It would certainly be useful for web URLs to be clickable, and they definitely should appear without the auto-replacement of slashes with bars. Michaël may be able to auto-convert URLs in the XML contents to clickable links in the HTML display, but ideally we should have a way to display a shorter and more user-friendly title instead of a full URL. I assume that something along the lines of <ref target="http://museumsofindia.gov.in/repository/record/im_kol-A20050-9085-18">Baigram fragment</ref> would work best, but I would like Michaël to suggest the best way to implement this. I will then add it to the draft guide.

michaelnmmeyer commented 9 months ago

The weird formatting is due to the use of // and / as shorthands for daṇḍas, and _ for space. These characters are ambiguous; I am not sure whether I should leave them alone or try to identify shorthands more precisely.
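
To illustrate why the URL comes out garbled, here is a minimal Python sketch of a naive shorthand expansion. The mapping table and the function name `expand_shorthands` are assumptions made for illustration; the actual display code and the exact replacement characters may differ.

```python
# Hypothetical shorthand table: the thread only says that "//" and "/"
# stand for daṇḍas and "_" for a space. The target characters are assumed.
SHORTHANDS = [
    ("//", "\u0965"),  # double daṇḍa (must be replaced before the single one)
    ("/", "\u0964"),   # single daṇḍa
    ("_", " "),        # underscore as a shorthand for a space
]


def expand_shorthands(text: str) -> str:
    """Apply every shorthand replacement blindly, regardless of context."""
    for short, full in SHORTHANDS:
        text = text.replace(short, full)
    return text


url = "http://museumsofindia.gov.in/repository/record/im_kol-A20050-9085-18"
mangled = expand_shorthands(url)
print(mangled)
```

Because the replacement does not look at context, every slash and underscore in the URL is rewritten, which is the kind of damage visible in the screenshot.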

Dan's solution for URLs is the best one. I will write something to make URLs clickable, when they have the form https?://.+ (going beyond that is probably too error-prone).

danbalogh commented 9 months ago

I've added this to the draft of the next EGD release.

The shorthands should only be usable in "edition text", i.e. in the edition div (except for head and label children of that div), and in apparatus lem and rdg. There's unfortunately a potential ambiguity in text tagged as <foreign> in other parts of the file; I for one have certainly used slashes in <foreign> with the intent of having slashes (not daṇḍas) there. So I suggest the following:

michaelnmmeyer commented 9 months ago

@danbalogh OK, noted

arlogriffiths commented 5 months ago

I have now modified DHARMA_INSBengalCharters00050.xml like this:

I found this fragment by chance during perusal of the Museums of India website, which indicates that it is preserved at the Indian Museum, Kolkata, under accession number A20050/9085.<note><ref target="http://museumsofindia.gov.in/repository/record/im_kol-A20050-9085-18">http://museumsofindia.gov.in/repository/record/im_kol-A20050-9085-18</ref>. Accessed in May 2018.</note>

Even if, as I have done, I keep the URL itself in the <ref> wrapper, the display at https://dharmalekha.info/texts/INSBengalCharters00050 is now impeccable.

So can we close this issue?

michaelnmmeyer commented 5 months ago

Yes, done.