wujastyk closed this issue.
There shouldn't be a problem with inserting Unicode characters from the Newa block. What about copy-and-pasting the character from here: https://en.wikipedia.org/wiki/Newa_(Unicode_block) ?
That was an interesting blog post (as well as the one by Birgit Kellner). I think maybe you should use a different mapping for the filler character that you describe. I looked recently at a fragment of the Aṣṭasāhasrikā which seems to contain older versions of both the filler character you describe, and the (possibly more recent) filler character described in the Newa Unicode block:
https://tst-project.github.io/mss/Sanscrit_1438.xml
See folio 1v. I've provisionally transcribed the "older" filler character with ? and the "newer" filler character with -.
Ok, so I came up with this solution:
tl;dr: you can use <g ref="#newa-gap-filler"/> and <g ref="#newa-old-gap-filler"/> to get the two different kinds of gap filler characters.
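A minimal sketch of how a <g ref="#..."/> could be resolved to a character at display time, assuming a plain lookup table in place of definitions.xsl. The table, the function name resolve_glyphs, and the placeholder used for the old-style filler (which has no Unicode code point) are all hypothetical; only U+1144E NEWA GAP FILLER is a real Unicode assignment. TEI namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the entries in definitions.xsl.
# U+1144E is NEWA GAP FILLER; the old-style filler has no code point,
# so a made-up placeholder string is used for it here.
GLYPHS = {
    "newa-gap-filler": "\U0001144E",
    "newa-old-gap-filler": "[old-gap-filler]",
}


def resolve_glyphs(xml_text: str) -> str:
    """Flatten a namespace-free fragment to text, replacing each <g ref="#name"/>."""
    root = ET.fromstring(xml_text)
    parts: list[str] = []

    def walk(el: ET.Element) -> None:
        if el.tag == "g":
            name = el.get("ref", "").lstrip("#")
            # Unknown names fall back to whatever text the <g> itself contains.
            parts.append(GLYPHS.get(name, el.text or ""))
        else:
            parts.append(el.text or "")
            for child in el:
                walk(child)
        parts.append(el.tail or "")  # text that follows the element

    walk(root)
    return "".join(parts)


line = resolve_glyphs('<l>sapta<g ref="#newa-gap-filler"/>parṇṇāni</l>')
# line is "sapta" + U+1144E + "parṇṇāni"
```

Keeping the name in @ref rather than the character in the text means the displayed glyph can be changed later in one place.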
More details: see definitions.xsl, which contains a list of names of entities and other things (similar to what we're doing at TST). There are two gap-filler entries, one called newa-gap-filler and the other called newa-old-gap-filler.
A thing of beauty.
Can we have newa-placeholder-mark too?
And "siddham" or "siddhi": Bhattarai pp. 102-104
And puṣpikā (Bhaṭṭarai passim), and space fillers (Bhattarai, pp. 222, 225, 228, et passim)
Hmm, so now I'm thinking maybe we should refine our approach so that we can talk about similar signs across scripts. For example, maybe we should do something like <g type="line-filler" rendition="#newa-gap-filler"/> in case we, hypothetically, want to collate these signs across different manuscript cultures in the future. This is especially because the palaeographic situation we're working with, with all these varieties of Nepālākṣarā, (proto-)Bengali, etc., doesn't map neatly onto the Unicode blocks of "Newa" and "Bengali". For example, we have a variety of "siddham" signs in manuscripts, but in Unicode we have U+1144A NEWA SIDDHI and U+0980 BENGALI ANJI. But in our manuscripts, there really isn't this division between Newa and Bengali.
Maybe we can do something like:
<g type="siddham" rendition="#newa-siddhi"/>
<g type="siddham" rendition="#beng-anji"/>
Or we could keep @ref, like so:
<g type="siddham" ref="#newa-siddhi"/>
<g type="siddham" ref="#beng-anji"/>
Although, again, in these 11-15th c. North Indic manuscripts we're working with, it doesn't really make sense to differentiate between a "Newa Siddhi" and a "Bengali Anji". Anyway, what do you think?
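One way to see what @type buys us: a small sketch, assuming flat fragments (text plus <g> children only) and a made-up function normalise, that flattens a line for comparison either by function (@type) or by script-specific sign (@ref). The two siddham signs then compare equal by type but not by ref.

```python
import xml.etree.ElementTree as ET


def normalise(xml_text: str, by: str = "type") -> str:
    """Flatten a flat fragment (text and <g> children only) for comparison.

    Each <g> becomes "{value of @by}": by="type" merges signs that share a
    function, by="ref" keeps the script-specific signs apart.
    Other child elements lose their content; only their tail text is kept.
    """
    root = ET.fromstring(xml_text)
    out = [root.text or ""]
    for el in root:
        if el.tag == "g":
            out.append("{" + el.get(by, "?") + "}")
        out.append(el.tail or "")
    return "".join(out)


newa = '<l><g type="siddham" ref="#newa-siddhi"/> namaḥ</l>'
beng = '<l><g type="siddham" ref="#beng-anji"/> namaḥ</l>'
same_function = normalise(newa) == normalise(beng)                # True
same_sign = normalise(newa, by="ref") == normalise(beng, by="ref")  # False
```

A collation could then choose per run whether Newa Siddhi and Bengali Anji count as variants or as the same reading.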
Yes, I think you're right. That will generalize the approach. I also wonder about rendering in different scripts, the old "Unicode isn't glyphs" issue. While I love the Newa fillers in the middle of IAST text, it probably isn't quite the right approach. Each writing system should have its own glyph for the same underlying Unicode code point. And you already define many of these for Roman transliteration in your Saktumiva Transcription conventions https://saktumiva.org/wiki/transcription; I'd like to go on using those and maybe expand the list just slightly. I like your @type @rendition idea. When there's an IAST convention, we could say
a
Or am I muddling things? I think I'm muddling things. Why say "newa" in a line of IAST?
I think it's actually the situation itself which is quite muddy. As you wrote in your blog post, it's important to think about the function of a character, and it seems like, for the moment, what we're calling newa-gap-filler and newa-old-gap-filler have the same function, but it's hard to be sure unless we do more serious palaeography. So it makes sense to preserve both the function of the character (as we understand it currently) as well as its appearance (because they are so different).
Incidentally, I think the "broken daṇḍa" line filler ( ¦ ) actually has a different function from the "newa-gap-filler". The "broken daṇḍa" (and other similar-looking signs) is generally used at the end of a line or before a stringhole space. But the "newa-gap-filler" is never used this way, although as Birgit mentioned, there's no consensus about what it means. So I think we should have two different conventions: one for an "(end-of-)line filler", and another for a "gap filler", equivalent to, for example, Devanāgarī U+A8F9 ( ꣹ ).
Here are some options on how we might represent this:
- ~ for the "normal" gap-filler as defined in the Newa Unicode block, and <g ref="#newa-old-gap-filler">~</g> for the "old-style" one (or whatever transliteration we decide upon)
- <g type="gap-filler" rendition="#newa-gap-filler"/> (which can appear as a Newa sign in IAST and Newa transliteration, but change to ꣹ in Devanāgarī)
- <space type="gap-filler" rendition="#newa-gap-filler"/>, or <space type="gap-filler" rendition="#newa-gap-filler" quantity="5"/> (which would give you 5 filler characters)
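A rough sketch of the last option, assuming a hypothetical lookup keyed on (@rendition, output script): the same <space type="gap-filler"> is drawn as U+1144E NEWA GAP FILLER in IAST or Newa output and as U+A8F9 DEVANAGARI GAP FILLER in Devanāgarī, repeated @quantity times. The table, the script codes, and render_space are illustrative only; @unit, which TEI also allows on <space>, is ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: one functional sign, drawn per output script.
RENDITIONS = {
    ("#newa-gap-filler", "iast"): "\U0001144E",  # NEWA GAP FILLER
    ("#newa-gap-filler", "newa"): "\U0001144E",
    ("#newa-gap-filler", "deva"): "\uA8F9",      # DEVANAGARI GAP FILLER
}


def render_space(xml_text: str, script: str) -> str:
    """Render a single <space type="gap-filler" .../> element for `script`."""
    el = ET.fromstring(xml_text)
    glyph = RENDITIONS[(el.get("rendition", ""), script)]
    return glyph * int(el.get("quantity", "1"))  # default: one filler


five = render_space(
    '<space type="gap-filler" rendition="#newa-gap-filler" quantity="5"/>', "deva"
)  # five U+A8F9 characters
```

The markup records only the function and the number of signs; which glyph appears is decided at rendering time.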
Just as a quick follow-up regarding the "broken daṇḍa" line filler — I searched through all of my Dravyasamuddeśa transcriptions, and the "broken daṇḍa" has been used exclusively at the end of a line. It's quite nice to be able to do these searches!
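The kind of search described here can be sketched as follows, assuming lines are delimited by <lb/> and that ¦ is typed directly as a character. The function names are made up; misplaced_fillers lists any line where the filler turns up somewhere other than the very end. Passing breaks=("lb", "cb") would also treat string holes coded as <cb/> as boundaries.

```python
import xml.etree.ElementTree as ET


def split_lines(el: ET.Element, breaks: tuple[str, ...], lines: list[str]) -> None:
    """Append the text of `el` to `lines`, starting a new line at each break element."""
    lines[-1] += el.text or ""
    for child in el:
        if child.tag in breaks:
            lines.append("")
        else:
            split_lines(child, breaks, lines)
        lines[-1] += child.tail or ""


def misplaced_fillers(xml_text: str, filler: str = "¦",
                      breaks: tuple[str, ...] = ("lb",)) -> list[str]:
    """Return the lines in which `filler` occurs anywhere except at the very end."""
    lines = [""]
    split_lines(ET.fromstring(xml_text), breaks, lines)
    return [ln.strip() for ln in lines if filler in ln.rstrip()[:-1]]


suspicious = misplaced_fillers('<p><lb/>tasya¦<lb/>ar¦tha<lb/>iti</p>')
# suspicious == ['ar¦tha']
```

An empty result for a whole transcription would confirm that ¦ is only ever used at the end of a line.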
Your suggestion, ~ for the "normal" gap-filler as defined in the Newa Unicode block, and ~ for the "old-style" one (or whatever transliteration we decide upon), seems as good as any to me. Could you fix it so that the ~ doesn't display in Saktumiva?
Do we want the newa-old-gap-filler to appear in the collation apparatus?
Just waiting for @ppasedach to chime in! He's working on similar issues: https://github.com/ppasedach/ratnakara-tei/issues/19
Sorry for the late response to this.
One very important manuscript for the Haravijaya is Jaisalmer 408, Jaina Devanagari, 12th century, serving also as an example in Bidur Bhattarai's book (HVM).
For the longer spaces at the ends and beginnings of lines, this manuscript distinguishes between the left and right sides (see https://github.com/ppasedach/ratnakara-tei/issues/19; they are the first and second ones there). So far we transcribe both with ꣹. They would probably be good candidates for <space>, as they actually fill up a bigger space.
There are two fillers for very short gaps, such as before a string hole or at the end of a line written up to its end. I do not understand the difference between the two. Einecke only has the daṇḍa with the stroke on the bottom right side; the other one is like a narrow U. I've seen corrections from one to the other. We transcribe the daṇḍa with the stroke as ¦; for the other one we still use a temporary placeholder. Both are probably good candidates for <g>, as they don't take much space.
There are further signs, used for marking areas deemed unsuitable for writing, etc.
The avagraha-like sign marks an area not suitable for writing; the text is complete here. Probably one wants <space> here. I am not sure whether one wants anything to represent the space-fillers graphically here, or just to specify the dimensions of the space left free.
Another manuscript, 19th century Devanāgarī, from RORI, Jodhpur, has many word-boundary markers.
They could be graphically represented in Devanāgarī as a daṇḍa and daṇḍa-avagraha combination (where a word starts with the inherent a attached to the last consonant cluster of the preceding word) under the line, or maybe even simply as subscript. But I'm not sure how best to express it in TEI: with <g>, <metamark>, or some other way.
So what was the siddhānta on this topic of space-fillers above: <g> or <space>? Or are both options going to work in Saktumiva? If it's a choice, I lean towards <g>.
So <g> with @ref is already implemented at the moment... I guess we can stick with that until something comes up!
For NEWA SIDDHI and BENGALI ANJI, maybe we can just use the corresponding Unicode characters for now, since they don't seem to have equivalents in other scripts.
Sounds good to me. Thanks!
NB @chakrabortydeepro
Sorry, I somehow cannot figure out the final solutions to some questions discussed here:
1. What do we do about the broken daṇḍa-s? Just type ¦? Or do we do something like
2. What about the word-boundary markers that @ppasedach mentioned? Something quite similar occurs in our NAK 5-333 all the time.
3. Also, there is a similar sign (it looks more or less like a comma to me) that is super common in NAK 5-333. I have no statistics at hand, but I guess it's used most of the time to mark off pāda-s.
In terms of how the signs look, 2 (word-boundary) and 3 (pāda etc. markers) look more or less identical to me.
Sorry... I forgot the syntax in my question about broken daṇḍas. The question was whether we type ¦ or something like <g type="gap-filler" rendition="#newa-gap-filler"/> after all.
Dear @ankleb,
For the filler that we've been discussing, please use <g ref="#newa-gap-filler"/> and <g ref="#newa-old-gap-filler"/>, as described by Charles in the July 21 comment above.
Question 1 in the comment above:
The use of the broken daṇḍa ¦ is documented at Saktumiva. It's an end-of-line filler, a bit like the use of a hyphen in Latin-script text. Bhaṭṭarai (p. 9) calls it "line-filler sign or hyphen sign used before the string-holes or at the end of the line on the folio".
Question 2 in the comment above:
If I understand your question, use <gap/>. You can add information thus:
<gap extent="1" unit="char" reason="insertion in the line above"/>
Thus, for the first folio you show,
sapta<gap extent="5" unit="char"/>parṇṇāni
pad<gap extent="5" unit="char"/>dma
But in these specific cases, we wouldn't actually use <gap/>. In the Suśruta project, we are not recording the size of the gaps around string holes because we can't think of a reason they would be interesting. We're just coding the string holes as "column breaks" using <cb/>, because that is quite useful for finding your way around a page (along with line begin <lb/> and page begin <pb/>). We say <cb n="1"/> to mean the first string hole, etc.
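As a sketch of why these milestones help with finding your way around, here is a hypothetical locate function that reports (page, line, string hole) for each piece of text containing a search string. It assumes the milestones carry @n, and it uses an empty string-hole value to mean "before the first string hole on this line"; both conventions are assumptions of this example, not project rules.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def locate(xml_text: str, needle: str) -> list[tuple[str, str, str]]:
    """Return (page, line, string hole) for each text chunk containing `needle`.

    Positions come from the @n of the most recent <pb/>, <lb/> and <cb/>.
    A new line resets the string hole to "" (before the first hole),
    and a new page resets both line and string hole.
    """
    pos = {"pb": "", "lb": "", "cb": ""}
    hits: list[tuple[str, str, str]] = []

    def check(chunk: Optional[str]) -> None:
        if chunk and needle in chunk:
            hits.append((pos["pb"], pos["lb"], pos["cb"]))

    def walk(el: ET.Element) -> None:
        if el.tag in pos:
            pos[el.tag] = el.get("n", "")
            if el.tag == "pb":
                pos["lb"] = pos["cb"] = ""
            elif el.tag == "lb":
                pos["cb"] = ""
        check(el.text)
        for child in el:
            walk(child)
            check(child.tail)

    walk(ET.fromstring(xml_text))
    return hits


doc = '<text><pb n="1v"/><lb n="1"/>namaḥ<cb n="1"/>sapta<lb n="2"/>parṇṇāni</text>'
where = locate(doc, "sapta")  # [('1v', '1', '1')]: folio 1v, line 1, after string hole 1
```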
Question 3 in the comment above:
I'm not sure. It's a question of thinking about its meaning and then looking in the TEI Guidelines for the most appropriate tag. At present, I suggest just "punctuation", <pc>, with some explanation. So you could say,
<pc function="pāda divider">,</pc>
Later, if we learn more, we can update or change the tag as needed, as long as you are reasonably consistent so that search-and-replace works.
For end-of-line daṇḍa, yeah, I think we should use the broken bar character for now, which is also what we used in the Cambridge project. I'm not sure if we want to distinguish between a broken daṇḍa and a daṇḍa with a slash through it.
For pāda dividers: yeah, I talked with Peter about it, and I couldn't decide between something like:
<note place="above">|</note>
or
<add place="above">|</add>
I guess it depends on how you want them to be treated in the end. Like, do you want those daṇḍas to be grouped with the other things you've tagged as <note>, or with the other things you've tagged as <add>? For example, when you make a collation, you can decide whether to keep all the <note> tags or ignore them all; ditto with the <add> tags.
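A minimal sketch of that keep-or-ignore choice, assuming the collator just works on plain text extracted from each line; text_for_collation and its ignore parameter are hypothetical and not Saktumiva's actual collation code.

```python
import xml.etree.ElementTree as ET


def text_for_collation(xml_text: str, ignore: tuple[str, ...] = ()) -> str:
    """Plain text of a fragment, leaving out the content of any tag in `ignore`.

    The text after an ignored element (its tail) is still kept.
    """
    parts: list[str] = []

    def walk(el: ET.Element) -> None:
        if el.tag not in ignore:
            parts.append(el.text or "")
            for child in el:
                walk(child)
        parts.append(el.tail or "")

    walk(ET.fromstring(xml_text))
    return "".join(parts)


line = '<l>prathamaḥ<note place="above">|</note> dvitīyaḥ</l>'
kept = text_for_collation(line)                       # 'prathamaḥ| dvitīyaḥ'
dropped = text_for_collation(line, ignore=("note",))  # 'prathamaḥ dvitīyaḥ'
```

Whichever tag the pāda dividers get, they then travel together with everything else tagged the same way.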
Thanks a lot for clearing up the broken danda issue.
As for the divider marks: I guess the ones that Peter showed in his post and that I have under no. 2 are indeed some kind of secondary additions (in Peter's case more secondary than in ours, I think, but still), so either <note ...> or <add ...> should be fine.
However, I think the ones that I have in my question 3 are somewhere between gap fillers and punctuation signs, perhaps closer to punctuation. I've been transcribing them as , (comma), but I was wondering if there was anything more systematic or unified?
For example, Graheli in his Nyāyamañjarī-edition uses some kind of half-daṇḍa sign (or whatever it is) to render any kind of punctuation marks other than the daṇḍas. But in his case, I don't think he tries to represent a specific sign in any of his MSS.
Silk (on p. ix) gives a whole set of different punctuation signs that he finds in his MSS and explains how he represents them in his edition.
kṣamāṃ yāce (I beg your pardon)!!!
So... are we transcribing these guys as ¦ or <space/> or something else?
I think that broken daṇḍas and daṇḍas with a slash both serve the same function, right? They generally (almost always) appear at the end of a line? We could either transliterate all the "end-of-line" daṇḍas as the broken bar or do something like:
<g rend="daṇda with slash">¦</g>
In this case we're kind of indicating that they serve the same function, but there are variations in how they're stylized, I guess.
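Since the ¦ stays inside the <g>, a collation that only looks at text would treat every variant as the same line filler, while @rend keeps the visual distinction searchable. A small sketch, with a made-up function name and assuming a namespace-free fragment:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def filler_report(xml_text: str) -> tuple[str, Counter]:
    """Return the fragment's plain text and a count of <g> elements by @rend.

    itertext() includes the ¦ inside each <g>, so all variants collate alike.
    A ¦ typed directly (without <g>) appears in the text but is not counted.
    """
    root = ET.fromstring(xml_text)
    variants = Counter(g.get("rend", "(none)") for g in root.iter("g"))
    return "".join(root.itertext()), variants


text, variants = filler_report(
    '<p><lb/>tasya¦<lb/>artha<g rend="daṇda with slash">¦</g></p>'
)
# text == 'tasya¦artha¦'; variants == Counter({'daṇda with slash': 1})
```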
Thanks @chchch! The solution you propose seems to give rather exhaustive info about the sign. I think I'll be voting for that when our group has the next round of elections.
Since people seem to be incredibly passionate about daṇḍas, I've modified the Devanāgarī font to include variants for the broken daṇḍa and the daṇḍa with a slash next to it. See here:
https://chchch.github.io/PedanticIndic/
I've also updated the page on transcription conventions with all the <g>s that we have so far:
https://saktumiva.org/wiki/transcription
Maybe one day we should come up with some "canonical" names for these signs, and then we can just search-and-replace them all.
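That search-and-replace could be as small as the sketch below, which rewrites deprecated names inside ref="#..." attributes. The mapping uses the sarada-/sharada- renamings made later in this thread; pairing sarada-siddhi with sharada-sign-siddham is an assumption, and the regex assumes double-quoted ref values (href attributes are deliberately left alone).

```python
import re

# Deprecated name -> current name (the second pairing is an assumption).
RENAMED = {
    "sarada-ekam": "sharada-ekam",
    "sarada-siddhi": "sharada-sign-siddham",
}

# \b stops the pattern from matching inside attributes such as href="#...".
REF = re.compile(r'\bref="#([^"]+)"')


def canonicalise_refs(xml_text: str) -> str:
    """Replace deprecated names in ref="#..." attributes, leaving others alone."""
    return REF.sub(
        lambda m: 'ref="#' + RENAMED.get(m.group(1), m.group(1)) + '"', xml_text
    )


fixed = canonicalise_refs('<g ref="#sarada-ekam"/> <g ref="#newa-siddhi"/>')
# fixed == '<g ref="#sharada-ekam"/> <g ref="#newa-siddhi"/>'
```

Working on the raw text rather than a parsed tree keeps the rest of each file byte-for-byte unchanged.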
@chchch Hello Charles,
Could you please add two more Sharada characters?
- sharada sign siddham [U+111DB] "𑇛"
- sharada section mark-1 [U+111DE] "𑇞"
I am rendering them as follows:
<g ref="#sharada-sign-siddham"/>
<g ref="#sharada-section-mark-1"/>
I added them to the Special Characters list in the Transcription conventions of Saktumiva.
Hmm, we currently have sarada-ekam and sarada-siddhi, which I guess was to be consistent with newa-siddhi... should we just change everything to be consistent with Unicode block names?
I'd recommend going with the Unicode block names. I can see this process of adding required chars continuing, so it would be best not to make up lots of private names.
Ok, I added:
- sharada-ekam
- sharada-sign-siddham
- sharada-section-mark-1
and in the file definitions.xsl, I marked as deprecated:
- sarada-ekam
- sarada-siddhi
Thank you very much!
-- Deepro Chakraborty (he/him) PhD candidate Department of History, Classics, and Religion University of Alberta
The University of Alberta acknowledges that we are located on ᐊᒥᐢᑿᒌᐚᐢᑲᐦᐃᑲᐣ (Amiskwacîwâskahikan) Treaty 6 territory, and respects the history, languages, and cultures of the First Nations, Métis, Inuit, and all First Peoples of Canada, whose presence continues to enrich our institution.
@chchch Hello Charles, could you also add sharada-section-mark-2 and sharada-continuation-sign?
I added them in Transcription Conventions
done!
Thank you very much!
Hi, Charles. I'm working out how to represent some non-standard Newa glyphs, specifically "Siddhi" and "Newa gap filler" (U+1144A and U+1144E). See the section "Unicode representation" at the end of this blog post.
In my TEI header, I have,
So, in the text of the MS transcription, if I insert the character simply like this:
𑑎
then it shows up correctly in the display at Saktumiva. The problem is that when I collate that MS, the whole line on which the character appears gets ignored or treated as a missing line (with an "omit"). I can't find a way round this, either by using <g> or the character itself (not the &; form). Is it because I'm inserting a Newa character into an IAST file? That seems the wrong thing to do anyway, but how do I manage this?
While I'm failing to get glyph working, I'm just doing this, which at least gives some sensible output