Devanagari Rendering - Githubissues

peanutbutterandcrackers commented 5 years ago

This is a continuation of the Google Forum Thread. I thought this would be a bit easier. I will post a link to this thread there as well.

Chordpro Version: Latest Git
Branch: Pango
Commit: 6317be3
Chordpro Config File: myconfig.json.txt
Fonts required: Lohit Devanagari (fonts-lohit-deva)

I was writing this chordpro file and an issue occurred. This is not really an issue per se but just something that could be done better. One occurrence of the issue happens near the end of the chorus:

[A7]सब थोक [F#m]दिन्[A]छु [D]म।
{end_of_chorus}

It renders somewhat like this: (screenshot omitted)

However, it would be better if it rendered like this: (screenshot omitted)

The 'न' in 'दिन्छु' joins with the succeeding character but is pronounced with the preceding letter's syllable, which prompts many to put the [A] after the half न (न्), since "The chords are placed in front of the syllable they belong to" in chordpro.

One could, indeed, put the [A] before the half-न, as a work-around, and it would be rendered like it is supposed to, as is the case with the first line; however, that is not placing the chords in front of the syllable they belong to.

Further info: the half-न is created by adding ् to न. Perhaps that is useful information.

Is this possible?

Thank you, once again, esteemed developer, for this amazing program!

peanutbutterandcrackers commented 5 years ago

I was thinking that perhaps the fact that the chord, F#m in this case, causes the rest of the letters to be pushed back has something to do with the half-न rendering without pairing with the छु in the case above (Image 1). I was wondering whether it would have rendered right if the letters were not being pushed by the chords, when I encountered the following: (screenshot omitted)

The corresponding chordpro: (snippet not preserved)

I would have expected it to render something like: (screenshot omitted) (The half न (न् - which is न + ्) is paired with छौ.)

sciurius commented 5 years ago

ChordPro splits the text into pieces (by the chords) and then the pieces are typeset independently. So if X and Y combine into glyph Z (sorry, I cannot type Devanagari, but I assume you get the idea), then "AXY" will be typeset as "AZ", but "AX[A]Y" will be typeset as "AXY". As you already found out, a workaround is to keep the typographically combining characters together, e.g. "AXY[A]" or "A[A]XY", even if that "looks wrong" to write.
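To make the splitting step concrete, here is a minimal Python sketch (not ChordPro's actual Perl code). The function name `split_songline` and the regular expression are assumptions made for illustration; the sketch only shows how a line falls apart into (chord, text) pieces that would then be typeset independently of each other.

```python
import re

# A chord is anything between square brackets (assumption for this sketch).
CHORD = re.compile(r"\[([^\]]*)\]")

def split_songline(line):
    """Split a lyrics line into (chord, text) pieces, one per chord."""
    pieces = []
    chord = ""   # text before the first chord has no chord attached
    pos = 0
    for match in CHORD.finditer(line):
        text = line[pos:match.start()]
        if chord or text:
            pieces.append((chord, text))
        chord = match.group(1)
        pos = match.end()
    tail = line[pos:]
    if chord or tail:
        pieces.append((chord, tail))
    return pieces

print(split_songline("AX[A]Y"))  # [('', 'AX'), ('A', 'Y')]
```

Because "AX" and "Y" end up in different pieces, a shaping engine that sees each piece on its own never gets the chance to combine X and Y.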

The Unicode name for ् (the sign that turns न into न्) is DEVANAGARI SIGN VIRAMA and its category is 'Mn' (Mark, Nonspacing). Does your issue only occur with these virama characters?

peanutbutterandcrackers commented 5 years ago

@sciurius - Yes, sir. I was thinking of a name more like 'joining character' or something. The issue seems to occur with characters that combine with the succeeding characters. At least, that is what I have noticed so far. I was wondering whether an if block for such Unicode characters that combine with others (if there is a particular name/type for them) would solve this issue: `if character_combines_with_succeeding_char(char): do stuff; fi`. But I am aware that you might want to go for a more elegant solution rather than a hacky hack, if there is one.
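Such a check can be written against the Unicode character database. The following Python sketch is only an illustration, not what ChordPro does internally. The helper name `joins_with_next` is hypothetical, and using canonical combining class 9 (the class Unicode assigns to virama signs) as the test is an assumption about what "joining character" should mean here.

```python
import unicodedata

VIRAMA_CLASS = 9  # canonical combining class Unicode uses for virama signs

def joins_with_next(char: str) -> bool:
    """Return True if `char` is a virama-type sign (hypothetical helper)."""
    return unicodedata.combining(char) == VIRAMA_CLASS

devanagari_virama = "\u094d"
print(unicodedata.name(devanagari_virama))      # DEVANAGARI SIGN VIRAMA
print(unicodedata.category(devanagari_virama))  # Mn
print(joins_with_next(devanagari_virama))       # True
print(joins_with_next("\u0928"))                # False (NA, a plain consonant)
```

Testing the combining class rather than one hard-coded code point would also match the virama signs of other Indic scripts, which may or may not be desirable.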

Please do let me know if you do write anything. I would be very glad to test things out (with Latin and Devanagari script).

sciurius commented 5 years ago

I've checked in a special treatment for Virama characters. Basically, when you write

द ि न ् [A] छ ु

it is treated as if you wrote

द ि [A] न ् छ ु

So the virama and its preceding character are moved to the start of the next phrase.
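The rewrite described above can be sketched in Python as follows. This is an illustration under assumptions, not the code that was checked in: pieces are modelled as (chord, text) tuples, the name `move_virama_forward` is made up, and only the Devanagari virama U+094D is handled.

```python
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA

def move_virama_forward(pieces):
    """Move a trailing consonant+virama to the start of the next piece."""
    result = [list(piece) for piece in pieces]
    for i in range(len(result) - 1):
        text = result[i][1]
        if len(text) >= 2 and text.endswith(VIRAMA):
            # The last two code points are the consonant and its virama.
            result[i][1] = text[:-2]
            result[i + 1][1] = text[-2:] + result[i + 1][1]
    return [tuple(piece) for piece in result]

# DA, VOWEL SIGN I, NA, VIRAMA under [F#m]; CHA, VOWEL SIGN U under [A]
pieces = [("F#m", "\u0926\u093f\u0928\u094d"), ("A", "\u091b\u0941")]
print(move_virama_forward(pieces))
```

After the call, the half-letter sits in the same piece as the consonant it joins with, so the two can be shaped together.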

Can you test this to see if it solves the issues?

peanutbutterandcrackers commented 5 years ago

@sciurius - Yes, sir, it does appear to work. Thank you for your hard work. However, the chord is still placed right above the half न्. Could that be changed to move a little bit to the right so that it is above the consonant?

I wonder if there are other scripts that might have other classes of characters that cause issues similar to this (including Devanagari ones that I might not have run into yet). I hope the 'special cases' won't start piling up. (I am beginning to worry more about the chordpro code base itself than about how well the renderings come out in my native script. I do love this amazing software so very much. Thank you so much for all the hard work that you have put in. If I ever get around to learning perl, I would love to be able to contribute here. This is my favorite perl-project.)

peanutbutterandcrackers commented 5 years ago

Another thing that might help make better sense of this: In Nepali, at least, these half-letters are pronounced in the same syllable as either their preceding or, in some cases, succeeding letters. So, the chord would never change on the half-letters but either on the preceding or the succeeding one. Perhaps that might translate into code a lot better than what I said in my previous comment.

sciurius commented 5 years ago

Can you post the source for this song?

sciurius commented 5 years ago

Another thing I was wondering... When a chord is wider than a syllable, the syllable is pushed a bit to the right. This creates an empty space between the syllables. Is that conventional/permissible in Devanagari? For example, in the first line at the Bdim, a space and "-" are inserted.

peanutbutterandcrackers commented 5 years ago

@sciurius - Here you go, sir. the '-' at the first line is a grammatical hyphen, it is in the source and not a thing to worry about.

However, now that you say it, I think it would be really great if we could specify delimiter character(s) with something like --word-split-marker " " for empty space, --word-split-marker "-" for a hyphen, --word-split-marker " - " for a hyphen with a space on either side, --word-split-marker "++", etc.

[We (the chordpro community) could not have asked for a better maintainer. Thank you for going through the trouble of thinking of word-splitting. Even I myself didn't know how cool this feature would be!]

sciurius commented 5 years ago

I checked in an alternative approach. This time I keep track whether I moved न ् to the next syllable, and shift the position of the chord if so. However, it is not possible to determine the exact width of the ligature part of न् since its actual width depends on the glyph following it. As it comes out the chord is slightly too far to the right. Please test it and let me know if this is a useful improvement.

(Note that in the final word on the second last line it is no longer necessary to add a space)

We have been discussing this using the case of न ् but do I see it correctly that this can happen to every consonant followed by a virama? E.g. ज् ञ् and so on?

Also, I would suggest tweaking a couple of settings in your config: in pdf -> spacing, change lyrics to 1.5 and chords to 0.9. This lowers the chords a bit so they are closer to the lyrics.
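Written out, the suggested tweak would look roughly like the fragment below. This is a sketch that assumes the nesting pdf -> spacing -> lyrics/chords exactly as described in the comment; it is built as a Python dictionary and printed as JSON, and the printed text is what would go into the config file.

```python
import json

# Only the two values mentioned above; all other settings stay untouched.
spacing_override = {
    "pdf": {
        "spacing": {
            "lyrics": 1.5,
            "chords": 0.9,
        }
    }
}

print(json.dumps(spacing_override, indent=2))
```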

peanutbutterandcrackers commented 5 years ago

Yes, sir! It is a very useful improvement! It looks neat. If I run into any issues, I will report it here again.

Yes, sir. ज्, ञ्, etc. also do the same thing. I did test it out with those letters and they render just fine, except for ङ्: (screenshot omitted; correct ones are highlighted green while the incorrect ones are highlighted red)

Either we could make a compromise and, for the sake of this one exceptional character, push the chords by a factor of 0.2 (with reference to the character width) or some such number and make the whole thing not look 'too bad'. Or perhaps separate modules should be created to compartmentalize script-specific rules along with the exceptions in the scripts themselves (so as not to pollute the code base with too many special cases for Devanagari and other script systems, while accommodating exceptions within the language itself)? I am sure that as more and more people start using chordpro to transcribe songs in more and more languages, this might eventually get out of hand. I don't really know. I'm just a n00b.

Regarding the --word-split-marker "marker_string" option, it does seem like a really useful addition. In the following screenshot (the same .chordpro file's output), while the last verse is all good (highlighted in green), the others highlighted in red are words that have been split. On second thought, it would probably be great if the split markers could have their fonts and size specified so as to give them a distinct look, so that they can't be confused with the normal lyrics and their markers. But perhaps that could be done with --word-split-marker "{textsize: 50%}{textfont: "OpenSans Italic"} - {textfont}{textsize}" and that might do the trick?

sciurius commented 5 years ago

I checked in an alternative approach that as far as I can see produces correct results. Can you check it?

sciurius commented 5 years ago

Regarding spacing out syllables: [Bdim]तमाङ्[C]ग comes out as तमा ङ्ग . To obtain space for the (long) chord, the syllable ङ्ग is moved to the right. Since in Devanagari (as I understood) all components that make up a word are joined by the horizontal rule, shouldn't this be typeset as (screenshot: scrot20191016170041)? I looked for a font glyph suitable to be used for this but could not find one.

peanutbutterandcrackers commented 5 years ago

Yes sir! It renders well, too, now! Pretty neat! Thank you!

Haha. That made me chuckle a bit. I have never seen Devanagari written like that. Only a non-native could come up with such an idea! :smile: (But I can see why it sounds like the only logical way to do it. Haha) No, sir. I don't think there would be one that would render like that. We mostly just use hyphens, etc to mean that 'this word has been split'. I think a splitter-character option would be of tremendous help (even to Latin-script users).

sciurius commented 5 years ago

> Only a non-native could come up with such an idea!

That's called "thinking out of the box". :smile:

Anyway, I've checked in support for split markers. You can set it in the pdf section of the config file:

"split-marker" : "…"

You can use full markup, e.g.

"split-marker" : "<span color='red' size='x-small'>…</span>"


peanutbutterandcrackers commented 5 years ago

Haha :smile:

Great! I just tested it and there seem to be a few things that appear to be in need of polishing up.

1. It seems that the split marker is being used only once, and then space is being used to space the letters out. I was hoping that the split-marker itself would be repeated, rather than a space. In the image above, I was expecting chordpro to repeat * twice (or however many times it is required) instead of using a space.
2. This one is probably related to another issue (that I haven't yet reported, but it seems I might have to report it soon now): Long story short, the logic for deciding when a word is being split probably needs to be made a little stricter. For this chordpro file, in the first verse, chordpro is mistaking spaced words for split words. Chordpro:
```
वर्तमान [E]मेरो[A] तपाईंको [E]हातमा,[C#m] भविष्य [G#m]पनि[A] तपाईंको [D#dim]साथमा
```
Please note that [G#m]पनि[A] is a single word surrounded by two chord changes immediately before and after it (this song features a lot of such chord changes, and chordpro has issues rendering that in and of itself, which is another issue that I need to report). But it is currently being rendered as: (screenshot omitted)

The *-s highlighted in red should not be there. And the * in green should be (probably thrice, in all: see point no. 1). So, the logic should probably be something like: consider a word to have been split if and only if a chord specification is neither preceded nor followed by white-space. That should fix this issue altogether. [G#m]पनि[A] is surrounded by white-space, the natural word-delimiter, on both sides, and none of the chords splits the word पनि so as to require the addition of the split-marker.
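The proposed rule can be sketched in a few lines of Python. This is only an illustration of the suggestion, not ChordPro's logic; the function name `chords_that_split_words` and the bracket-matching regular expression are assumptions.

```python
import re

CHORD = re.compile(r"\[([^\]]*)\]")

def chords_that_split_words(line):
    """Return the chords that have non-space text directly on both sides."""
    splitting = []
    for match in CHORD.finditer(line):
        # Drop neighbouring chords so "[A][B]" is not mistaken for lyrics.
        before = CHORD.sub("", line[:match.start()])
        after = CHORD.sub("", line[match.end():])
        if before and after and not before[-1].isspace() and not after[0].isspace():
            splitting.append(match.group(1))
    return splitting

print(chords_that_split_words("Hel[A]lo [B]world[C] again"))  # ['A']
```

Note that under this rule a chord placed between a word and a following comma still counts as splitting, because punctuation is not white-space.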

Another thing: If split-markers are used instead of spaces (as I suggest in 1), if the user has set the split marker to be something really nonsensical like "---~~~---" (a long string), I think chordpro should be able to handle that as well: i.e. the word would be pushed just that far off by the split-marker itself, and yet the chords should line up just like they do right now (with just the added distance). Oh, never mind, it already seems to be the case. Working really well!

sciurius commented 5 years ago

Okay. I've added split-marker-repeat (true/false) that will have the split-marker repeated.

sciurius commented 5 years ago

And I've removed it again. split-marker can now be a 3-part array: start, repeat, and final. final is always printed, last. start is printed if there is enough room. repeat is printed repeatedly to fill the rest.

Assuming split-marker is [ "x", "y", "z" ], then depending on the amount needed you will get

z
xz
xyz
xyyz
xyyyz

and so on. All parts can be left empty.
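The fill rule can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the real program works with rendered widths, whereas here every character counts as one unit of room, and the function name `fill_split_marker` is hypothetical.

```python
def fill_split_marker(parts, room):
    """Compose a split marker from (start, repeat, final) for `room` units."""
    start, repeat, final = parts
    remaining = room - len(final)      # final is always printed, last
    out = ""
    if start and remaining >= len(start):
        out = start                    # start only if there is enough room
        remaining -= len(start)
    if repeat and remaining > 0:
        out += repeat * (remaining // len(repeat))  # repeat fills the rest
    return out + final

for room in range(1, 6):
    print(fill_split_marker(("x", "y", "z"), room))
# z
# xz
# xyz
# xyyz
# xyyyz
```

Empty parts simply contribute nothing, so ("", "", "z") always yields "z" regardless of the room available.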

peanutbutterandcrackers commented 5 years ago

@sciurius - Even markup works for all three! Neato!

BTW, what markup is it, exactly? Just plain old HTML tags? Is there some documentation somewhere that I could read up regarding the markups?

peanutbutterandcrackers commented 5 years ago

The other issue (the one outlined in no. 2) seems to be fixed as well. Except maybe one more addition is required: besides white-space characters, punctuation marks should also be considered to NOT demarcate a word-split. As in here: (2nd verse, first line, highlighted in red)

sciurius commented 5 years ago

Nope. In the chordpro text you write:

[G#m]गर्छु[A],

So you request the chord to be placed above the comma. If you want the chord after the comma, use

[G#m]गर्छु,[A]

sciurius commented 5 years ago

> BTW, what markup is it, exactly?

See https://developer.gnome.org/pygtk/stable/pango-markup-language.html

peanutbutterandcrackers commented 5 years ago

Hmm... I see. Thank you.

Regarding the [A]Word[B], vs [A]Word,[B]: I wonder if, for many of the users, [A]Word[B], comes more intuitively than the latter? At least it did for me... But sure, this is not that big of a deal. Thank you for all the help. Everything seems to work just right now. I will be reporting further issues as I find them.

Thank you very much!

P. S: Will this pango version be merged with master anytime soon?

peanutbutterandcrackers commented 5 years ago

@sciurius - I was thinking about this, and I just have a few thoughts.

Perhaps it is better if [A]word[B], is treated as [A]word,[B] internally (because it seems to make a bit more sense from the user's perspective)?

But other than that, since this is the reference implementation, perhaps there should be a document somewhere about the linguistics of it, too? Like this very thing of not treating word[chord]punctuation as a split, of the way combining characters are handled etc so that other implementations of this can follow the linguistic rules of the reference implementation too? Just a thought.

Here is something that I imagine the document would have:

- punctuation and white-space are not considered part of a word (and hence do not count when words are split)
- a chord immediately following the last letter of a word, like this[c], is rendered before the next word but after "this".
- Devanagari virama characters are handled in such and such way.

etc? So that there will be consistent behaviours among chordpro implementations?

sciurius commented 5 years ago

I think it is better to stay far from interpreting lyrics and leave it to the author to decide where to put the chords. What is logical to you may be counterintuitive for someone else. The current way of determining where to put the chords seems to work fine for the millions of songs already written using a flavour of Chordpro.

I made an exception for the Devanagari virama characters because it would otherwise not be possible to obtain the correct placement of the chord above the ligatures.

As for your other question about merging pango into the master: All three branches (master, pango and markup) are fully compatible but I need more user feedback on the pango implementation, in particular for Windows and Mac.

peanutbutterandcrackers commented 5 years ago

@sciurius - I see. That makes sense. Thank you.

Please do let me know if there is anything I can do to help.

(Perhaps I should start learning texinfo or something seriously so as to help out in documentations for the projects that I like.)

peanutbutterandcrackers commented 5 years ago

@sciurius - Hello again,

I was trying to package chordpro for use with the Guix Package Manager for personal use (to share the pango branch with friends, among other things, and just to learn guix in general) and it turns out that I need to package App::Packager first. And that does not seem to have a license. And it seems that you maintain that as well. So could you please add a license to it?

The reason why I ask for a license is so that, perhaps, chordpro itself could be added to guix gnu distribution (they only accept libre software). https://guix.gnu.org/packages/ This would give even more users exposure to chordpro.

sciurius commented 5 years ago

Nice! In the App::Packager source and doc it reads:

COPYRIGHT & LICENSE

Copyright 2017,2018 Johan Vromans, all rights reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

Is that sufficient?

peanutbutterandcrackers commented 5 years ago

It is, technically. But there's this command in guix that generates package definitions for things in cpan, pypi, etc. and it is currently showing #f (false) for the license because it isn't there. Hence the request. If it isn't too much trouble, having the license explicit like in your other repo would make it even better, I think.

Also, please do not get your hopes too high. It might be another entire month before I can have my channel (equivalent of PPAs, only better) set up containing chordpro with package definitions that aren't horribly ugly. [I am still just learning; and pretty much a n00b.] But, should I manage to get it done well-enough, I will try to get chordpro into gnu distribution (software repository) too.

peanutbutterandcrackers commented 5 years ago

So, the guix package definition mostly works now. However, it fails one test. I thought perhaps you'd be interested:

starting phase `check'
make -f Makefile test
make[1]: Entering directory '/tmp/guix-build-my-chordpro-idk.drv-0/source'
PERL_DL_NONLAZY=1 "/gnu/store/dna8kpb00kq176rz8x69yy4j33my2q55-perl-5.28.0/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
# Using Cairo/Pango for PDF generation
t/01_prereq.t ............. ok
# Testing App::Music::ChordPro 0.973_028, Perl 5.028000, /gnu/store/dna8kpb00kq176rz8x69yy4j33my2q55-perl-5.28.0/bin/perl
t/02_load.t ............... ok
t/100_basic.t ............. ok
t/101_empty.t ............. ok
t/102_new_song.t .......... ok
t/103_title.t ............. ok
t/104_subtitles.t ......... ok
t/105_chords.t ............ ok
t/107_chords_latin.t ...... ok
t/108_chords_solfege.t .... ok
t/109_chords_nashville.t .. ok
t/110_chords_roman.t ...... ok
t/112_comment.t ........... ok
t/113_comment.t ........... ok
t/114_songline.t .......... ok
t/115_songline.t .......... ok
t/116_chorus.t ............ ok
t/117_rechorus.t .......... ok
t/118_tab.t ............... ok
t/119_verse.t ............. ok
t/120_meta.t .............. ok
t/122_memorize.t .......... ok
t/130_image.t ............. ok
t/131_image.t ............. ok
t/140_chords.t ............ ok

#   Failed test 'Song contents'
#   at t/141_chords.t line 100.
#     Structures begin differing at:
#          $got->{body}[3]{chords}[0]{strings} = Does not exist
#     $expected->{body}[3]{chords}[0]{strings} = ARRAY(0xbf22e8)
# Looks like you failed 1 test of 6.
t/141_chords.t ............ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/6 subtests 

#   Failed test 'Song contents'
#   at t/142_chords.t line 102.
#     Structures begin differing at:
#          $got->{body}[3]{chords}[0]{strings} = Does not exist
#     $expected->{body}[3]{chords}[0]{strings} = ARRAY(0x134f8b0)
# Looks like you failed 1 test of 3.
t/142_chords.t ............ 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/3 subtests 
t/150_fonts.t ............. ok
t/151_fonts.t ............. ok
t/15_subst.t .............. ok
t/160_diagrams.t .......... ok
t/161_titles.t ............ ok
t/162_newpage.t ........... ok
t/163_columns.t ........... ok
t/164_pagesize.t .......... ok
t/169_custom.t ............ ok
t/170_transpose.t ......... ok
t/171_transpose.t ......... ok
t/172_transpose.t ......... ok
t/173_transpose.t ......... ok
t/174_transpose.t ......... ok
t/175_transpose.t ......... ok
t/177_transcode.t ......... ok
t/180_grids.t ............. ok
t/20_basic01_crd.t ........ ok
t/21_basic02_crd.t ........ ok
t/30_basic01_cho.t ........ ok
t/31_basic02_cho.t ........ ok
t/50_encodings.t .......... ok

Test Summary Report
-------------------
t/141_chords.t          (Wstat: 256 Tests: 6 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
t/142_chords.t          (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1
Files=49, Tests=2228, 39 wallclock secs ( 1.33 usr  0.22 sys + 29.87 cusr  4.94 csys = 36.36 CPU)
Result: FAIL
Failed 2/49 test programs. 2/2228 subtests failed.
make[1]: *** [Makefile:1010: test_dynamic] Error 255
make[1]: Leaving directory '/tmp/guix-build-my-chordpro-idk.drv-0/source'
make: *** [GNUmakefile:12: test] Error 2

sciurius commented 5 years ago

Please try App::Packager 1.430.1 from CPAN to see if it fulfils the license query. I checked in a fix for the failing tests 141 and 142.

peanutbutterandcrackers commented 5 years ago

Thank you very much. I am currently working on getting the first draft ready. I need to figure out quite a few things first. Once that is done I will use the newer versions. For now I have disabled tests altogether, but will re-enable them soon (and report any failures to you). Thank you.

sciurius commented 5 years ago

Please check out the new 'dev' branch from git. It should give you all Pango powers. The 'pango' branch will go away eventually.

peanutbutterandcrackers commented 5 years ago

Thank you! I will try to package it up in guix, again. My first version does indeed work but it isn't following the best practices. Still a lot to learn. I will also be testing to see if there are any further issues. :)

peanutbutterandcrackers commented 5 years ago

@sciurius - I tried packaging it and got the following test failure (entire log):

38bb3wvcpm53m0mbj4xcigq0xipx7h-chordpro-0.975-beta.drv.zip

sciurius commented 5 years ago

ChordPro requires PDF::API2 (which requires Font::TTF), Text::Layout and (in your case) Pango (which requires Cairo). You must arrange for these packages to be present on the target system.

peanutbutterandcrackers commented 5 years ago

@sciurius - I am sorry. I didn't quite take care of it all (still a n00b). I did package it again, and it works now. And all the tests passed too. I haven't yet used it. But I will and will let you know if anything seems amiss. Thank you.

ChordPro / chordpro

Devanagari Rendering #75