DigitalKhatt / madinafont

Madina OpenType variable font
SIL Open Font License 1.1
ٱلْأٓنَ is written incorrectly but rendered properly #3

Closed munzirtaha closed 1 year ago

munzirtaha commented 1 year ago

Great project indeed! Thanks.

The text in src/qurantext/quran.cpp contains ٱلْأٓنَ, which renders properly but is encoded incorrectly: it has an ALEF WITH HAMZA ABOVE and a MADDAH ABOVE together. It could be written as ٱلْـَٔانَ instead.
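To make the code points visible, the word can be dumped with Python's standard `unicodedata` module. This is a minimal sketch: the escape sequence in `WORD` is an assumed transcription of the word quoted above, not bytes read from src/qurantext/quran.cpp, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Assumed transcription of the word quoted above, written as escapes so the
# example does not depend on how the Arabic text survived copy and paste.
WORD = "\u0671\u0644\u0652\u0623\u0653\u0646\u064E"


def describe(text):
    """Return a (code point, Unicode name) pair for every character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code, name in describe(WORD):
    print(code, name)
```

In the printed list, ARABIC LETTER ALEF WITH HAMZA ABOVE (U+0623) is immediately followed by ARABIC MADDAH ABOVE (U+0653), which is the combination described above.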

mustafa0x commented 1 year ago

Indeed; this is how most text-based mushafs chose to write it though. I wrote it the latter way (see https://github.com/quranacademy/quran-text).

munzirtaha commented 1 year ago

Thanks for the quick response.

I checked khaledhosny/quran-data and it's correct there. I also checked the tanzil.net mushaf and it's correct. The qurancomplex mushaf has lots of encoding issues and doesn't conform to the Unicode standard, so don't use it as a reference.

So, there are two issues here: the issue with the text, and the issue with the font, which currently cannot render ٱلۡـَٔانَ correctly.

mustafa0x commented 1 year ago

This is the same as tanzil: ٱلْـَٰٔنَ. See https://nuqayah.com/digitalkhatt.html

Edit: I see you're referring to src/qurantext/quran.cpp. Never mind then. Not sure why that's different. In any case, I use tanzil's text with this font.

aminanan commented 1 year ago

The font renders the word ٱلْأٓنَ (Surah Al-Jinn, Ayah 9) as follows:

[image: Capture]

and the word ٱلۡـَٔانَ as follows:

[image: Capture2]

If I understand the problem correctly, the issue is that ٱلْأٓنَ must be rendered like the second image above and therefore the first word should be represented by الْآنَ and not ٱلْأٓنَ.

However, I checked with tanzil.net and it doesn't encode the two words as you suggested (you have to choose Uthmani text).

In both cases it uses TATWEEL with HAMZA ABOVE instead of MADDAH ABOVE. If I have to follow tanzil.net, the second form can't be rendered anymore with the current encoding ٱلۡـَٔانَ and will be rendered like the first image (That's probably how it should be).

munzirtaha commented 1 year ago

https://nuqayah.com/digitalkhatt.html still has the same issue and it's not like tanzil. Try copying the word from that page and applying another font, e.g. Courier New, to see the difference.

Just to make sure we are testing the same font: I am currently using https://digitalkhatt.org/assets/fonts/digitalkhatt.otf, which I couldn't find on GitHub.

https://github.com/DigitalKhatt/madinafont/blob/main/digitalkhatt.otf is not working in LibreOffice and cannot be opened with FontForge. I expected this to be the OpenType version, since the variable version is at visualmetafont/blob/master/files/digitalkhatt-cff2.otf

Edit: LibreOffice on Linux now supports CFF2, so I replaced the unmaintained version.

aminanan commented 1 year ago

https://digitalkhatt.org/assets/fonts/digitalkhatt.otf is the old OpenType CFF1 version and unfortunately it is no longer maintained. The latest version is digitalkhatt.otf from the current repository in OpenType CFF2 format.

It seems that this format is not supported by LibreOffice or Windows systems, but there are probably tools that can convert a CFF2 format to supported formats.

munzirtaha commented 1 year ago

> If I understand the problem correctly, the issue is that ٱلْأٓنَ must be rendered like the second image above

No, ٱلْأٓنَ contains ALEF WITH HAMZA ABOVE + MADDAH ABOVE which is simply wrong and shouldn't be used in the Quran text. The font should render it as it is without hiding any of them.
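A text can be checked for this combination mechanically. The following is a minimal sketch, not the project's tooling: `find_hamza_alef_with_maddah` is a hypothetical helper, and both sample strings are assumed transcriptions written as escapes.

```python
import re

# ALEF WITH HAMZA ABOVE (U+0623) immediately followed by MADDAH ABOVE (U+0653).
HAMZA_ALEF_PLUS_MADDAH = re.compile("\u0623\u0653")


def find_hamza_alef_with_maddah(text):
    """Return the index of every U+0623 that is directly followed by U+0653."""
    return [m.start() for m in HAMZA_ALEF_PLUS_MADDAH.finditer(text)]


# Assumed transcription of the word as it appears in the issue title.
flagged = "\u0671\u0644\u0652\u0623\u0653\u0646\u064E"
# The same word written with ALEF WITH MADDA ABOVE (U+0622) instead.
plain = "\u0627\u0644\u0652\u0622\u0646\u064E"

print(find_hamza_alef_with_maddah(flagged))  # [3]
print(find_hamza_alef_with_maddah(plain))    # []
```

The pattern only matches when the two code points are adjacent; a text that places another mark between them would need a looser pattern.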

> and therefore the first word should be represented by الْآنَ and not ٱلْأٓنَ.

You can choose to represent it this way, which is semantically correct, or you can choose to follow the tanzil encoding.

> However, I checked with tanzil.net and it doesn't encode the two words as you suggested (you have to choose Uthmani text).
>
> In both cases it uses TATWEEL with HAMZA ABOVE instead of MADDAH ABOVE.

Correct, but the letters after the HAMZA make the difference.

> If I have to follow tanzil.net, the second form can't be rendered anymore with the current encoding ٱلۡـَٔانَ and will be rendered like the first image (That's probably how it should be).

Why? You can follow the tanzil convention and differentiate them by the ALEF or SUPERSCRIPT ALEF that follows.

The first form is: TATWEEL + HAMZA ABOVE + ALEF

The second form is: TATWEEL + HAMZA ABOVE + TATWEEL + SUPERSCRIPT ALEF
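As an illustration of how the two sequences can be told apart programmatically, here is a minimal sketch. It assumes the sequences exactly as described above, allows other combining marks such as FATHA around the HAMZA ABOVE, and has not been checked against the actual tanzil.net text; `classify_hamza_form` and both sample strings are hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"
HAMZA_ABOVE = "\u0654"
ALEF = "\u0627"
SUPERSCRIPT_ALEF = "\u0670"


def is_mark(ch):
    """True for nonspacing combining marks (FATHA, HAMZA ABOVE, ...)."""
    return unicodedata.category(ch) == "Mn"


def classify_hamza_form(text):
    """Return "alef", "superscript_alef", "other", or None if no
    TATWEEL carrying a HAMZA ABOVE is found."""
    i = text.find(TATWEEL)
    while i != -1:
        # Collect the combining marks attached to this TATWEEL.
        j = i + 1
        while j < len(text) and is_mark(text[j]):
            j += 1
        marks = text[i + 1:j]
        if HAMZA_ABOVE in marks:
            # Skip any further TATWEEL and marks to reach the next base letter.
            k = j
            while k < len(text) and (text[k] == TATWEEL or is_mark(text[k])):
                k += 1
            if SUPERSCRIPT_ALEF in marks + text[j:k]:
                return "superscript_alef"
            if k < len(text) and text[k] == ALEF:
                return "alef"
            return "other"
        i = text.find(TATWEEL, i + 1)
    return None


# Assumed sample encodings of the two forms, written as escapes.
first_form = "\u0671\u0644\u06E1\u0640\u064E\u0654\u0627\u0646\u064E"
second_form = "\u0671\u0644\u06E1\u0640\u0654\u0640\u0670\u0646\u064E"

print(classify_hamza_form(first_form))   # alef
print(classify_hamza_form(second_form))  # superscript_alef
```

Because combining marks are skipped rather than matched in a fixed order, the check does not depend on whether FATHA comes before or after HAMZA ABOVE.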

mustafa0x commented 1 year ago

> https://nuqayah.com/digitalkhatt.html still has the same issue and it's not like tanzil. Try to copy the word from that page and apply Courier New font e.g. and see the difference.

Looks the same to me. I copied the ayaat from tanzil, so it must be the same. I have Uthmani selected in tanzil.

munzirtaha commented 1 year ago

@mustafa0x The word is written in two shapes depending on the Surah, e.g. Al-Jinn:9 and Al-Baqara:71. They look different in Mushaf Al-Madina, so they are encoded in two different ways so as not to look the same. Copy the two ayaat and compare them to see the difference.

aminanan commented 1 year ago

> The second form is: TATWEEL + HAMZA ABOVE + TATWEEL + SUPERSCRIPT ALEF

Then I have to use TATWEEL + HAMZA ABOVE + TATWEEL + SUPERSCRIPT ALEF to render the second form below:

[image: 232164041-9ded25e7-582f-4cfa-b05d-12bf748d727e]

Now it looks like the problem has gone from mis-encoding the first form above to mis-encoding the second form.

munzirtaha commented 1 year ago

Actually, you should use TATWEEL + HAMZA ABOVE + FATHA + ALEF to render this form:

[image: alan]

aminanan commented 1 year ago

I looked a bit into why we can't encode this form

[image: alan]

with HAMZA (U+0621) + FATHA + ALEF, and use TATWEEL + HAMZA ABOVE + FATHA + ALEF for the other form below (even though this is not the correct form):

[image: 232164041-9ded25e7-582f-4cfa-b05d-12bf748d727e]

The reason is that HAMZA is non-joining, so it breaks the connection between two letters that would otherwise join (i.e. الءان).

This is an old issue: there have been proposals to change the property of HAMZA to be chairless (inline, amphibious) or to add a new chairless HAMZA character, but nothing has been done so far. So the convention seems to be to use TATWEEL + HAMZA ABOVE to encode a chairless HAMZA.

However, I still wonder why, in the case of the Arabic language, the HAMZA was not represented as chairless (since that is always the case for the Arabic language), so that we would not have to use the combination TATWEEL + HAMZA ABOVE.

Anyway, for now I'll change the encoding as requested (and sacrifice the second form) using the TATWEEL + HAMZA ABOVE convention like other Quran texts.

munzirtaha commented 1 year ago

I wonder, why not just encode it as الآن? It's the normal way of writing it, it's not used elsewhere in the Mushaf, and any other font will fall back to a correct form. You can even add a stylistic set later to switch between the two forms if someone wants to use your font for non-Quranic text.

munzirtaha commented 1 year ago

Thanks so much for the fix, but you missed fixing the text in src/qurantext/quran.cpp.

aminanan commented 1 year ago

I had not pushed the changes. Done now.