Parsing Tajweed Quran - GitHub issues

daliaessam commented 8 years ago

Asslamu Allikum,

First, thank you all for the excellent effort on this project.

Second, I am trying to use the Tajweed Quran text directly, without the JS library. I imported the text database file into a MySQL database without any problem.

I am able to parse the Tajweed tags based on the JS code GlobalQuran.js:

    parseTajweed: function(a, b) {
        // Each "[x" opens a span for one rule; the text up to the next "[" ends up in alt.
        return b.replace(/\[h/g, '<span class="ham_wasl" title="Hamzat Wasl" alt="')
            .replace(/\[s/g, '<span class="slnt" title="Silent" alt="')
            .replace(/\[l/g, '<span class="slnt" title="Lam Shamsiyyah" alt="')
            .replace(/\[n/g, '<span class="madda_normal" title="Normal Prolongation: 2 Vowels" alt="')
            .replace(/\[p/g, '<span class="madda_permissible" title="Permissible Prolongation: 2, 4, 6 Vowels" alt="')
            .replace(/\[m/g, '<span class="madda_necessary" title="Necessary Prolongation: 6 Vowels" alt="')
            .replace(/\[q/g, '<span class="qlq" title="Qalqalah" alt="')
            .replace(/\[o/g, '<span class="madda_obligatory" title="Obligatory Prolongation: 4-5 Vowels" alt="')
            .replace(/\[c/g, '<span class="ikhf_shfw" title="Ikhfa\' Shafawi - With Meem" alt="')
            .replace(/\[f/g, '<span class="ikhf" title="Ikhfa\'" alt="')
            .replace(/\[w/g, '<span class="idghm_shfw" title="Idgham Shafawi - With Meem" alt="')
            .replace(/\[i/g, '<span class="iqlb" title="Iqlab" alt="')
            .replace(/\[a/g, '<span class="idgh_ghn" title="Idgham - With Ghunnah" alt="')
            .replace(/\[u/g, '<span class="idgh_w_ghn" title="Idgham - Without Ghunnah" alt="')
            .replace(/\[d/g, '<span class="idgh_mus" title="Idgham - Mutajanisayn" alt="')
            .replace(/\[b/g, '<span class="idgh_mus" title="Idgham - Mutaqaribayn" alt="')
            .replace(/\[g/g, '<span class="ghn" title="Ghunnah: 2 Vowels" alt="')
            // The remaining "[" closes the opening tag; "]" closes the span.
            .replace(/\[/g, '" >')
            .replace(/\]/g, "</span>")
    },
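The same conversion can be driven by a lookup table instead of a chain of `.replace()` calls. This is only a minimal sketch: it assumes every tag has the shape `[x[text]` or `[x:123[text]` with no nesting, it lists just four of the rules from the code above, and the names `TAJWEED_RULES`, `TAG_PATTERN` and `tajweedToHtml` are made up for illustration.

```javascript
// Partial rule table copied from the parseTajweed() code above;
// the remaining letters would be added the same way.
const TAJWEED_RULES = {
  h: { cls: 'ham_wasl', title: 'Hamzat Wasl' },
  s: { cls: 'slnt', title: 'Silent' },
  q: { cls: 'qlq', title: 'Qalqalah' },
  g: { cls: 'ghn', title: 'Ghunnah: 2 Vowels' },
};

// Matches "[x[text]" or "[x:123[text]" (assumed tag shape, no nesting).
const TAG_PATTERN = /\[([a-z])(:\d+)?\[([^\]]*)\]/g;

function tajweedToHtml(text) {
  return text.replace(TAG_PATTERN, (match, letter, id, inner) => {
    const rule = TAJWEED_RULES[letter];
    // Unknown rule letter: keep the text and drop the markup.
    if (!rule) return inner;
    return '<span class="' + rule.cls + '" title="' + rule.title +
      '" alt="' + (id || '') + '">' + inner + '</span>';
  });
}

console.log(tajweedToHtml('[h:1[\u0671]\u0644'));
```

A table like this is also easy to export as JSON, so the same rule list could be reused by a server-side parser.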

This works very nicely in Firefox. In Chrome and other WebKit-based browsers, however, the letters are not joined: each letter inside a Tajweed tag is rendered as a standalone, isolated letter. This happens because of the span tags inserted in place of the Tajweed tags. I know this is a Chrome/WebKit bug; I researched it online, and the common recommendation is to use the zero-width joiner entity:

&zwj;

The issue now is that we need rules for deciding, while parsing the Tajweed words/tags, where to insert this zero-width joiner: that is, whether the letter is at the beginning of the word, at the end of the word, which letter is next to it, and so on.

My question: does anyone have these rules, so this can be fixed?

During my search online I found a website that applies this, but it seems to do so server side:
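One possible starting point for such rules is the standard joining behaviour of Arabic letters: a few letters (alef, dal, dhal, ra, zay, waw and their variants) never connect to the letter that follows them, and diacritics do not affect joining. The sketch below is an assumption-laden illustration, not a tested rule set for the Quran text: the letter list may be incomplete for the Uthmani script, and `joinsAcross` and `wrapWithJoiners` are hypothetical names.

```javascript
const ZWJ = '\u200D';

// Letters that never connect to the FOLLOWING letter, plus the standalone
// hamza: alef, alef variants, alef wasla, dal, dhal, ra, zay, waw,
// waw with hamza, ta marbuta, hamza.
const NON_FORWARD_JOINING = new Set(
  '\u0627\u0623\u0625\u0622\u0671\u062F\u0630\u0631\u0632\u0648\u0624\u0629\u0621'
);

// Diacritics and Quranic annotation marks (assumed ranges); they are
// skipped when looking for the letter next to a tag boundary.
const MARKS = /[\u064B-\u065F\u0670\u06D6-\u06ED]/;

function isArabicLetter(ch) {
  return /[\u0621-\u064A\u0671]/.test(ch);
}

// Last base character of s, ignoring trailing marks.
function lastBase(s) {
  const chars = Array.from(s);
  for (let i = chars.length - 1; i >= 0; i--) {
    if (!MARKS.test(chars[i])) return chars[i];
  }
  return '';
}

// First base character of s, ignoring leading marks.
function firstBase(s) {
  for (const ch of s) {
    if (!MARKS.test(ch)) return ch;
  }
  return '';
}

// True when the letters on both sides of a boundary would normally connect.
function joinsAcross(before, after) {
  const prev = lastBase(before);
  const next = firstBase(after);
  return isArabicLetter(prev) && isArabicLetter(next) &&
    !NON_FORWARD_JOINING.has(prev) && next !== '\u0621';
}

// Wrap text.slice(start, end) in a span, adding a ZWJ on both sides of
// each tag boundary where the letters would normally connect.
function wrapWithJoiners(text, start, end, cls) {
  const before = text.slice(0, start);
  const inner = text.slice(start, end);
  const after = text.slice(end);
  const open = joinsAcross(before, inner) ? ZWJ : '';
  const close = joinsAcross(inner, after) ? ZWJ : '';
  return before + open + '<span class="' + cls + '">' + open + inner +
    close + '</span>' + close + after;
}
```

Calling `wrapWithJoiners(word, 1, 2, 'ghn')` on a three-letter word such as ba-seen-meem would put a joiner on each side of both tag boundaries, while a word starting with dal would get none after the dal.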

http://www.mosshaf.com/ar/main

It is very important for the Tajweed version to include the zero-width joiner built in, instead of leaving it to the parser, because Chrome/WebKit browsers are widely used, especially on Android and other mobile devices; over 90% of users now access websites from mobile devices, especially for purposes such as reading the Quran and learning.

iBasit commented 8 years ago

Wa salaamu waliakum wr wb

That needs to be added at the time of joining the letters, so it has to be outside the api.

We will, insha'Allah, have a Tajweed SVG version, and that will make things easier. At the moment, I'm sorry, no one can help here.

The following links might help.

http://stackoverflow.com/questions/33095097/separated-arabic-font-using-html-color-tag-in-webview-and-text-view/33604599#33604599

http://stackoverflow.com/questions/11155849/partially-colored-arabic-word-in-html

daliaessam commented 8 years ago

The SVG version will not be as useful as the text version, which is much easier to manage and process as a small database, instead of tons of image files with limited functionality. I think it would be better to add the zero-width joiners to the default text database file, or to add flags that mark the letters that should not be joined.

I emailed the admin of the website http://www.mosshaf.com/ar/main to ask about his parsing code or fix for this issue; if he replies with anything useful, I will post it here. I hope this suggestion will be considered as a new feature for the free database.

Another request: it would be great if you included a text help file describing the structure of the Quran database files and tags and how they should be parsed, so that I do not have to search through the code files to learn the fields, etc.

جزاكم الله خيراً جميعاً (May Allah reward you all with good.)

daliaessam commented 8 years ago

I found a code library on this site called lab-master. It contains a JS file called tajweedTools.js, which contains the function below:

`    buck2tajweed: function (surah, ayah, buck)
    {
        //return gq.quran.parse.buck2arabic(buck);
        var tajweed = {1: 'silent', 2: 'ghunnah', 3: 'np', 4: 'pp', 5: 'op', 6: 'np', 7: 'ep', 8: 'unrest'};
        var arabic = '', l, letter, found=false, open=false, openTag;

        wordArray = buck.split(' ');
        arabic = '';
        $.each(wordArray, function(i, word) {

            arabic += '<word data-id="'+Quran.word.number(surah, ayah, i+1)+'">';
            letterArray = word.split('');

            for(l=0; l<letterArray.length; ++l)
            {
                letter = letterArray[l];
                found = false;

                if (tajweed[letter] != null)
                {
                    if (open)
                    {
                        // compare against the array length; the original compared the array itself
                        if (letterArray.length > l)
                            arabic += '&zwj;';
                        arabic += '</'+openTag+'>';
                        if (letterArray.length > l)
                            arabic += '&zwj;';
                    }

                    openTag = tajweed[letter];
                    if (l > 0)
                        arabic += '&zwj;';
                    arabic += '<'+openTag+'>';
                    if (l > 0)
                        arabic += '&zwj;';
                    open = true;
                    continue;
                }
                else if (letter == '0')
                {
                    if (open)
                    {
                        arabic += '&zwj;</'+openTag+'>&zwj;';
                        open = false;
                    }

                    continue;
                }

                for(n=1; n<Quran._data.buck.length; ++n)
                {
                    if(letter == Quran._data.buck[n])
                    {
                        arabic += Quran._data.char[n];
                        found = true;
                        break;
                    }
                }

                if (!found)
                    arabic += letter;
            }

            if (open)
            {
                arabic += '</'+openTag+'>';
                open = false;
            }        

            arabic += '</word> ';
        });

        return arabic;
    },

`

Can someone explain this code and translate it to PHP, so that it can run server side, parse the Tajweed text file, and insert the joining entity &zwj; in the correct places to display the Tajweed Quran properly in non-Firefox browsers?

The complete JS file in this library is:
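Read step by step, the function splits the Buckwalter-encoded verse into words and walks each word character by character: a digit 1 to 8 opens a Tajweed tag, the digit 0 closes the open tag, and any other character is looked up in the Buckwalter table and replaced by its Arabic letter. The sketch below restates that logic without jQuery or the `Quran` object so it can run anywhere. It is a simplified reading, not a drop-in replacement: `BUCK_TO_ARABIC` is a tiny illustrative subset standing in for `Quran._data.buck` / `Quran._data.char`, the `<word data-id>` wrapper is omitted, and the function names are hypothetical.

```javascript
// Tag names copied from the tajweedTools.js excerpt above.
const TAGS = {1: 'silent', 2: 'ghunnah', 3: 'np', 4: 'pp', 5: 'op', 6: 'np', 7: 'ep', 8: 'unrest'};

// Assumption: illustrative subset of the Buckwalter table
// (ba, seen, meem, lam, alef); the real table is much larger.
const BUCK_TO_ARABIC = {b: '\u0628', s: '\u0633', m: '\u0645', l: '\u0644', A: '\u0627'};

function buckWordToTajweed(word) {
  let html = '';
  let openTag = null;
  Array.from(word).forEach((letter, index) => {
    if (TAGS[letter] !== undefined) {
      // A digit 1-8 starts a new tag, closing any tag that is still open.
      if (openTag) html += '</' + openTag + '>';
      openTag = TAGS[letter];
      // No joiner is added at the very start of a word.
      html += index > 0 ? '&zwj;<' + openTag + '>&zwj;' : '<' + openTag + '>';
    } else if (letter === '0') {
      // The digit 0 closes the open tag, with a joiner on each side.
      if (openTag) {
        html += '&zwj;</' + openTag + '>&zwj;';
        openTag = null;
      }
    } else {
      // Ordinary character: map Buckwalter to Arabic, or pass it through.
      html += BUCK_TO_ARABIC[letter] || letter;
    }
  });
  // A tag still open at the end of the word is closed without a joiner.
  if (openTag) html += '</' + openTag + '>';
  return html;
}

function buckToTajweed(buck) {
  return buck.split(' ').map(buckWordToTajweed).join(' ');
}

console.log(buckToTajweed('b2s0m 2l'));
```

Note that the joiner is added whenever a tag boundary falls inside a word, regardless of whether the neighbouring letters actually connect, which may be one reason the output does not always look right.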

`jQuery(function () {

// selection info here
sel = {
    x:'',
    y:'',
    start: null,
    end: null,
    position: null,     
    length: null,
    id: null,
    verse: null,
    text: null
};

tool = {
    hover: true,
    ayahs: {}, // ayahs to update on server side

    buck2tajweed: function (surah, ayah, buck)
    {
        //return gq.quran.parse.buck2arabic(buck);
        var tajweed = {1: 'silent', 2: 'ghunnah', 3: 'np', 4: 'pp', 5: 'op', 6: 'np', 7: 'ep', 8: 'unrest'};
        var arabic = '', l, letter, found=false, open=false, openTag; 

        wordArray = buck.split(' ');
        arabic = '';
        $.each(wordArray, function(i, word) {

            arabic += '<word data-id="'+Quran.word.number(surah, ayah, i+1)+'">';
            letterArray = word.split('');

            for(l=0; l<letterArray.length; ++l)
            {
                letter = letterArray[l];
                found = false;

                if (tajweed[letter] != null)
                {
                    if (open)
                    {
                        // compare against the array length; the original compared the array itself
                        if (letterArray.length > l)
                            arabic += '&zwj;';
                        arabic += '</'+openTag+'>';
                        if (letterArray.length > l)
                            arabic += '&zwj;';
                    }

                    openTag = tajweed[letter];
                    if (l > 0)
                        arabic += '&zwj;';
                    arabic += '<'+openTag+'>';
                    if (l > 0)
                        arabic += '&zwj;';
                    open = true;
                    continue;
                }
                else if (letter == '0')
                {
                    if (open)
                    {
                        arabic += '&zwj;</'+openTag+'>&zwj;';
                        open = false;
                    }

                    continue;
                }

                for(n=1; n<Quran._data.buck.length; ++n)
                {
                    if(letter == Quran._data.buck[n])
                    {
                        arabic += Quran._data.char[n];
                        found = true;
                        break;
                    }
                }

                if (!found)
                    arabic += letter;
            }

            if (open)
            {
                arabic += '</'+openTag+'>';
                open = false;
            }        

            arabic += '</word> ';
        });

        return arabic;
    },

    tajweed2buck: function (tajweedHtml)
    {},

    addTag: function (tajweedId)
    {           
        var tajweed = {1: 'silent', 2: 'ghunnah', 3: 'np', 4: 'pp', 5: 'op', 6: 'np', 7: 'ep', 8: 'unrest'};

        if (!tajweed[tajweedId])
            return null;

        var tag = tajweed[tajweedId];

        text = '&zwj;<'+tag+'>&zwj;'+sel.text+'&zwj;</'+tag+'>&zwj;';
        this.replaceSelection(text);

        this.ayahs[sel.verse] = sel.verse;
    },

    removeTag: function ()
    {},

    editText: function (addCode)
    {
        textArray = gq.quran.parse.arabic2buck(sel.text).split(' ');
        editedText = {};
        startId = sel.start;
        // build selected text array first
        for (i=0; i<textArray.length; i++) // "<" avoids reading one element past the end
        {
            editedText[startId] = textArray[i];
            startId++;
        }

        startId = sel.start; // reset start id

        verseNo = Quran.verseNo.word(sel.start);
        verse   = Quran.ayah.fromVerse(verseNo);

        buck = gq.data.quran[gq.quran.selectedString()][verseNo];
        buckArray = buck.split(' ');            
        buckAyahText = '';

        for (i=0; i < buckArray.length; i++) // "<" avoids reading one element past the end
        {
            id = Quran.word.number(verse.surah, verse.ayah, i+1);

            if (id >= sel.start && id <= sel.end)
            {
                orignalLetterCount = buckArray[i].split('').length;
                editedLetterCount  = editedText[id].split('').length;

            }
            else
            {
                buckAyahText += buckArray[i];
            }
        }
    },

    getSelection: function ()
    {
        var selection, position;

        if (window.getSelection) {
            selection = window.getSelection();

            if (selection && !selection.isCollapsed) {
                position = {
                    'offset': selection.anchorOffset,
                    'length': selection.toString().length,
                    'node': selection.anchorNode.parentNode,
                    'text': selection.toString()
                };
            }
        } else if (document.selection) {
            selection = document.selection.createRange();

            if (selection && selection.text.length) {
                var text = selection.parentElement().innerText,
                    range = document.body.createTextRange(),
                    last = 0, index = -1;

                range.moveToElementText(selection.parentElement());

                while ((index = text.indexOf(selection.text, ++index)) !== -1) {
                    range.moveStart('character', index - last);
                    last = index;

                    if (selection.offsetLeft == range.offsetLeft && selection.offsetTop == range.offsetTop) {
                        break;
                    }
                }

                position = {
                    'offset': index,
                    'length': selection.text.length,
                    'node': selection.parentElement(),
                    'text': selection.text
                };
            }
        }

        return position;
    },

    replaceSelection: function (html)
    {
        var sel, range, node;

        if (typeof window.getSelection != "undefined") {
            // IE 9 and other non-IE browsers
            sel = window.getSelection();

            // Test that the Selection object contains at least one Range
            if (sel.getRangeAt && sel.rangeCount) {
                // Get the first Range (only Firefox supports more than one)
                range = window.getSelection().getRangeAt(0);
                range.deleteContents();

                // Create a DocumentFragment to insert and populate it with HTML
                // Need to test for the existence of range.createContextualFragment
                // because it's non-standard and IE 9 does not support it
                if (range.createContextualFragment) {
                    node = range.createContextualFragment(html);
                } else {
                    // In IE 9 we need to use innerHTML of a temporary element
                    var div = document.createElement("div"), child;
                    div.innerHTML = html;
                    node = document.createDocumentFragment();
                    while ( (child = div.firstChild) ) {
                        node.appendChild(child);
                    }
                }
                range.insertNode(node);
            }
        } else if (document.selection && document.selection.type != "Control") {
            // IE 8 and below
            range = document.selection.createRange();
            range.pasteHTML(html);
        }
    }
};

$('word').live('hover', function() {

    if ((!tool.hover && !$(this).hasClass('selectedWord')) || $(this).attr('id'))
        return;

    $('#selectedWord').attr('data-id', $(this).attr('data-id'));
    $('#selectedWord').html($(this).html());

    text = $(this).html().split('').join(' ');
    $('#selectedWordLetters').attr('data-id', $(this).attr('data-id'));
    $('#selectedWordLetters').html(text);

}).live('dblclick', function() {
    if (!tool.hover && !$(this).hasClass('selectedWord'))
        ;
    else 
        tool.hover = tool.hover ? false : true;

    $('word').removeClass('selectedWord');

    if (!tool.hover)
        $(this).addClass('selectedWord');

    $(this).trigger('mouseenter');

    // Code for Deselect Text When Mouseout the Code Area
    if (window.getSelection)
    {
        if (window.getSelection().empty)
        { // Chrome
            window.getSelection().empty();
        }
        else
            if (window.getSelection().removeAllRanges)
            { // Firefox
                window.getSelection().removeAllRanges();
            }
    }
    else
        if (document.selection)
        { // IE?
            document.selection.empty();
        }
});

$('.ayah').live('mouseup', function(event) {

    selection = tool.getSelection();
    if (!selection)
    {
        sel.length = null;
        sel.position = null;
        sel.start = null;
        sel.end = null;
        sel.text = null;
        sel.id = null;
        return false;
    }
    sel.length = selection.length;
    sel.position = selection.offset;
    sel.text = selection.text;
    sel.verse = $(this).data('verse');

    sel.id = $(this).attr('id');
    sel.end = $(this).data('id');

    // if selection was left to right multi-words
    if (sel.start > sel.end)
    {
        sel.end = sel.start;
        sel.start = $(this).data('id');
        str  = $(this).text();
        sel.position = sel.position - sel.length;
        if (sel.position < 0)
            sel.position = 0;
    }
    else if (event.pageX > sel.x) // if selection was left to right single word 
    {
        //console.log('pos '+sel.position+' str '+str.split('').length);
        sel.position = sel.position - sel.length;
        if (sel.position < 0)
            sel.position = 0;
    }
    //console.log('pos '+sel.position+ ' len: '+sel.length);

    //sel.text = //getSelectedText();//(sel.id == 'selectedWordLetters') ? getSelectedText().replace(/\s+/g, '') : getSelectedText();

}).live('mousedown', function(event){
    //sel.start = $(event.target).data('id');
    sel.x = event.pageX;
    sel.y = event.pageY;
});

$('.twb').click(function() {

    id = $(this).data('id');

    if (id == 'remove')
        tool.removeTag();
    else
        tool.addTag(id);

    return false;
});

});`

meezaan commented 7 years ago

@daliaessam @iBasit I've just started to look into this.

I'm hoping to decouple the HTML from the actual data returned by the API, and to create a tool and a guide for the actual parsing.

There are quite a few versions of the buck and tajweed parser floating around, but zero documentation.

God willing, I will post back with something over the next few days.

meezaan commented 7 years ago

@iBasit Please excuse my ignorance, but what do the numbers that come back in the Tajweed version along with the type identifiers signify, for instance in [g:177? On the website they become alt attributes, but I don't see where they are actually used. What is the significance of these numbers?

Thank you.
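Whatever the numbers turn out to mean, they can be kept as opaque values while separating the data from the HTML. A minimal sketch, assuming the tag shape `[x:123[text]` (number optional) with no nesting; `parseTajweedSegments` is a hypothetical name and the `id` field is simply the number as found, with no interpretation attached:

```javascript
// Split a tagged string into plain and rule-tagged segments.
function parseTajweedSegments(text) {
  const pattern = /\[([a-z])(?::(\d+))?\[([^\]]*)\]/g;
  const segments = [];
  let last = 0;
  let m;
  while ((m = pattern.exec(text)) !== null) {
    // Untagged text between the previous tag and this one.
    if (m.index > last) {
      segments.push({ rule: null, id: null, text: text.slice(last, m.index) });
    }
    segments.push({
      rule: m[1],
      id: m[2] === undefined ? null : Number(m[2]),
      text: m[3],
    });
    last = m.index + m[0].length;
  }
  // Untagged text after the last tag.
  if (last < text.length) {
    segments.push({ rule: null, id: null, text: text.slice(last) });
  }
  return segments;
}

console.log(parseTajweedSegments('ab[g:177[cd]ef'));
```

A structure like this can then be rendered as spans, custom tags, or anything else, without touching the parsing step.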

meezaan commented 7 years ago

@daliaessam I've written something to parse this data server side and return more readable HTML using a tag.

See documentation on https://github.com/meezaan/alquran-tools/blob/master/docs/tajweed.md and example on https://github.com/meezaan/alquran-tools/blob/master/www/tajweed.php. I will clean up the code and make this into a composer package over the next few days along with links to live examples.

A thing to note: the ZWJ parser is still experimental. I manually added ZWJ where applicable, but the joins are not always right and not always pretty; worse, they actually ruin the display in Firefox. You could check the user agent and then decide whether or not to use it. It still needs some work, as some letters require it only before, some only after, and some on both sides, depending on the combination of letters.
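The user-agent check could look roughly like this. It is a sketch under one assumption taken from this thread, namely that only desktop/Android Firefox (Gecko) renders correctly without the joiners; `shouldUseZwj` is a hypothetical name, and user-agent sniffing is fragile in general.

```javascript
// Returns true when ZWJ should be inserted for this browser.
// Assumption: every engine except Gecko-based Firefox needs the joiners.
function shouldUseZwj(userAgent) {
  // Gecko-based Firefox identifies itself with "Firefox/<version>".
  // Firefox on iOS uses WebKit and reports "FxiOS" instead, so it
  // does not match here and still gets the joiners.
  const isGeckoFirefox = /Firefox\/\d/.test(userAgent);
  return !isGeckoFirefox;
}

// In a browser: shouldUseZwj(navigator.userAgent)
// On a server: pass the value of the User-Agent request header.
console.log(shouldUseZwj('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'));
```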

daliaessam commented 7 years ago

@meezaan Great effort, jazak Allah Khaier. Yes, WebKit is a big problem, especially for Android apps and browsers; you cannot build apps now while ignoring the Android platform, or even a web page while ignoring Chrome. Until there is a solution, just a suggestion: could you include a full Quran text file parsed with your parser, in the same way as the Quran database files offered on those sites, so that anyone who needs it does not have to build their own version or a full Quran parser? Thank you again.

daliaessam commented 7 years ago

@meezaan if you check this example: http://mosshaf.com/ar/main#&GetSura=103&type=tagweed&b=null

The code looks like this:

It uses a lot of &zwj;

meezaan commented 7 years ago

@daliaessam Thank you, I will have a look. I can provide you with a fully parsed file, but do you want it with or without the 'zwj's? To get those to appear in the right place, I basically need an array map of the characters before and after which it applies. I'll try and prioritise that and upload a fix, God willing, as soon as possible. If you want a file without these, let me know and I can get you one.

mosshaf.com is parsing server side, so I'm afraid we can't see how they're doing it.

meezaan commented 7 years ago

@daliaessam FYI, I've made a start on a semi-automated mapper: https://github.com/islamic-apps/alquran-tools/blob/mapper/src/AlQuranCloud/Tools/Parser/Tashkeel.php. I still need to write the logic and test it.

If you've got time to update it, please do help. One thing to note, though: PHP's str_replace has some problems finding and replacing Classical Arabic characters. Once I've got this mapping ready and working, I will try it out. If it is still a problem, I may have to port a version to JavaScript and another server-side language, either Java or Python, God willing. But heads up, that might add some unneeded delay.

daliaessam commented 7 years ago

@meezaan It will be very useful to everyone if you could complete this parser. Gazak Allah Kul AlKhaier

wanjijul commented 6 years ago

I converted it to a per-word index (this only works for the Uthmani text):

"text_uthmani": "قَالُوٓا۟" "rules": ",,,,,madd_munfasil,madd_munfasil,silent,silent"

refer: https://api.tilawah.my/v1.0/verses/?surah=Al-Baqarah&ayat=1&page=1

In case someone needs it in its simplest form.

tested in Firefox https://www.tilawah.my/?surah=Al-Baqarah&ayat=1
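A word in this format could be turned into markup roughly as follows. The sketch assumes that `rules` holds one comma-separated entry per Unicode code point of the word, which matches the example above if the word is encoded as the nine code points used below; `groupByRule` and `groupsToHtml` are hypothetical names, not part of the tilawah.my API.

```javascript
// Pair each code point with its rule and merge neighbours that share a
// rule, so each run becomes a single span instead of one span per character.
function groupByRule(textUthmani, rules) {
  const chars = Array.from(textUthmani);
  const names = rules.split(',');
  const groups = [];
  chars.forEach((ch, i) => {
    const rule = names[i] || '';
    const lastGroup = groups[groups.length - 1];
    if (lastGroup && lastGroup.rule === rule) {
      lastGroup.text += ch;
    } else {
      groups.push({ rule: rule, text: ch });
    }
  });
  return groups;
}

function groupsToHtml(groups) {
  return groups
    .map((g) => (g.rule ? '<span class="' + g.rule + '">' + g.text + '</span>' : g.text))
    .join('');
}

// The example word, written as explicit code points (assumed encoding):
// qaf, fatha, alef, lam, damma, waw, maddah, alef, small high rounded zero.
const word = '\u0642\u064E\u0627\u0644\u064F\u0648\u0653\u0627\u06DF';
const rules = ',,,,,madd_munfasil,madd_munfasil,silent,silent';
console.log(groupsToHtml(groupByRule(word, rules)));
```

Merging adjacent characters with the same rule keeps the number of tag boundaries inside a word as small as possible, which matters for the letter-joining problem discussed earlier in the thread.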

GlobalQuran / data

Parsing Tajweed Quran #2