bodonlp / bodo-tokenizer

Tokenizer for Bodo language
MIT License

Extra space #1

Open maharajbrahma opened 2 years ago

maharajbrahma commented 2 years ago

Tokenizing leaves an extra space after the Dari mark (।). No space should follow the Dari mark.

tokenize("मख’जाथाव राव गोनोखोआरि आरो रोंगौसाफोरजों दाजानाय मोनसे हारिमायारि बोसोनगिरि आफादजों।")
"मख'जाथाव राव गोनोखोआरि आरो रोंगौसाफोरजों दाजानाय मोनसे हारिमायारि बोसोनगिरि आफादजों । "

maharajbrahma commented 2 years ago

Another problem occurs when two or more punctuation marks appear one after another.

For example, । followed by ', and other such sequences.

Both issues have been fixed in commit e36ffa0.
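
For readers following along, the behaviour both reports ask for can be sketched roughly as below. This is not the code from commit e36ffa0. The name `tokenize_sketch`, the punctuation sets, and the rule for quotes inside a word are all assumptions made for illustration, and the sketch does not normalise ’ to ' the way the output above does.

```python
# Hypothetical sketch, not the bodo-tokenizer implementation.
# Both character sets below are assumed for illustration only.
ALWAYS_SPLIT = "।॥,;:!?()"   # marks that always become their own token
QUOTES = "'\"’"               # split only when not inside a word


def _is_boundary(ch: str) -> bool:
    """True for whitespace or any punctuation mark handled by this sketch."""
    return ch.isspace() or ch in ALWAYS_SPLIT or ch in QUOTES


def tokenize_sketch(text: str) -> str:
    """Return space-separated tokens with no leading or trailing space."""
    pieces = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        if ch in ALWAYS_SPLIT:
            pieces.append(f" {ch} ")
        elif ch in QUOTES:
            # Keep a quote attached when it sits between two word characters,
            # as in मख'जाथाव; otherwise treat it as a separate token.
            prev_is_word = i > 0 and not _is_boundary(text[i - 1])
            next_is_word = i < last and not _is_boundary(text[i + 1])
            pieces.append(ch if prev_is_word and next_is_word else f" {ch} ")
        else:
            pieces.append(ch)
    # str.split() with no arguments collapses whitespace runs and drops
    # leading/trailing whitespace, so no space survives after a final mark.
    return " ".join("".join(pieces).split())


print(tokenize_sketch("मख'जाथाव राव आफादजों।"))  # मख'जाथाव राव आफादजों ।
print(tokenize_sketch("आफादजों।'"))              # आफादजों । '
```

Padding every mark and then rejoining with `split()` handles both cases at once: consecutive marks each become a separate token, and the padding after the last mark is discarded.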

sanjibnarzary commented 1 year ago

Nice