TryGhost / Ghost

Independent technology for modern publishing, memberships, subscriptions and newsletters.
https://ghost.org
MIT License
47.46k stars 10.35k forks

Word count doesn't support Devanagari script #11599

Closed amanmehara closed 3 years ago

amanmehara commented 4 years ago

Issue Summary

Words written in Devanagari script are not recognized by the editor's word counter.

To Reproduce

  1. Write a few words in Devanagari script in the editor.
  2. The word count in the bottom-right corner does not increase.

This might lead to an incorrect reading-time estimate.

Technical details:

lunaticmonk commented 4 years ago

Looks like the code here: https://github.com/sparksuite/simplemde-markdown-editor/blob/6abda7ab68cc20f4aca870eb243747951b90ab04/src/js/simplemde.js#L1054-L1067 is being used to count words, and it does not take Devanagari into account.
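To illustrate why a counter of this kind returns zero for Devanagari, here is a minimal sketch of a word counter built on fixed character ranges. The function name `countWordsFixedRanges` and the exact ranges are assumptions chosen for illustration; they are not a verbatim copy of the linked SimpleMDE code.

```javascript
// Sketch of a fixed-range word counter. The ranges below are illustrative
// assumptions, not a verbatim copy of the linked SimpleMDE code.
function countWordsFixedRanges(text) {
  // First alternative: Latin letters/digits, Greek, Cyrillic.
  // Second alternative: CJK ideographs, hiragana, Hangul syllables.
  const pattern =
    /[a-zA-Z0-9_\u0392-\u03c9\u0410-\u04f9]+|[\u4e00-\u9fff\u3040-\u309f\uac00-\ud7af]+/g;
  const matches = text.match(pattern);
  return matches === null ? 0 : matches.length;
}

console.log(countWordsFixedRanges('hello world'));  // 2
// Devanagari lives in U+0900-U+097F, which none of the ranges cover.
console.log(countWordsFixedRanges('नमस्ते दुनिया')); // 0
```

Any script whose code points fall outside the listed ranges is silently skipped, which matches the behaviour reported above.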

QbDesu commented 4 years ago

Taking a look at the project, it might also be worth considering a switch from SimpleMDE to EasyMDE, seeing as SimpleMDE hasn't been updated in the past 3 years and EasyMDE is actively maintained (and also 350 commits ahead of SimpleMDE).

sumukshashidhar commented 4 years ago

I second @Necr0 here; the code linked by @lunaticmonk looks like it's just counting characters from a few fixed Unicode ranges.

EasyMDE is definitely a better option.

ErisDS commented 4 years ago

This is the same issue as https://github.com/TryGhost/Ghost/issues/10303 and https://github.com/TryGhost/Ghost/issues/8467.

The core team has no plans to fix it. Contributions are welcome, but will likely need to be made upstream.

SimonVillage commented 4 years ago

So I was checking this for Thai, as we have the same problem. It seems like the only thing that will work is a dictionary for Thai, but I don't think that's a solution Ghost would consider?

An example would be https://github.com/veer66/wordcut, which gives:

echo 'จะถูกเปิดตัว' | wordcut
จะ ถูก เปิด ตัว

But that is a solution only working for Thai and I don't think that there are any solutions out there which work for all languages anyways.

What could be the right approach here?
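One possible direction, offered only as a sketch: JavaScript runtimes that ship `Intl.Segmenter` (recent browsers and Node.js 16+ with full ICU data) expose locale-aware word segmentation, which relies on dictionary data for scripts such as Thai. The helper name `countWordsSegmenter` is hypothetical, and whether this fits Ghost or the upstream editor is an open question.

```javascript
// Hypothetical helper: counts word-like segments using the runtime's
// built-in segmentation rules (requires Intl.Segmenter support).
function countWordsSegmenter(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (segment.isWordLike) {
      count += 1;
    }
  }
  return count;
}

console.log(countWordsSegmenter('hello world', 'en'));
console.log(countWordsSegmenter('नमस्ते दुनिया', 'hi'));
// The Thai result depends on the ICU dictionary bundled with the runtime.
console.log(countWordsSegmenter('จะถูกเปิดตัว', 'th'));
```

The trade-off is that results can vary between runtimes and ICU versions, so counts may not be identical on the server and in the browser.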

Edit: Is https://github.com/sparksuite/simplemde-markdown-editor/blob/6abda7ab68cc20f4aca870eb243747951b90ab04/src/js/simplemde.js#L1054-L1067 also used to count the words of an article in general, to get the "x minutes to read"? Or is that a separate function?

arjunraj1523 commented 4 years ago

In Devanagari the words are separated by spaces, so it should be easy to solve. Also, a sentence ends with a danda (“।”) instead of a “.”. I’d like to take this issue up.
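For space-separated scripts such as Devanagari, a minimal sketch of this idea is to match runs of letters and combining marks with Unicode property escapes instead of hard-coded ranges. The helper name `countWordsUnicode` is hypothetical, and this is an assumption about one way to approach it, not the fix that was adopted anywhere.

```javascript
// Hypothetical helper: a "word" is a run of letters (\p{L}), combining
// marks (\p{M}, needed for Devanagari vowel signs and the virama),
// digits (\p{N}) or underscores. Requires the "u" regex flag.
function countWordsUnicode(text) {
  const matches = text.match(/[\p{L}\p{M}\p{N}_]+/gu);
  return matches === null ? 0 : matches.length;
}

// The trailing danda (U+0964) is punctuation, so it is not counted.
console.log(countWordsUnicode('यह एक वाक्य है।')); // 4
console.log(countWordsUnicode('hello world'));     // 2
```

This only helps where words are delimited by spaces or punctuation; unspaced Thai text such as 'จะถูกเปิดตัว' would still be treated as a single run.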

ErisDS commented 4 years ago

The reading time code is based on the SimpleMDE code, but is implemented with customisations here: https://github.com/TryGhost/Ghost-SDK/blob/master/packages/helpers/lib/utils/count-words.js
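To show how an undercounted word total feeds into the estimate, here is a minimal sketch of a reading-time calculation. The function name `estimateReadingMinutes` and the 275 words-per-minute rate are assumptions for illustration; the linked helper may use different constants and also account for images.

```javascript
// Assumed reading speed, for illustration only.
const WORDS_PER_MINUTE = 275;

// Hypothetical helper: rounds up to whole minutes.
function estimateReadingMinutes(wordCount, wordsPerMinute = WORDS_PER_MINUTE) {
  return Math.ceil(wordCount / wordsPerMinute);
}

console.log(estimateReadingMinutes(550)); // 2
// If every Devanagari word is skipped, the count is 0 and so is the estimate.
console.log(estimateReadingMinutes(0));   // 0
```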

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

SimonVillage commented 4 years ago

Dear Mr stale, please keep this issue open so we can have a fix for this in the future.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.