OleMchls / atom-wordcount

Counts the words in your current document
https://atom.io/packages/wordcount
MIT License
38 stars, 27 forks

Latest release does not count Armenian language words #100

Closed by kapote2017 6 years ago

kapote2017 commented 6 years ago

Previous versions do not have this problem.

OleMchls commented 6 years ago

Hey @kapote2017, I'm very sorry to hear that. This may be related to #95. Maybe @davidlday can help with that.

@kapote2017 I would also appreciate pull requests, in case you want to get your feet wet with atom-wordcount :)

davidlday commented 6 years ago

@kapote2017 - Can you post sample text along with expected word count? I don't speak / read Armenian, but I'll be happy to do my best to help out.

davidlday commented 6 years ago

@kapote2017 - Any chance you can post sample text that's not getting counted correctly along with an expected word count?

kapote2017 commented 6 years ago

Oh, sorry, I missed your message.

Here is the sample text:

Լինում է, չի լինում մի խեղճ մարդանունը Նազար: Էս Նազարը մի անշնորհք ու ալարկոտ մարդ է լինում: Էնքան էլ վախկոտ, էնքան էլ վախկոտ, որ մենակ ոտը ոտի առաջ չէր դնիլ, թեկուզ սպանեիր: Օրը մինչև իրիկուն կնկա կողքը կտրած նրա հետ էր դուրս գնալիս դուրս էր գնում, տուն գալիս` տուն գալի: Դրա համար էլ անունը դնում են վախկոտ Նազար:

There are 59 words.

davidlday commented 6 years ago

For future reference: Official Unicode Consortium code chart.

davidlday commented 6 years ago

@kapote2017 - Can you confirm you're getting a 0 word count? If so, it should be an easy enough fix.

Also, I'm getting 58 words when I add the Armenian alphabet to the regex. Can you also confirm which is correct: 58 or 59?
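As a rough illustration of what "adding the Armenian alphabet to the regex" can look like, here is a minimal sketch. It is not the package's actual code: the names `WORD_PATTERN`, `ASCII_ONLY_PATTERN`, `countWords`, and `countAsciiWords` are hypothetical, and the character class assumes that only ASCII letters, digits, and the Armenian capital (U+0531 to U+0556) and small (U+0561 to U+0587) letters should count as word characters.

```typescript
// Hypothetical sketch, not the actual atom-wordcount implementation.
// Word characters: ASCII letters and digits, plus Armenian capital letters
// (U+0531..U+0556) and Armenian small letters (U+0561..U+0587).
const WORD_PATTERN = /[A-Za-z0-9\u0531-\u0556\u0561-\u0587]+/g;

// ASCII-only pattern for comparison: it matches nothing in Armenian text,
// which would explain a word count of 0.
const ASCII_ONLY_PATTERN = /[A-Za-z0-9]+/g;

// Counts the runs of word characters matched by WORD_PATTERN.
function countWords(text: string): number {
  const matches = text.match(WORD_PATTERN);
  return matches === null ? 0 : matches.length;
}

// Same counting logic, restricted to ASCII word characters.
function countAsciiWords(text: string): number {
  const matches = text.match(ASCII_ONLY_PATTERN);
  return matches === null ? 0 : matches.length;
}
```

With this sketch, `countAsciiWords("Լինում է, չի լինում")` returns 0, while `countWords` on the same string returns 4, because punctuation and whitespace separate the runs of letters.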

kapote2017 commented 6 years ago

Yes, I get 0 words. And there are 58 words, you're right :) Algorithms count better than me. Fact!

davidlday commented 6 years ago

@OleMchls - I have a fix, but I have a few questions before submitting a PR:

Thanks!

OleMchls commented 6 years ago

  • The word-regex will no longer work as it doesn't account for Armenian, but I'm okay with helping maintain the regex from here on out. Are you okay with this?

Yes, I appreciate your care for this project, so I would like to invite you to be an official collaborator on it.

  • Since the regex will now be specific to this package, are you okay if I add in apostrophes (') to enforce contractions being a single word, or do you think that needs more discussion?

I'd be fine with this, your call.

  • Can we add a label for issues to track language-specific problems?

Sure!

davidlday commented 6 years ago

@OleMchls - thank you! Changes were pretty minimal, so I'll start a branch here for the PR instead of using the fork.

@kapote2017 - Are there such things as contractions in Armenian (e.g. don't = do not)? And if so, would you consider a contraction one word or two?

kapote2017 commented 6 years ago

@davidlday No, there are no contractions in Armenian, but we use an apostrophe to write some names (e.g. Joan of Arc = Ժաննա դ'Արկ). We count "դ'Արկ" as one word.
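One possible way to keep an apostrophe inside a word, so that both "don't" and "դ'Արկ" count once, is sketched below. This is an assumption about how such a rule could be written, not the change that was actually merged: `LETTERS`, `WORD_WITH_APOSTROPHE`, and `countWordsWithApostrophes` are hypothetical names, and the pattern only joins an apostrophe (ASCII `'` or U+2019) when letters appear on both sides of it.

```typescript
// Hypothetical sketch, not the actual atom-wordcount implementation.
// Character-class body: ASCII letters and digits plus Armenian capital
// (U+0531..U+0556) and small (U+0561..U+0587) letters.
const LETTERS = "A-Za-z0-9\\u0531-\\u0556\\u0561-\\u0587";

// A word is a run of letters, optionally followed by one or more groups of
// "apostrophe + run of letters". Leading or trailing apostrophes (such as
// quotation marks) are therefore not part of the word.
const WORD_WITH_APOSTROPHE = new RegExp(
  `[${LETTERS}]+(?:['\\u2019][${LETTERS}]+)*`,
  "g"
);

// Counts words, treating an apostrophe between letters as part of the word.
function countWordsWithApostrophes(text: string): number {
  const matches = text.match(WORD_WITH_APOSTROPHE);
  return matches === null ? 0 : matches.length;
}
```

Under these assumptions, `countWordsWithApostrophes("Ժաննա դ'Արկ")` returns 2 and `countWordsWithApostrophes("don't stop")` also returns 2.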