Closed: kapote2017 closed this issue 6 years ago
Hey @kapote2017, I'm very sorry to hear that. This may be related to #95. Maybe @davidlday can help with that.
@kapote2017 I would also appreciate pull requests, in case you want to get your feet wet with atom-wordcount
:)
@kapote2017 - Can you post sample text along with expected word count? I don't speak / read Armenian, but I'll be happy to do my best to help out.
@kapote2017 - Any chance you can post sample text that's not getting counted correctly along with an expected word count?
Oh, sorry, I missed your message.
Here is the sample text:
Լինում է, չի լինում մի խեղճ մարդանունը Նազար: Էս Նազարը մի անշնորհք ու ալարկոտ մարդ է լինում: Էնքան էլ վախկոտ, էնքան էլ վախկոտ, որ մենակ ոտը ոտի առաջ չէր դնիլ, թեկուզ սպանեիր: Օրը մինչև իրիկուն կնկա կողքը կտրած
նրա հետ էր դուրս գնալիս դուրս էր գնում, տուն գալիս` տուն գալի: Դրա համար էլ անունը դնում են վախկոտ Նազար:
There are 59 words.
For future reference: Official Unicode Consortium code chart.
@kapote2017 - Can you confirm you're getting a 0 word count? If so, should be an easy enough fix.
Also, I'm getting 58 words when I add the Armenian alphabet to the regex. Can you also confirm which is correct: 58 or 59?
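For anyone following along, here is a rough sketch of why a Latin-only word pattern reports 0 for Armenian text and what adding the Armenian letters changes. It is written in Python purely for illustration; the pattern names and `count_words` are hypothetical and are not the package's actual regex or code. The ranges assumed are U+0531 to U+0556 (capital letters) and U+0561 to U+0587 (small letters, including the և ligature) from the Unicode Armenian block.

```python
import re

# Hypothetical Latin-only pattern, standing in for a regex that
# does not know about Armenian letters.
LATIN_ONLY = re.compile(r"[A-Za-z0-9]+")

# The same pattern extended with Armenian letters:
# U+0531-U+0556 (capital) and U+0561-U+0587 (small, incl. the ligature).
WITH_ARMENIAN = re.compile(r"[A-Za-z0-9\u0531-\u0556\u0561-\u0587]+")


def count_words(text, pattern):
    """Count non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))


# First four words of the sample text posted in this thread.
sample = "Լինում է, չի լինում"

print(count_words(sample, LATIN_ONLY))     # 0
print(count_words(sample, WITH_ARMENIAN))  # 4
```

Punctuation such as commas and colons falls outside both character classes, so it only acts as a separator between words.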
Yes, I get 0 words. And there are 58 words, you're right :) Algorithms count better than me. Fact!
@OleMchls - I have a fix, but I have a few questions before submitting a PR:
Thanks!
- The word-regex will no longer work as it doesn't account for Armenian, but I'm okay with helping maintain the regex from here on out. Are you okay with this?
Yes. I appreciate your care for this project, so I would like to invite you as an official collaborator.
- Since the regex will now be specific to this package, are you okay if I add in apostrophes (') to enforce contractions being a single word, or do you think that needs more discussion?
I'd be fine with this, your call.
- Can we add a label for issues to track language-specific problems?
Sure!
@OleMchls - thank you! Changes were pretty minimal, so I'll start a branch here for the PR instead of using the fork.
@kapote2017 - Are there such things as contractions in Armenian (i.e. don't = do not)? And if so, would you consider a contraction one word or two?
@davidlday No, there are no contractions in Armenian, but we use an apostrophe to write some names (e.g. Joan of Arc = Ժաննա դ'Արկ). We count "դ'Արկ" as one word.
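A minimal sketch of how an apostrophe inside a word could keep both contractions and names like դ'Արկ as a single word. This is an assumption about one possible approach, not the regex that ended up in the package; `LETTERS`, `WORD` and `count_words` are hypothetical names, and accepting the typographic apostrophe (U+2019) alongside the ASCII one is also an assumption.

```python
import re

# Hypothetical letter set: Latin letters, digits, and Armenian letters.
LETTERS = r"A-Za-z0-9\u0531-\u0556\u0561-\u0587"

# A word is a run of letters, optionally followed by groups of
# "apostrophe + letters". An apostrophe only joins when it has
# letters on both sides, so quotes around a word are not absorbed.
WORD = re.compile(rf"[{LETTERS}]+(?:['\u2019][{LETTERS}]+)*")


def count_words(text):
    """Count words, treating an inner apostrophe as part of the word."""
    return len(WORD.findall(text))


print(count_words("Ժաննա դ'Արկ"))  # 2
print(count_words("don't stop"))   # 2
```

Requiring letters after the apostrophe means a trailing one, as in "dogs' toys", does not merge the two neighbouring words.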
Previous versions do not have this problem.