Closed by freqdec 11 years ago
Hi Brian,
thanks for the nice feedback, great to hear. I've already replaced line 66 (don't know what went through my head there), but I can't quite figure out the correct RegEx to get rid of the punctuation.
Which characters would be affected, anyway?
And don't worry about being pedantic, it's great to have another eye on the code to spot the little things. :)
Hi Sacha,
Thinking further, the punctuation regExp will have to change according to language, i.e. the regExp for Spanish will contain characters that English doesn't need (the inverted question mark, for example).
It may be possible to create an uber regExp that covers most languages but you will never keep everyone happy! Here's a most terrible attempt at something that might work:
/['";:,.\/?¿!¡-]/g
Good Luck!
If you want to remove punctuation, I think a better regex would be something like str.replace(/[^A-Za-z0-9 ]/g, ''). That should remove anything that's not a space, number, or letter.
On the other hand, I don't think that this is something that would be desirable. It's not very intuitive, and it makes it so that count.js output doesn't match Microsoft Word's count, which would probably be the standard you'd want to follow.
Hi Will, your regExp will fail dramatically on any language that has accented characters.
Not entirely sure, but I think it would only matter if a character is preceded by a space (e.g. question mark or exclamation point in French). Like that, wouldn't it actually be safe to just remove those characters (plus the space), wherever needed?
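To make the accented-character problem concrete, here is a minimal sketch comparing the ASCII-only allowlist with a Unicode-aware one. The function names are made up for illustration and are not part of count.js, and the second variant assumes an engine that supports Unicode property escapes (ES2018).

```javascript
// Hypothetical helpers for illustration; neither is part of count.js.

// ASCII-only allowlist from the suggestion above: it also drops accented letters.
function stripAsciiOnly(str) {
  return str.replace(/[^A-Za-z0-9 ]/g, '');
}

// Unicode-aware variant: keeps letters, combining marks and numbers in any script.
// Assumes the `u` flag and Unicode property escapes are available (ES2018).
function stripUnicodeAware(str) {
  return str.replace(/[^\p{L}\p{M}\p{N} ]/gu, '');
}

console.log(stripAsciiOnly('¿Qué pasó?'));    // "Qu pas"
console.log(stripUnicodeAware('¿Qué pasó?')); // "Qué pasó"
```

Both variants still strip apostrophes and hyphens inside words ("don't" becomes "dont"), which leaves a whitespace-based word count unchanged but does alter the text.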
You are right! So this might work - looks for a space before a punctuation character and replaces them both...
.replace(/\s['";:,.\/?¿!¡-]/g, '').split(/\s/).length
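Spelled out as a function, the idea might look roughly like this. It is only a sketch: `countWords` is a hypothetical name and not the actual count.js implementation, and the hyphen is placed last in the character class so it is read literally rather than as a range.

```javascript
// Hypothetical helper for illustration; not the actual count.js implementation.
function countWords(str) {
  if (!str) return 0;
  const cleaned = str
    // Drop a punctuation mark together with the whitespace right before it.
    // The hyphen sits last in the class so it is treated as a literal character.
    .replace(/\s['";:,.\/?¿!¡-]/g, '')
    .trim();
  // Split on runs of whitespace so double spaces don't create empty "words".
  return cleaned ? cleaned.split(/\s+/).length : 0;
}

console.log(countWords('Bonjour !'));            // 1
console.log(countWords('Comment allez-vous ?')); // 2
```

One caveat with this approach: a space followed by an opening quote also matches, so `countWords('say "hi"')` returns 1 because the two words get merged. Restricting the class to closing punctuation would be one way around that.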
Ah yeah, don't know what I was thinking really. Still, I think the Microsoft Word question is valid. A solitary punctuation mark is also treated as a word by wc. I don't think getting different results from both of those is a good idea.
Just tested how some other tools treat this situation. Google Docs, Drafts for iOS and iA Writer all ignore the punctuation and count your example as three words. I think it would be better to follow the lead of more recent projects like the aforementioned. I'll look into it later.
Well, that makes sense. I guess it's up to you haha.
Hi, great script!
It would be great if text like "Bonjour !" wasn't counted as two words. You would have to pass the string through a regExp that removed common punctuation characters before splitting into words.
Also, line 66 can be rewritten without the .split i.e. from this:
characters: str ? str.replace(/\s/g, '').split('').length : 0
to this:
characters: str ? str.replace(/\s/g, '').length : 0
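A quick check that the two forms agree, as a sketch with made-up function names (neither exists in count.js):

```javascript
// Hypothetical wrappers around the two expressions above, for comparison only.
function charsWithSplit(str) {
  return str ? str.replace(/\s/g, '').split('').length : 0;
}

function charsWithoutSplit(str) {
  return str ? str.replace(/\s/g, '').length : 0;
}

const sample = 'Bonjour le monde !';
console.log(charsWithSplit(sample), charsWithoutSplit(sample)); // 15 15
```

Both versions count UTF-16 code units, so dropping the .split doesn't change the result; it only avoids building a throwaway array. Characters outside the Basic Multilingual Plane (most emoji, for instance) count as two in either form.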
Again, great script - apologies for me being pedantic about details like this!