ONLYOFFICE / DocumentServer

ONLYOFFICE Docs is a free collaborative online office suite comprising viewers and editors for texts, spreadsheets and presentations, forms and PDF, fully compatible with Office Open XML formats: .docx, .xlsx, .pptx and enabling collaborative editing in real time.
https://www.onlyoffice.com
GNU Affero General Public License v3.0

Spell checker marks contractions and omissions (aren't, we're) as errors #576

Closed tony4212 closed 3 weeks ago

tony4212 commented 5 years ago

Under English USA language when typing anything with "n't" such as doesn't, didn't, aren't... I get a spell checking red mark. The correction choice it gives me is does n, did n, are n.... As far as I can tell all other contractions appear to be fine.

alexanderonlyoffice commented 5 years ago

Hello @tony4212, please specify the following information:

  1. OS of the server, where ONLYOFFICE Document Server is installed;
  2. Its installation type (docker, deb/rpm or Windows installation);
  3. Currently installed version of Document Server;
  4. Please send us also a screenshot illustrating the issue.

Rita-Bubnova commented 5 years ago

Hello, @tony4212. Thank you. I can confirm - this is a bug, issue 41724 in our internal issue tracker.

mcree commented 5 years ago

I can confirm that this (or similar) behaviour is also experienced with other languages, such as Hungarian.

Common words like "szolgáltatás", "üzemeltető", "kutatók", "elvárások", "bővítés" are all triggering spell checking errors, while they are happily accepted by other software based on the same hunspell engine / dictionaries.

The behaviour is the same in desktop and online edition of ONLYOFFICE. Tested on version 5.4.1 on Ubuntu 18.04 based desktop editor and dockerized server releases.

jdaviescoates commented 4 years ago

As per my now closed duplicate (I hadn't searched for contraction), it's not just n't contractions either. The same thing happens with e.g. we've

tony4212 commented 4 years ago

Tested this on onlyoffice document editor version 5.6.5. Checked English (United States), (Canada), and (United Kingdom); all have this issue. Used Wikipedia:List_of_English_contractions as the test list. It should be noted that some of the contractions in that list should be marked incorrect anyway, since they are only used colloquially.

Rita-Bubnova commented 4 years ago

As per my now closed duplicate (I hadn't searched for contraction), it's not just n't contractions either. The same thing happens with e.g. we've

Thank you for your report, for us cases like doesn't and we've are the same and currently not implemented (issues 41724 (for Document Editor) and 46815 (for Spreadsheet Editor) in our private issue tracker).

jdaviescoates commented 4 years ago

@Rita-Bubnova yes, they are the same, so good to know. I only mentioned it because the OP just refers to "when typing anything with "n't" "

ShockwaveNN commented 4 years ago

@jdaviescoates Yeah, I changed the title of the issue to make it clearer

solarjoe commented 4 years ago

I have the same issue (Desktop Editors Version 5.6.4.20) with German spell checking. Bold means underlined.

"Softwareent- und abwicklung" is highlighted, which points to an issue with hyphenation. "Es gibt mögliche Anwendungen ..." is detected correctly, but "Mögliche Anwendungen sind ..." is detected as incorrect. Could be an issue with context detection or capitalization. But I think it is also related to German "Umlaut" (ä, ü, ö).

Rita-Bubnova commented 4 years ago

@solarjoe, I see several questions (or problems) in your message:

  1. Word "Softwareent" is not in the dictionary (as in MS Word). Is this behavior not correct?
  2. Word "abwicklung" is proposed to be corrected to Abwicklung (as in MS Word). Is this behavior not correct?
  3. We know about the problem with the word "Mögliche" - issue 43318 in our private issue tracker (discussed in https://github.com/ONLYOFFICE/DesktopEditors/issues/223#issuecomment-545472551).

The problems you describe are not related to the original problem. Could you create new issues?

solarjoe commented 4 years ago

Thanks for the reply, I will add this to the other thread.

1.) It's a correct hyphenation and abbreviation. Instead of "Softwareentwicklung und Softwareabwicklung" you can write "Softwareent- und abwicklung". Save a "Software" and a "wicklung" :) . But you can also write "Softwareentwicklung und -abwicklung", split at a different position, which saves one "Software". Basically you can choose how much to save and what sounds better.

2.) See above. It's correct in that case.

3.) Will do.

DylwinTFTW commented 3 years ago

The bug is still active. This needs to be fixed immediately.

jdaviescoates commented 3 years ago

Yes, I find it somewhat concerning that a "simple" matter of getting spell check to work on these very common contractions has not been resolved in over 2 years!

ShockwaveNN commented 3 years ago

This bug is still active because our dev team thinks it's not a very easy bug to fix without any side effects

This needs to be fixed immediately.

Please note that if you are using a paid license, you can contact support@onlyoffice.com and discuss the importance of this issue for your team

But if you're using a free license - I don't like your tone, because we are providing free products for you, and if you don't like this problem, feel free to edit the source code and fix it yourself

"simple" matter of getting spell check

This may look simple from the outside, but in real life not all errors that look simple at first glance are easily resolved, especially since they can break some other parts of the spellchecker

I'll remind our dev team to try to fix this bug as soon as they are able

jdaviescoates commented 3 years ago

This may look simple from the outside, but in real life not all errors that look simple at first glance are easily resolved, especially since they can break some other parts of the spellchecker

Yeah that's why I put "simple" in quote marks. Nothing to do with software is ever really simple. But I still think it's fair for people these days to expect standard "simple" features like spell checking to work properly.

I'll remind our dev team to try to fix this bug as soon as they are able

Thank you

David-OConnor commented 3 years ago

Reporting the same thing. I think this is the sort of error that will turn people off; most (all?) people typing in English will notice it immediately. It being tough to solve implies infrastructure problems that go deep into the codebase.

Mer0me commented 2 years ago

Same behaviour here with the French dictionary

ONLYOFFICE Docs™ Enterprise Edition version 6.4.2
Linux 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64
Nextcloud Hub II (23.0.0)
OnlyOffice app v7.2.1

Client : Windows 10 or Xubuntu / Firefox 94

ShockwaveNN commented 2 years ago

@Mer0me Thanks, added aujourd'hui to our bug

mr-intj commented 2 years ago

This bug is still active because our dev team thinks it's not a very easy bug to fix without any side effects

I suspected as much. We have similar "simple" issues in our product that involve some fundamental change that sends ripples all throughout the code. At that point it becomes a "we'll tackle this after all this other stuff is done" low-priority task. :-)

One odd thing I noticed about this particular issue is that it doesn't affect all English contractions equally, e.g.:

OK: can't, won't, haven't, couldn't, they'd
Not OK: didn't, hadn't, wasn't, doesn't, aren't

ema1596 commented 1 year ago

I don't pretend to know how complicated an issue this is to fix, but I'm pretty sure four years is plenty of time to fix any bug. Is there a timeline for this to get fixed by any chance?

Smartich0ke commented 1 year ago

I'm also having this problem. For me, I'm using English Australia.

I'm running Documentserver v7.2.2.56 in Docker on Debian 11 host.

I understand that although these may appear as simple problems, they may be much more complicated to fix underneath, but surely after several years of the bug being submitted this would've been resolved by now?

MochaMoth commented 1 year ago

I'm not a fan of how long this issue has been present in this tracker, and I'm also not a fan of how the development team is treating its customers.

But if you're using a free license - I don't like your tone, because we are providing free products for you, and if you don't like this problem, feel free to edit the source code and fix it yourself

I want to point out that the product team thinks this product is enterprise-ready.

I would consider a properly working spell-checker to be part of the MVP for any office productivity software, free or otherwise. I see 2 possible scenarios here:

  1. This product is still in alpha or some variant of pre-release and customers with a free license are testing your software to prepare your team for monetization in which case their feedback should be taken seriously, or
  2. Your product has a customer-impacting bug that should be highly prioritized because it is affecting every customer in some countries.

My product team would be on my tail day and night if a bug this big survived in production for longer than a week, let alone four years.

"simple" matter of getting spell check

This may look simple from the outside, but in real life not all errors that look simple at first glance are easily resolved, especially since they can break some other parts of the spellchecker

This may not have been a simple fix immediately; however, if this had been prioritized as a high-severity, customer-impacting issue (which it is), then the impact should have been mitigated in the short term, and a long-term plan to refactor the system to better meet the business needs would likely have been finished at this point.

I had picked up OnlyOffice to find out if it was the right solution for our business; however, the lack of customer care is preventing me from giving my recommendation. This bug will present itself immediately when anyone tries using the software for the first time and has likely turned many would-be customers away from the product.

Fmstrat commented 1 year ago

But if you're using a free license - I don't like your tone, because we are providing free products for you, and if you don't like this problem, feel free to edit the source code and fix it yourself

Went through the usual: "FOSS is great, OnlyOffice is just as good as O365" before setting up with NextCloud. A user points this bug out to me 300 words into their first document. How is this still around over 4 years later?

I understand your point, but items like these make it hard for us advocates to justify pushing a minimum 50 person $1500 support contract when O365 is as cheap as it is.

cpot commented 1 year ago

Microsoft 365 Personal costs 70€/year. Office 365 F1 is truly cheap (2.10€/user/month).

OnlyOffice starts to cost something above 20 simultaneously open documents (as a rule of thumb you can use a ratio of 1 open document to 5 or 10 users, which means OnlyOffice could be free for 100 to 200 users). $1500/12/100 = $1.25/user/month
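
For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration, and the inputs (20 free simultaneous documents, a 1:5 to 1:10 ratio, a $1500/year contract) are simply the figures quoted in this comment, not official pricing.

```python
def users_covered(simultaneous_docs, users_per_open_doc):
    """Rough number of users served, using the rule-of-thumb ratio above."""
    return simultaneous_docs * users_per_open_doc


def monthly_cost_per_user(annual_contract_usd, users):
    """Spread a yearly contract price over 12 months and a number of users."""
    return annual_contract_usd / 12 / users


# Figures taken from the comment above (assumptions, not a price list).
print(users_covered(20, 5), users_covered(20, 10))
print(monthly_cost_per_user(1500, 100))
```

With 200 users instead of 100, the same contract works out to half the per-user figure.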

It's true that OnlyOffice sometimes takes a long time to deliver bug fixes and feature implementations (and so does our company).

But remember that Microsoft has huge financial firepower (a highly suspicious 42% operating margin), and when it increases its prices, it is by 30% each time.

The open-source ecosystem is perhaps not perfect, but we give users at least an alternative to huge monopolies. We also try to keep technologies open and give a better user experience (Office is rather good, but other Microsoft products such as SharePoint or even Outlook are a nightmare). And on the security side it is far from perfect: see the Storm-0558 Exchange Online government data leak.

Smartich0ke commented 1 year ago

Yes, as @MochaMoth has rightly noted, there appears to be a contradiction in the messaging from the onlyoffice team. On one hand, you state that your product is community-driven and that fixing bugs like these takes time, which is understandable. On the other hand, your product team labels this software as "enterprise-ready."

A functioning spell checker is a fundamental feature in any desktop publishing software. It's surprising to me that an issue like this—one affecting basic text accuracy—isn't treated as a top-priority bug. What's even more surprising is that this software has been marketed and sold to paying customers as "enterprise-ready" while lacking a fully operational spell checker.

Given that this particular bug has persisted for over four years, it raises questions for me about the software's readiness for enterprise use.

emes81 commented 1 year ago

I'm not currently an enterprise customer, but I operate a Nextcloud instance for an SME and could become one. I'm currently testing out OO on a personal level with a view to implementing it in our organisation.

If anyone from the development team is active on this thread - do you have any information on the difficulty of fixing this, or a timeframe for the fix?

For me, it currently marks "aren't" as an error, but "I'm", "won't" are not marked as such.

mr-intj commented 1 year ago

This bug is active because our dev team thinks it's not very easy bug to fix without any side-effect

Is there a write-up from your dev team that discusses the underlying issue(s), and any relevant complications or considerations?

One (or a few) of us might be willing to take a stab at fixing this if the effort required isn't too large. Judging the size of the effort would be quicker and simpler if we had a rundown of the issue and possible side-effects.

Fmstrat commented 1 year ago

For me, it currently marks "aren't" as an error, but "I'm", "won't" are not marked as such.

@emes81 That's because I and won are words. It's treating the ' as a space (theory).

OnlyOffice starts to cost something over 20 opened simultaneous doc (as a rule of thumb you can use 1 to 5 or 10 ratio, which means Onlyoffice could be free for 100 to 200 users. 1500$/12/100 = 1.25$/usr/month

@cpot I think you may have missed the point I was trying to make. A minimum support contract is $1,500 per year. And a user response of "This is a document editor, and I can't trust its spell checker" has turned off (at least) one company that was considering it. This is an absolutely basic document issue that degraded the trust of those who don't advocate for it.

To note: I will continue to use OnlyOffice and support it where I can, but this issue being 4 years old, and so... basic (to an editor's functionality), just makes me shake my head and degrades any ability I have to push others.

MochaMoth commented 1 year ago

Something that concerns me is that spell check is a solved technology. Build a node graph of all words in a given language, populate it in memory at application load time, pass a small segment of localized changes in the document through it to get misspelled words. If words are showing up incorrectly, simply add them to the language map.
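
To make the idea above concrete, here is a minimal Python sketch of such a word graph, written as a trie. It is only an illustration of the approach described in this comment, not ONLYOFFICE's or hunspell's implementation; the class and method names are hypothetical, and it ignores the affix rules that real hunspell dictionaries rely on.

```python
class TrieNode:
    """One node of the word graph: child letters plus an end-of-word flag."""

    def __init__(self):
        self.children = {}
        self.is_word = False


class WordGraph:
    """Hypothetical in-memory word graph, populated once at load time."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, e.g. one that is being flagged incorrectly."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def misspelled(self, segment):
        """Check one small, whitespace-separated segment of the document."""
        flagged = []
        for raw in segment.split():
            word = raw.strip('.,;:!?"')  # drop surrounding punctuation only
            if word and not self.contains(word):
                flagged.append(word)
        return flagged


graph = WordGraph(["we", "are", "here"])
print(graph.misspelled("We aren't here."))  # "aren't" is not in the graph yet
graph.add("aren't")                         # add it to the language map
print(graph.misspelled("We aren't here."))
```

Note that this sketch only works because the segment is split on whitespace, so the apostrophe stays inside the word; how the text is tokenized matters as much as the lookup itself.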

I looked through the source a bit to see if I could locate the current implementation of the spell checker; however, I haven't been able to spend enough time on it and haven't found it. I agree with @mr-intj: if the dev team has a write-up of the current implementation and where to look, someone could look into fixing this for you, though my first instinct is to rip out whatever the current spell check is and replace it with a basic one like I've mentioned.

Fmstrat commented 1 year ago

I looked through the source a bit to see if I could locate the current implementation of the spell checker; however, I haven't been able to spend enough time on it and haven't found it. I agree with @mr-intj: if the dev team has a write-up of the current implementation and where to look, someone could look into fixing this for you, though my first instinct is to rip out whatever the current spell check is and replace it with a basic one like I've mentioned.

A quick search (https://github.com/search?q=org%3AONLYOFFICE+spell&type=code) suggests they use hunspell, but there's also some work on aspell in this PR: https://github.com/ONLYOFFICE/DocumentServer/pull/462

My guess is you limited your search to this repo ;)

EDIT: It's possible this may impact more than OO: https://bugs.launchpad.net/ubuntu/+source/scowl/+bug/1807103

EDIT2: And appears to be addressed by hunspell in 2014: https://github.com/hunspell/hunspell/blob/84e3fbba567444d8ed460b894b864db45ad8852a/ChangeLog#L90

lsdebianchi commented 6 months ago

This issue is still present.

I can add that it is very evident in Italian, where vowels at the end of articles, pronouns and prepositions should be elided if the following word also starts with a vowel.

So questa amica would be written quest'amica. This is known in Italian as elisione.

@emes81 That's because I and won are words. It's treating the ' as a space (theory).

I think so. I see that single letters are never considered misspelled, so some contractions like l'amica will be accepted. And as mentioned before, in English you also have can't, won't, haven't, couldn't, they'd, which are all valid_word + single_letter. A very similar pattern happens with the Italian spell-checking.
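
The theory discussed here (apostrophe treated as a separator, single letters never flagged) can be sketched in a few lines of Python. This is a toy model built on those two assumptions only: the word list and function names are invented for illustration, it is not taken from the ONLYOFFICE source, and it does not claim to explain every case reported in this thread.

```python
import re

# Toy word list (an assumption for illustration, not a real dictionary).
WORDS = {"can", "won", "are", "does", "they",
         "can't", "won't", "aren't", "doesn't", "they'd"}


def split_on_apostrophe(text):
    """Hypothetical tokenizer that treats the apostrophe like a space."""
    return re.findall(r"[a-z]+", text.lower())


def keep_apostrophe(text):
    """Hypothetical tokenizer that keeps word-internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def misspelled(text, tokenize):
    """Flag tokens longer than one letter that are missing from WORDS."""
    return [t for t in tokenize(text) if len(t) > 1 and t not in WORDS]


for word in ["can't", "they'd", "aren't", "doesn't"]:
    print(word,
          misspelled(word, split_on_apostrophe),
          misspelled(word, keep_apostrophe))
```

Under the first tokenizer, "can't" becomes "can" + "t" and passes, while "aren't" becomes "aren" + "t" and is flagged; under the second, each contraction is looked up whole.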

I totally understand that if the dictionary is currently doing only a static word check, then adding a dynamic contraction system that should apply only in some cases, depending on the grammar of the current language, is no small task.

I also understand that users will have a strong emotional reaction when seeing some spell-checking fails on some common grammar of their own language. This is quite a visible bug which may scare people away from an otherwise amazing suite of products. This would be my main reason to prioritize the feature. But in my book you are still beating Pages and OpenOffice to shreds. So... there is that.

My strategy as a user will be adding all my contractions in the dictionary. This will weaken the efficacy of my spell-checking experience but it will be the current compromise.

Anyway, thoughts and prayers for this bug.

MochaMoth commented 5 months ago

My solution to this issue (and many others in OnlyOffice) was to move to a Collabora server and use it instead. It works out of the box and runs in Nextcloud just as easily (with Docker containers and a reverse proxy). It even imports OnlyOffice documents, though some formatting fixes may be necessary. Highly recommended, as it's actually production ready.

KevynValladares21 commented 1 month ago

They partially fixed the issue. Some contractions don't trigger spell check while others do. "Can't" is valid but "aren't" is invalid for some reason.

emes81 commented 1 month ago

They partially fixed the issue. Some contractions don't trigger spell check while others do. "Can't" is valid but "aren't" is invalid for some reason.

As someone else pointed out, this may not be a fix. "can" is a word in English, "aren" is not.

It's concerning that this issue is still present. It's been five years.

SpaceCoyot3 commented 1 month ago

Hoping this gets fixed soon. More than 5 years is a long time to fix a fundamental spellchecking error.

ElenaMaaya commented 1 month ago

Hello! I have good news! Issue 41724 will be fixed in the next release (commit). Please keep checking for updates.

Rita-Bubnova commented 3 weeks ago

DocumentServer v8.2.0 is released, so I am closing this issue. Feel free to comment or reopen it if you have further questions.