BesutKode / uni-task-1

Creative Commons Attribution 4.0 International

edawine #12

Closed ghost closed 8 years ago

ghost commented 8 years ago

https://github.com/BesutKode/uni-task-1/wiki/edawine

jayvdb commented 8 years ago

Which bug did you find with not_french.py? Please add it to the wiki page.

ghost commented 8 years ago

@jayvdb I found that the French translation file contains Czech text. You can see the result on my wiki page now.

jayvdb commented 8 years ago

@edawine, in general, this is already detected by LanguageTool, e.g. with the Java command-line app:

$ java -jar languagetool-commandline.jar -e HUNSPELL_NO_SUGGEST_RULE -eo --xmlfilter -l fr translations/copyq_fr.ts | grep --context=4 uzavřen
Expected text language: French
Working on translations/copyq_fr.ts...
                                                ^^^^^^^^                                             

10938.) Line 5524, column 13, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
... the CopyQ Configuration dialog!         Pro modifikaci nastavení musí být prvně uzavřen dialog nast...
                                                ^^^^^^^^^^                                             

10939.) Line 5524, column 24, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...Configuration dialog!         Pro modifikaci nastavení musí být prvně uzavřen dialog nastavení Copy...
                                                ^^^^^^^^^                                             

10940.) Line 5524, column 34, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...ion dialog!         Pro modifikaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!   ...
                                                ^^^^                                             

10941.) Line 5524, column 39, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...ialog!         Pro modifikaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!       ...
                                                ^^^                                             

10942.) Line 5524, column 43, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...g!         Pro modifikaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!             ...
                                                ^^^^^                                             

10943.) Line 5524, column 49, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...     Pro modifikaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!                   In...
                                                ^^^^^^^                                             

10944.) Line 5524, column 57, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
... modifikaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!                   Invalid o...
                                                ^^^^^^                                             

10945.) Line 5524, column 64, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...kaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!                   Invalid option!    ...
                                                ^^^^^^^^^                                             

10946.) Line 5524, column 74, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée
...vení musí být prvně uzavřen dialog nastavení CopyQ!                   Invalid option!         P...
                                                ^^^^^                                             

10947.) Line 5524, column 74, Rule ID: FRENCH_WHITESPACE
Message: Point d'exclamation est précédé d'une espace fine insécable.
Suggestion: CopyQ !
...vení musí být prvně uzavřen dialog nastavení CopyQ!                   Invalid option!         Pa...
                                                ^^^^^^                                             

10948.) Line 5525, column 1, Rule ID: WHITESPACE_RULE
Message: Faute de frappe possible : vous avez répété une espace
Suggestion:  
...sí být prvně uzavřen dialog nastavení CopyQ!                   Invalid option!         Paramètre inc...
                                                ^^^^^^^^^^                                             

10949.) Line 5527, column 9, Rule ID: HUNSPELL_NO_SUGGEST_RULE
Message: Faute de frappe possible trouvée

If you can find a message that your tool detects and that LanguageTool does not flag as invalid French, re-assign this issue to me.
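To make the comparison concrete, here is a minimal sketch of how the report above could be summarised per source line, so that lines with many spelling hits (such as line 5524) stand out as likely wrong-language text. It assumes the match header format shown in the output above (`N.) Line X, column Y, Rule ID: Z`); the function name `count_hits_per_line` is hypothetical and not part of LanguageTool.

```python
import re
from collections import Counter

# Matches report headers such as:
#   10946.) Line 5524, column 74, Rule ID: HUNSPELL_NO_SUGGEST_RULE
# The pattern is inferred from the pasted output, not from LanguageTool docs.
MATCH_RE = re.compile(r"^\d+\.\) Line (\d+), column (\d+), Rule ID: (\S+)")


def count_hits_per_line(report, rule_id="HUNSPELL_NO_SUGGEST_RULE"):
    """Count how many matches of rule_id were reported for each source line."""
    counts = Counter()
    for raw in report.splitlines():
        m = MATCH_RE.match(raw.strip())
        if m and m.group(3) == rule_id:
            counts[int(m.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed): pipe the saved report into this script.
    for line_no, hits in count_hits_per_line(sys.stdin.read()).most_common(10):
        print(f"line {line_no}: {hits} spelling hits")
```

A line with a single hit is probably a typo or a proper noun; a line where nearly every word is a hit is probably not French at all.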

However, your tool's approach differs from LanguageTool's rule: it is extremely simple and has a low false-positive rate.

Note that language detection tools already exist for many languages; see https://pypi.python.org/pypi?%3Aaction=search&term=language+detect&submit=search for examples. Your algorithm is probably already in one of them, or could easily be added to one.

Taking your concept further, I'll accept your tool if you can enhance it so that its output is more useful than LanguageTool's. For example, if you order the results by the percentage of words in each message that are not French, the user can quickly process the highest percentages and then take more time with the lower ones.
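A minimal sketch of that ordering, assuming a set of known French words is available. The tiny `FRENCH_WORDS` set and the names `non_french_ratio` and `rank_messages` are hypothetical placeholders for illustration; a real run would load a full dictionary, and this is not necessarily how not_french.py works.

```python
import re
import unicodedata

# Hypothetical stand-in for a real French word list (e.g. a Hunspell dictionary).
FRENCH_WORDS = {
    "paramètre", "inconnu", "option", "invalide",
    "la", "de", "dialogue", "configuration",
}

# One or more Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def non_french_ratio(message, vocabulary=FRENCH_WORDS):
    """Return the fraction of words in message that are not in vocabulary."""
    # Normalise to NFC so accented letters are single code points.
    text = unicodedata.normalize("NFC", message)
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in vocabulary)
    return unknown / len(words)


def rank_messages(messages, vocabulary=FRENCH_WORDS):
    """Return (ratio, message) pairs, highest non-French ratio first."""
    scored = [(non_french_ratio(m, vocabulary), m) for m in messages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    sample = [
        "Paramètre inconnu",
        "Pro modifikaci nastavení musí být prvně uzavřen dialog nastavení CopyQ!",
        "Option invalide CopyQ",
    ]
    for ratio, message in rank_messages(sample):
        print(f"{ratio:6.1%}  {message}")
```

Messages near 100% are almost certainly in the wrong language and can be handled without knowing French; messages with a low but non-zero percentage usually contain product names or typos and need a closer look.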

ghost commented 8 years ago

@jayvdb I have updated my script. Please check again.

jayvdb commented 8 years ago

It is definitely a WIP, and line-based language detection tools already exist, but for this specific problem the new tool does a good job of helping someone without knowledge of French process the 'easy' problems, so that a native speaker or expert in French can work on the remainder. :+1: