Helsinki-NLP / OPUS-CAT

OPUS-CAT is a collection of software which makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of CAT tool plugins.
MIT License

Editing rules #45

Open mariagabv opened 2 years ago

mariagabv commented 2 years ago

Hi, I was reading the information about Edit Rules on the project's web page and I would like to try it locally, but the option does not seem to be available anymore:

[image]

Is the Edit Rules option deactivated?

TommiNieminen commented 2 years ago

Hi,

The edit rules were added in 1.2.0; you can download 1.2.0 from here. I'll add a note to the edit rule documentation that the rules are only available from 1.2.0 onwards.

-Tommi

SafeTex commented 2 years ago

Hello Tommi and all,

These "edit" rules will take a fair bit of reading, but I'd like to make some comments that are "food for thought" and not at all a criticism of the remarkable work that someone has done to incorporate regex into OPUS (was it you who did this?).

I worked for a long time helping the creator of TransTools (Stanislav Okhvat) add a major component to his program that helps translators use regex rules (with cheatsheets, a bundled library of useful regex rules, and a feature for adding further rules to this library). This was because even among translators, so few of us are competent with regex, and even those who know some regex might not know how to write rules in a particular flavour (luckily, TransTools and memoQ use .NET regex, like OPUS).

The improvements Stanislav made to TransTools and its regex tool were later used by memoQ when they upgraded their own regex tool. Some time after, I conducted an informal survey as follows:

I'm doing a short unofficial survey on the recent Regex (Regular Expressions) improvements made to memoQ (does NOT include the highlighting tool when searching in the translation pane for the purpose of this survey)

I think I've thought of all the major scenarios:

    1. You use Regex more thanks to the improvements
    2. You use Regex no more than before but with more ease thanks to the improvements
    3. You didn’t use Regex much/at all before but you intend to use it more due to the improvements
    4. You use Regex no more than before and the improvements have not helped you
    5. You didn’t use Regex very much/at all before and this has not changed since the improvements
    6. You use Regex less than before as you find the improvements confusing, over-complex etc.
    7. What is Regex?

I'd be grateful if you could give an answer (1-7) and add a comment if you feel that it would help.

Thanks in advance for your time😉 [end of survey]

and you can see all the answers, if you join the group, at: https://groups.io/g/memoQ/topic/regex_regular_expressions/89942396?p=Created%2C%2C%2C20%2C2%2C20%2C0&jump=1

The responses were positive in that no one chose option 6, but the results were also pretty disappointing in that many people chose 5 or a close alternative to 5.

So you have probably already seen where I'm going with all this.

Regex is super useful but highly underused even when help inserting it is offered, let alone when it is not.

Whoever has added regex to Opus has my thanks, but the above post might help them in the future to make it a bit more "user-friendly", even though this has, unfortunately, not proved to be a turning point in getting people to use regex.

Just for info/help/inspiration and once again, in NO WAY a criticism of the tremendous work done by you and others for all of us.

Regards,
Dave Neve

TommiNieminen commented 2 years ago

Thanks for the info, Dave, it's very useful (and I don't consider it criticism at all). I'll have a look at your survey answers once I get the approval to join the group. I'll just write down a couple of my own thoughts about this subject, not to dispute your points (since they are valid) but just to look at the bigger picture (these are mainly notes for myself, I might use them for a future presentation etc.).

It's probably true that most translators (and computer users in general) are not fluent enough with regular expressions to utilize them even in simple ways. But it's a big community, and a significant proportion of translators are using regular expressions, so there's a solid base of what you might call power translators (like a power user but a translator). With a specialized translation product such as OPUS-CAT, these power translators make up a larger part of the potential user base. When working inside larger translation departments, these power translators tend to drift to semi-technical roles that provide support for other translators, so they tend to have a wider impact within their organizations (and even as freelancers they tend to be active in the translation community).

So in my view, the regex-enabled functionalities and other advanced functionalities are mainly intended for power translators, who can find imaginative practical uses for them and also communicate those practical uses to wider audiences. As a developer, my aim is to provide the flexibility for the power translators to deal with the widest possible range of use cases, hence the levels of complexity baked into the edit rule system. That complexity is also motivated by user feedback that I received from users of TermInjector, a plugin that I worked on before OPUS-CAT. TermInjector works like the post-edit rules in OPUS-CAT: it modifies output from translation memories in Trados using regex (similar to the built-in feature in memoQ).

TermInjector was fairly popular, so I had a decent amount of support requests, which were fairly equally split between questions about basic regex functionality and advanced replacement scenarios. For instance, the serial application of regular expressions, which I've included in the OPUS-CAT edit rules, is actually a feature request from a user of TermInjector. In any case, the user feedback from TermInjector convinced me that there is a sufficiently large community of power translators to support advanced translation products. I'd also argue that this community is going to be the most resilient in the face of all the technological developments (mainly MT, but not just) that are going to affect the translation field in the near future, so the community should be catered for and, if possible, expanded to maintain the professional status of translators.
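
As a rough illustration of what serial application of regular expressions means, here is a minimal sketch. It is not OPUS-CAT's or TermInjector's actual implementation or rule format: it uses Python's `re` module rather than .NET regex, and the rule list and the `apply_rules` helper are hypothetical names made up for this example. The point is only that each rule runs on the output of the previous rule, so the order of the rules matters.

```python
import re

# Hypothetical rules for illustration only (not OPUS-CAT's rule format).
# Each rule is a (pattern, replacement) pair.
RULES = [
    (r"\bcolour\b", "color"),          # rule 1: normalise the spelling
    (r"\bcolor scheme\b", "palette"),  # rule 2: relies on rule 1's output
]

def apply_rules(text, rules=RULES):
    """Apply the rules one after another; each rule sees the previous result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

# Rule 1 turns "colour scheme" into "color scheme", which rule 2 then matches.
print(apply_rules("Pick a colour scheme."))

# With the order reversed, rule 2 runs first and finds nothing to replace.
print(apply_rules("Pick a colour scheme.", list(reversed(RULES))))
```

In this sketch the first call yields "Pick a palette." and the reversed order yields "Pick a color scheme.", which is why a rule system with serial application needs to let the user control rule order.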

It would be great if the wider translation community would also learn to master regexes and other advanced functionalities, but I'm sceptical about it. I've worked in the translation field for 20 years now, and even though the job has been pretty technically demanding right from the start (Trados Workbench was not a user-friendly tool), I've observed that most translators do not acquire advanced technical skills during their careers. A big part of that is lack of training, of course, but in the future there are probably going to be fewer resources for training: the direction is towards simplifying the tools, not training the translators to master the old, more complex tools. So power translators will continue to be mostly people who are already interested in technical matters and are motivated to teach themselves.

I still think it would be worth it to push e.g. regex training more on translation students and working translators, but it needs to be tied closely to practical scenarios. I have recently dabbled in translator training, so I might be looking into it at some point. Let me close this stream of consciousness by showing how not to teach regular expressions. Some ten years ago I made a small game where the objective is to build finite state machines that are equivalent to regular expressions; you can find it here. The problem with the game is that it presupposes the knowledge it is supposed to teach, and it is completely unconnected to practical matters, so at least it works as a negative example.
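
For readers unfamiliar with the idea of a finite state machine that is equivalent to a regular expression, here is a small sketch. It is unrelated to the game's actual code; the regex `ab*c`, the transition table and the `dfa_accepts` helper are all chosen purely for illustration. The machine accepts exactly the strings that the regex matches in full.

```python
import re

# Deterministic finite state machine for the regular expression "ab*c":
#   state 0 --a--> state 1
#   state 1 --b--> state 1   (loop: any number of b's)
#   state 1 --c--> state 2   (accepting state)
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 2,
}
START = 0
ACCEPTING = {2}

def dfa_accepts(s):
    """Return True if the machine ends in an accepting state after reading s."""
    state = START
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING

# The machine and the regex agree on every sample string.
for sample in ["ac", "abbc", "ab", "abca", ""]:
    assert dfa_accepts(sample) == bool(re.fullmatch(r"ab*c", sample))
```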

SafeTex commented 2 years ago

Read and noted. Very interesting. Thanks for taking the time to reply and share your thoughts on this.

Regards,
Dave Neve (SafeTex)