Open rhofer opened 2 years ago
This issue has been put aside. It is currently unclear if it will ever be implemented as it seems to cover too narrow of a use case or doesn't seem to fit into Weblate.
Please try to clarify the use case or consider proposing something more generic to make it useful to more users.
/cc @KadAnna
@nijel I can completely relate to what @rhofer is describing here. We really love Weblate, but TM management is difficult: data "pollution" is very high, and there is no feasible way to maintain the TMs. Being able to maintain them is important for Weblate to be used reliably. We are also struggling with this.
Issue #6050 also indirectly requests improvements to TM handling
Option 4: provide option to switch off automatic TM enrichment
Is halfway done. @rhofer would like the option to keep manually uploaded TMX files as a source of automatic suggestions, even when the TM is off for the project.
It would be nice to not only completely turn off Weblate translation memory, but also make configurable what TMs of other projects I want to use. Also, an option to just turn off the enrichment by the project and still keep the availability of the TM as a source would be great.
Let’s talk about how it should work.
General remark:
In professional CAT, there are usually various options to manage the population of TMs.
In 2018 there was a nice "open-sourced" initiative in which several companies sat down to draw up TM management best practices. See https://github.com/GILT-Forum/TM-Mgmt-Best-Practices/blob/master/best-practices.md. It may be a source of ideas on how to enhance TM management in Weblate.
Keeping TM nice and proper is key to good translation output (and also in order to train MT!)
@ilocit many thanks for your input and the link. I wasn't aware of this. The .../best-practices.md document is definitely worth reading.
Regarding your points 1) to 3): keeping such things in mind as a vision of where TM management should professionally end up is a great thing. Nevertheless, every small step that provides more TM management capabilities is heavily appreciated. ... I don't think going for the full vision in one shot is feasible.
Yes, small steps are much appreciated, but it helps to keep a vision in mind as the direction in which the journey is heading. I think it might be worthwhile to think about the vision first and get things straight. Maybe we, Neil, others don't want our / my vision to be their vision. ;-)
As you had suggested already during our last 1:1, @rhofer , maybe a Weblate UG meet-up would be a nice idea! :-)
I completely concur with @rhofer 's description of the problem. You just cannot work with "left-overs" that would pollute your knowledge base. I just encountered the problem on our self-hosted Weblate server: it would keep on suggesting entries that belonged to an obsolete Component that had been removed - even though I had specifically deleted the related TM entry in the TM Manager (/memory/ page) - which seems like a bug IMO.
Here is how I was able to resolve the problem: using the GET /api/memory/ endpoint, I downloaded the list of all TM entries and collected the ids of the obsolete ones using a filtering criterion. Then I used the DELETE /api/memory/(int:memory_object_id)/ endpoint to remove them one by one.
This was battlefield medicine, but it worked great. I hope this approach helps someone until we have a working Delete button in the TM Manager.
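The workaround above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual script: the server URL, the token, and the filter are placeholders, and the sketch assumes that the list endpoint returns paginated JSON with `results` and `next` keys and that each entry has `id` and `origin` fields. Check those against the responses of your own server before deleting anything.

```python
import json
import urllib.request
from typing import Callable, Iterator, List, Optional

BASE_URL = "https://weblate.example.com"  # hypothetical server URL
TOKEN = "your-api-token"  # placeholder, use your own API token


def http_json(method: str, url: str) -> Optional[dict]:
    """Send an authenticated request and decode the JSON body, if any."""
    request = urllib.request.Request(
        url,
        method=method,
        headers={"Authorization": f"Token {TOKEN}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


def iter_memory(base_url: str, call: Callable = http_json) -> Iterator[dict]:
    """Yield every TM entry, following `next` links (assumed pagination)."""
    url = f"{base_url}/api/memory/"
    while url:
        page = call("GET", url)
        yield from page["results"]
        url = page.get("next")


def delete_obsolete(
    base_url: str,
    is_obsolete: Callable[[dict], bool],
    call: Callable = http_json,
    dry_run: bool = True,
) -> List[int]:
    """Collect the ids of matching entries first, then delete them one by one."""
    ids = [e["id"] for e in iter_memory(base_url, call) if is_obsolete(e)]
    if not dry_run:
        for entry_id in ids:
            call("DELETE", f"{base_url}/api/memory/{entry_id}/")
    return ids


# Example call (hypothetical criterion based on the assumed `origin` field):
# delete_obsolete(BASE_URL, lambda e: "old-component" in (e.get("origin") or ""))
```

By default the function runs with dry_run=True and only returns the ids it would remove, so the filter can be checked before anything is deleted. Collecting all ids before issuing any DELETE keeps the pagination stable while the list is being read.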
The per-component or project delete and re-create is there (see https://github.com/WeblateOrg/weblate/issues/7347). The individual entries can also be deleted (see https://github.com/WeblateOrg/weblate/issues/6440). If anything is broken on these, please open a separate issue so that we can take a look.
Situation
In our self-hosted Weblate setup, all translation components are permanently onboarded. To benefit from cross-component and cross-project suggestions, e.g. with "automatic suggestions", both default machineries are activated (Weblate: live component look-up; Weblate Translation Memory: TM look-up).
When working on translations, one essential aspect is harmonizing the terminology in use across components or even across projects. For example, we may start with a first translation and later need to revise it for the sake of harmonized terminology. As a result, the TM contains the old, obsolete strings as well as the latest, harmonized and approved string.
Over time, this leads to a polluted TM, where the Weblate Translation Memory machinery provides outdated, obsolete or (by now) even forbidden strings. In the "automatic suggestion" tab, for example, the suggestions have become a mix of old TM results, latest TM results and live results from active components.
This heavily confuses translators and even leads to mistakes, where translators pick outdated or even forbidden terms from the TM.
Goal
As a translator, I don't want to see a history of text memory. More specifically, I only want to see auto suggestions based on texts which are currently valid and approved translations.
Problem
Today, Weblate provides no means to clean up TMs manually or automatically in order to get rid of "old" entries and so avoid translation mistakes when translators rely on TM results. Therefore, with translations happening continuously and today's automatic enrichment of the TM, the pollution of the TM keeps growing.
This issue collects options for improving TM management. To make a specific option implementable, it is/will be carved out into a separate, individual issue.
Option 1: improve global TM management - delete and recreate
This option is described with https://github.com/WeblateOrg/weblate/issues/7347
This would be very helpful for counteracting TM pollution manually, as a kind of "mass operation" where clean-up happens on full TM scopes rather than on individual strings.
Option 2: individual deletion of TM entries
This option is described with https://github.com/WeblateOrg/weblate/issues/6440
This is helpful in specific situations. Currently, for my situation, Option 1 would be sufficient.
Option 3: automatic TM maintenance based on review state
This option has not yet been put into an individual issue, since it requires discussion first.
enable/disable this automatic clean-up.
disable for backwards compatibility
Excluded:
Affects:
Pros
Cons
Option 4: provide option to switch off automatic TM enrichment
This option is described with https://github.com/WeblateOrg/weblate/issues/7348
In our use case, we primarily build on the Weblate machinery, which provides a live look-up across all connected components. Once this option is available, we would switch off automatic TM enrichment.
Remark to any options
All this only affects the automatic enrichment of TMs. What still shall be possible (as-is today):
Preferred solution approach
For our use case, we require the following options as the best fit for a next improvement step: