IFRCGo / go-api

Clean Emergency page translations #2048

Open tovari opened 5 months ago

tovari commented 5 months ago

We would like to clean up the languages of the emergencies and provide dynamic translations for the historic data. The attached xlsx lists the emergency IDs that need some batch action. Actions to be done per worksheet:

Emerg (UI- En, summ. - Fr) - copy the content of the Summary [en] and Title [en] fields to Summary [fr] and Title [fr], then change the Entity original language to French. Initiate translation. - 40 items

Emerg (UI- En, summ.- Es) - copy the content of the Summary [en] and Title [en] fields to Summary [es] and Title [es], then change the Entity original language to Spanish. Initiate translation. - 104 items

Emerg (UI- Es, summ.- En) - Initiate translation. - 424 items

Summary_emergencies-to_be_translated-v3.xlsx

tovari commented 5 months ago

The analysis in https://github.com/IFRCGo/go-web-app/issues/627 can only be done before the action of this ticket, so this ticket should be executed only once #627 is done. cc: @udaynwa

udaynwa commented 2 months ago

Hey @szabozoltan69. Here is a draft PR to run the manual translation. Let me know if any additional information is required. https://github.com/IFRCGo/go-api/pull/2149

szabozoltan69 commented 2 months ago

Some clarification and usage hints would be highly appreciated.

szabozoltan69 commented 1 month ago

Could you give some hints @thenav56 how to continue this task?

thenav56 commented 1 month ago

Hey @szabozoltan69, I hope this helps. Here is the pseudo code we will need for this:

from lang.tasks import ModelTranslator
from main.translation import TRANSLATOR_SKIP_FIELD_NAME, TRANSLATOR_ORIGINAL_LANGUAGE_FIELD_NAME

from api.models import Event

def en_to_fr():
    pks = [...]  # List of primary keys -- Or read the content from an xlsx file instead?
    events = Event.objects.filter(
        pk__in=pks,
        **{TRANSLATOR_SKIP_FIELD_NAME: False},
    )

    print('Translating EN->FR')
    for event in events:
        assert getattr(event, TRANSLATOR_ORIGINAL_LANGUAGE_FIELD_NAME) == 'en'

        # TODO: Should we read the content from an xlsx file instead?
        event.name_fr = event.name_en
        event.summary_fr = event.summary_en
        ...

        # Clear the original English fields - Auto translator will only fill empty fields
        event.name_en = ''
        event.summary_en = ''
        ...

        # Change the original language to French
        setattr(event, TRANSLATOR_ORIGINAL_LANGUAGE_FIELD_NAME, 'fr')

        # Save the new state (Use update_fields= if required)
        event.save()

        # Translate fields dynamically and save the object
        ModelTranslator().translate_model_fields(event)
        print('- Translated successfully:', event.pk)

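en_to_fr()
# en_to_es() and es_to_en() are not defined in this snippet; write them
# along the same lines as en_to_fr() before calling them.
# en_to_es()
# es_to_en()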

This code will translate event fields from English to French and save the changes. Similar functions can be created for other language translations.
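
The "similar functions" mentioned above could share one parameterized helper. What follows is only a minimal sketch, not the code that was actually run: `move_original_language` and its arguments are hypothetical names, the `original_language` attribute stands in for whatever `TRANSLATOR_ORIGINAL_LANGUAGE_FIELD_NAME` resolves to, and saving the object and calling `ModelTranslator` are deliberately left to the caller.

```python
from types import SimpleNamespace


# Hypothetical helper: the field names and the language attribute are assumptions.
def move_original_language(obj, fields, source, target, lang_attr="original_language"):
    """Copy <field>_<source> into <field>_<target>, blank the source, flip the language."""
    if getattr(obj, lang_attr) != source:
        raise ValueError(f"expected original language {source!r}")
    for field in fields:
        setattr(obj, f"{field}_{target}", getattr(obj, f"{field}_{source}"))
        # The auto translator only fills empty fields, so clear the old slot.
        setattr(obj, f"{field}_{source}", "")
    setattr(obj, lang_attr, target)
    return obj


# Stand-in for an Event row; a real run would pass a model instance instead.
event = SimpleNamespace(
    name_en="Inondations", summary_en="<p>Texte</p>",
    name_fr="", summary_fr="", original_language="en",
)
move_original_language(event, ["name", "summary"], "en", "fr")
```

With a helper like this, the EN->ES case would differ only in the last argument.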

szabozoltan69 commented 1 month ago

I visually check the IDs from the xlsx and the newly found ones (with HTML tags stripped) via:

select
substring(regexp_replace(summary_en, E'<[^>]+>', '', 'gi'),0,80) as en,
substring(regexp_replace(summary_fr, E'<[^>]+>', '', 'gi'),0,80) as fr from api_event where (id in 
(3143,3106,3180,3173,3166,3145,3244,5084,5670,6171,6118,6125,6138,
6135,6139,6193,6156,6144,6208,6207,6241,6250,6252,6303,6282,6293,6298,
6310,6356,6375,6380,6385,6423,6421,6440,6462,6504,6746,6734,6842)
or id in (select id from api_event where summary_en like '% dans %' or summary_en like '% et %')) and summary_fr='';

The query above covers the French ones; the Spanish ones are checked with:

select
substring(regexp_replace(summary_en, E'<[^>]+>', '', 'gi'),0,80) as en,
substring(regexp_replace(summary_es, E'<[^>]+>', '', 'gi'),0,80) as es from api_event where (id in 
(4161,5193,5438,5936,5982,6088,6213,6288,6330,6485,6644,6172,6146,6649,4049,3099,3091,
3084,3110,3111,3179,3138,3137,3313,5199,5237,5481,5640,5937,6049,6126,6116,6157,6163,
6233,6173,6184,6189,6209,6205,6215,6210,6348,6279,6246,6251,6245,6259,6253,6290,6286,
6267,6280,6287,6289,6292,6301,6311,6318,6333,6323,6321,6342,6344,6343,6378,6483,6352,
6360,6373,6370,6379,6391,6384,6383,6394,6395,6402,6413,6411,6424,6434,6432,6469,6477,
6488,6519,6527,6521,6526,6541,6568,6601,6606,6613,6663,6712,6723,6789,6855,6850,6794,6821,6849)
or id in (select id from api_event where summary_en like '% las %' or summary_en like '% del %')) and summary_es='';

Also the ones that are not translated at all and are presumably in English (the 3rd tab in the xlsx):

select
substring(regexp_replace(summary_en, E'<[^>]+>', '', 'gi'),0,80) as en
 from api_event where id in 
(6851,6807,6768,6729,6671,6647,6627,6589,6559,6538,6537,6536,6532,6531,6529,6523,6518,6513,6512,6508,6507,6506,6505,6503,6502,6497,6495,6491,6489,6484,6482,6481,6475,6474,6473,6472,6471,6468,6466,6463,6460,6456,6455,6448,6444,6443,6442,6441,6438,6436,6435,6433,6431,6430,6427,6426,
6425,6422,6420,6419,6418,6417,6416,6415,6410,6409,6407,6405,6404,6401,6400,6399,6397,6390,6389,6388,6387,6382,6374,6372,6371,6369,6368,6367,6365,6363,6362,6359,6358,6357,6351,6340,6339,6336,6334,6332,6331,6328,6326,6325,6324,6316,6308,6307,6305,6302,6299,6295,6291,6284,6283,6281,
6275,6274,6273,6272,6271,6266,6264,6262,6261,6258,6257,6256,6255,6254,6249,6248,6248,6247,6244,6243,6242,6240,6239,6235,6234,6232,6228,6225,6224,6222,6221,6218,6203,6201,6200,6199,6198,6197,6187,6186,6185,6182,6181,6180,6179,6175,6169,6168,6167,6162,6161,6159,6155,6154,6152,6151,
6149,6147,6141,6137,6133,6130,6121,6120,6119,6117,6108,6075,6042,6042,6042,6037,6013,5983,5919,5912,5911,5906,5901,5816,5812,5810,5807,5789,5744,5740,5707,5686,5686,5686,5686,5630,5594,5567,5567,5567,5567,5539,5536,5535,5473,5469,5466,5432,5428,5403,5402,5400,5393,5392,5379,5379,
5379,5376,5344,5332,5324,5308,5301,5280,5257,5251,5235,5220,5174,5123,5083,5083,5071,5027,5026,4875,4832,4807,4717,4651,4641,4640,4617,4592,4591,4583,4578,4563,4557,4556,4555,4532,4519,4436,4422,4398,4391,4387,4386,4385,4379,4370,4342,4341,4340,4337,4336,4331,4330,4329,4328,4327,
4326,4325,4324,4323,4322,4321,4320,4318,4317,4316,4313,4312,4311,4310,4292,4286,4278,4240,4191,4144,4144,4144,4137,4106,4014,3985,3982,3972,3954,3948,3926,3887,3864,3751,3724,3704,3700,3687,3674,3673,3670,3664,3662,3651,3649,3637,3636,3635,3634,3633,3632,3631,3630,3629,3628,3627,
3626,3625,3624,3623,3622,3621,3620,3619,3618,3552,3536,3528,3528,3528,3517,3495,3485,3469,3439,3432,3401,3399,3399,3399,3388,3296,3284,3282,3280,3278,3277,3252,3250,3226,3218,3208,3177,3176,3174,3172,3170,3169,3168,3167,3162,3161,3160,3159,3155,3154,3153,3152,3151,3149,3148,3147,
3140,3136,3135,3126,3125,3124,3122,3120,3119,3115,3112,3102,3101,3100,3098,3097,3096,3095,3094,3093,3087,3086,3085,3082,3081,3080,3079,3076,3073,3067,3061,4
)
and summary_fr='' and summary_es='';
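
The same kind of visual check can be sketched in plain Python, for example when working from an export rather than the database. This is an illustrative assumption, not part of the queries above: `preview` and `guess_language` are hypothetical names, the marker words are the ones used in the `like` patterns, and the preview width only roughly matches the SQL `substring` call.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")


def preview(html, width=80):
    """Strip HTML tags and keep the first `width` characters."""
    return TAG_RE.sub("", html)[:width]


# Marker words taken from the LIKE patterns in the queries above.
MARKERS = {"fr": (" dans ", " et "), "es": (" las ", " del ")}


def guess_language(summary):
    """Return 'fr' or 'es' if one of the marker words occurs, else None."""
    for lang, words in MARKERS.items():
        if any(word in summary for word in words):
            return lang
    return None


print(preview("<p>Inondations <b>dans</b> le nord</p>"))
print(guess_language("Inundaciones en las provincias del sur"))
```

Like the SQL version, this is a crude heuristic: it only flags candidates for a manual look and will miss texts that happen not to contain the marker words.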

szabozoltan69 commented 1 month ago

Task done. Some items from scope 3 failed (listed below); they were fixed manually by simplifying the rich text HTML code. For 6559 the auto-translation had to be switched off, otherwise the table translations could not be done.

6671
6589
6559
6455
3619

szabozoltan69 commented 1 month ago

The above pseudocode worked fine for me with one small change that lets the loop continue after an error:

        # Translate fields dynamically and save the object
        try:
            ModelTranslator().translate_model_fields(event)
            print('- Translated successfully:', event.pk)
        except Exception as exc:
            # Report the failure and continue with the next event
            print('Unsuccessful:', event.pk, exc)
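
A variant of this loop could also collect the failed IDs so they can be revisited by hand afterwards. The sketch below is self-contained and makes assumptions: `translate_all` is a hypothetical name, and `fake_translate` is only a stand-in for `ModelTranslator().translate_model_fields` with a simulated failure.

```python
def translate_all(items, translate):
    """Call translate(item) for each item; return the items that raised."""
    failed = []
    for item in items:
        try:
            translate(item)
            print("- Translated successfully:", item)
        except Exception as exc:  # keep going, but remember what failed
            print("Unsuccessful:", item, "-", exc)
            failed.append(item)
    return failed


# Hypothetical stand-in for the real translator call; the failure is simulated.
def fake_translate(pk):
    if pk == 6559:
        raise RuntimeError("simulated translation failure")


failed = translate_all([6851, 6559, 6807], fake_translate)
```

Returning the list rather than only printing makes it easy to re-run just the failed IDs once their HTML has been simplified.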