ckan / ckanext-harvest

Remote harvesting extension for CKAN

Add dataset translation through ckan CLI #515

Closed wanam closed 1 year ago

wanam commented 1 year ago

This feature uses the free tier of Google Translate through the deep-translator library: https://github.com/nidhaloff/deep-translator

Translation can be triggered during harvesting with the "-t/--translate language" argument, where language is an ISO 639-1 (alpha-2) code.

Example: `ckan --config=/etc/ckan/default/ckan.ini harvester gather-consumer -t fr`
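For readers unfamiliar with deep-translator, the call behind such a flag might look roughly like the sketch below. It is not the code from this PR: the helper names `validate_language` and `translate_text` are hypothetical, and the check only confirms that the code has the two-letter shape of an ISO 639-1 code, not that it is an assigned or supported language.

```python
import re

# Shape check only: two ASCII letters, as in ISO 639-1 alpha-2 codes.
_ALPHA2 = re.compile(r"[a-z]{2}")


def validate_language(code):
    """Return the normalized code, or raise if it is not two letters long."""
    normalized = code.strip().lower()
    if not _ALPHA2.fullmatch(normalized):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {code!r}")
    return normalized


def translate_text(text, target, source="auto"):
    """Translate one string with deep-translator's GoogleTranslator.

    Hypothetical helper: requires `pip install deep-translator` and
    network access, so the import is deferred until it is needed.
    """
    from deep_translator import GoogleTranslator

    translator = GoogleTranslator(source=source, target=validate_language(target))
    return translator.translate(text)
```

Calling `translate_text("Air quality measurements", "fr")` would send the string to Google Translate, so results depend on the remote service and on any rate limits of the free tier.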

amercader commented 1 year ago

@wanam Thanks for taking the time to submit this PR. I think automated translation of dataset metadata is an interesting idea to explore, but I don't think this is the right approach. Instead, I would focus on:

  1. Making this a separate extension. Extensions are better when they are focused in scope, so rather than integrating this with ckanext-harvest (or ckanext-dcat) I would create a new ckanext-autotranslation extension that adds this feature. You can hook into `IPackageController.after_dataset_create()` and `IPackageController.after_dataset_update()` to run the translation process (probably in a background job).
  2. I wouldn't discard the original metadata value, but rather keep the translated value separately. Look into ckanext-fluent for the expected format.
  3. I'd probably be more explicit about the fields I want translated rather than trying to translate everything, maybe flagging them in the schema or defining them in a config option. Most fields are machine-readable and don't really make sense to translate.

Hope this makes sense, good luck!
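Points 2 and 3 of the review above could be sketched as a small pure-Python helper, shown here only as an illustration. The function name `add_translations`, the `<field>_translated` key naming, and the `translate` callable are assumptions; the key naming is modeled on how ckanext-fluent schemas commonly name multilingual fields, so check the actual schema before relying on it.

```python
import copy


def add_translations(dataset, fields, target_langs, source_lang, translate):
    """Return a copy of `dataset` with per-language values added.

    Only the explicitly listed `fields` are translated, and the original
    value is kept untouched. `translate` is any callable taking
    (text, target_lang) and returning the translated string.
    """
    result = copy.deepcopy(dataset)
    for field in fields:
        original = dataset.get(field)
        # Skip missing, empty, or non-text values.
        if not isinstance(original, str) or not original.strip():
            continue
        key = f"{field}_translated"  # assumed naming convention
        translated = dict(result.get(key) or {})
        translated.setdefault(source_lang, original)
        for lang in target_langs:
            # Do not overwrite translations that already exist.
            if lang not in translated:
                translated[lang] = translate(original, lang)
        result[key] = translated
    return result
```

In a separate extension, a function like this would most likely run inside a background job queued from the `after_dataset_create()` and `after_dataset_update()` hooks, with the list of fields read from a config option or from flags in the schema. Taking care that the update made by the job does not trigger a new translation job is left out of this sketch.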