cncf / tag-env-sustainability

🌳🌍♻️ TAG Environmental Sustainability
https://tag-env-sustainability.cncf.io/
Apache License 2.0

[Action] Automation to detect required website translations #325

Open leonardpahlke opened 5 months ago

leonardpahlke commented 5 months ago

The TAG ENV website now supports multiple languages. With this new feature, it's essential to ensure that not only the main English version is updated but also the translated versions. Currently, translators must monitor the website; if they notice any changes, they open a PR. This process is error-prone and stressful for those maintaining the translated content. Ideally, when a PR updates the English version of the website (any part of the website), a follow-up issue would be automatically created for each language, which our team can then address.

SamYuan1990 commented 5 months ago

https://github.com/dorny/paths-filter I hope that with this GitHub Action we can get a list of changed files as part of monitoring the changes.

Does anyone know how to create a GitHub issue automatically with a change list?

kumarankit999 commented 5 months ago

I would also like to help here! @leonardpahlke

sergiopsyalo commented 5 months ago

I would like to help too @leonardpahlke

leonardpahlke commented 5 months ago

Assigned you both! Thanks!

guidemetothemoon commented 5 months ago

+1 for getting some automation in place for this!

For the QA checks workflow we're using a changed-files GH Action that works really nicely for identifying changes in the specified files/directories, so this or a similar action could be relevant to use here as well. The implementation is in this workflow: https://github.com/cncf/tag-env-sustainability/blob/main/.github/workflows/checks.yml

There are multiple GH Actions to choose from that can be used to automatically create an issue based on some event. For example, this one: https://github.com/marketplace/actions/create-an-issue

Maybe we can even create groups of people who have been contributing to the different language translations, so that those groups can be added to the respective issues (e.g. a Spanish translators group added to an issue about translating Spanish content).

I'll be happy to contribute both on the implementation and review. Don't hesitate to reach out @kumarankit999 @sergiopsyalo 😊

leonardpahlke commented 5 months ago

@kumarankit999 could you join the next TAG ENV meeting (Feb 14) to briefly report on the status and perhaps we can discuss any questions you may have. https://calendar.google.com/calendar/embed?src=72e93a411f02e5664bb4485c04311b83dae6a62574e4ab882a1ccf8526aa9bf1%40group.calendar.google.com

sergioarmgpl commented 5 months ago

Sorry, I commented with the wrong user. Could you assign the user @sergioarmgpl instead, @leonardpahlke?

leonardpahlke commented 5 months ago

@kumarankit999 & @sergioarmgpl pls let us know if you need any assistance here! thanks :)

sergioarmgpl commented 4 months ago

I will use this action as a reference: https://github.com/marketplace/actions/create-an-issue. I will need the ${{ secrets.GITHUB_TOKEN }} part; I mean, that secret needs to have permission to create the issue. cc: @leonardpahlke

leonardpahlke commented 4 months ago

I will use this action as a reference: https://github.com/marketplace/actions/create-an-issue

👍

AFAIK the GITHUB_TOKEN is a default secret that GitHub Actions creates automatically for each workflow run, so it should exist by default.
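
For anyone following along, here is a minimal sketch of what creating an issue with that token could look like against GitHub's REST endpoint for issues (`POST /repos/{owner}/{repo}/issues`). It is not how the create-an-issue action works internally; `build_issue_request` and the example title and body are hypothetical names used only for illustration. The request is only built here, not sent. Depending on the repository settings, the workflow may also need `issues: write` under `permissions` for the token to be allowed to create issues.

```python
import json
import os
import urllib.request


def build_issue_request(repo, title, body, token, assignees=()):
    """Build (but do not send) a request for GitHub's "create an issue" endpoint."""
    payload = {"title": title, "body": body, "assignees": list(assignees)}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # In a workflow the token would be passed to the step explicitly, e.g. via
    # `env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}`. The fallback is a placeholder.
    token = os.environ.get("GITHUB_TOKEN", "placeholder-token")
    request = build_issue_request(
        "cncf/tag-env-sustainability",
        "[es] Translation update needed",  # hypothetical title
        "English content changed; please review the Spanish pages.",
        token,
    )
    print(request.get_method(), request.full_url)
    # Sending is left commented out so the sketch has no side effects:
    # urllib.request.urlopen(request)
```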

sergioarmgpl commented 4 months ago

@Dianmz will be shadowing me, just to introduce her to this kind of issue :). Sure, I will use the GITHUB_TOKEN that way :) @leonardpahlke

sergioarmgpl commented 4 months ago

Resuming this after completing the translation into Spanish.

SamYuan1990 commented 3 months ago

How is this issue going?

leonardpahlke commented 3 months ago

How is this issue going?

@sergioarmgpl ^ do you need any support? Do we have a plan for how to implement it?

sergioarmgpl commented 3 months ago

I will work on this this week. Sorry, I was a little bit absent.

thelooter commented 3 months ago

Another way this could be approached would be using a tool like Crowdin or Weblate to manage translations. You would essentially put in the translatable strings and then add the languages they should be translated to. This would also make tracking the progress of translations easier. It would also lower the bar to entry for translation, since you could much more easily translate small chunks instead of needing to translate a whole document at once.

leonardpahlke commented 3 months ago

Thanks @thelooter!

@cjyabraham would be interesting to hear your thoughts on this.

leonardpahlke commented 3 months ago

cc @nate-double-u

cjyabraham commented 3 months ago

I think the place for human-crafted website translations is shifting as automated in-browser AI translations get better and better. If we're not already there, I expect we will soon be at the point where automated translations are about as good as those done by a human. Given that, I would look at supporting human translations as more of a short-term thing rather than something we'll need to do indefinitely. I'm interested, of course, in other people's views on this :)

We've also been exploring this topic wrt the Events site and, as of now, have opted not to build out a complex translation infrastructure.

Another temporary solution for this current issue is to create an issue every time an edit needs to be done to the site with checkboxes for each of the languages. Once the English language edit is done, the other language teams can be assigned to the issue until they have done their translation and checked their box. I'm not sure if this is practical given the cadence of English-language edits but it would save having to have GH actions running to automate things. I'm curious to know how the Glossary team manages this...

I don't like the idea of integrating with a 3rd-party service to have them manage our translations. Integrating sites like that is rarely simple and ties us to that closed-source solution.

sergioarmgpl commented 3 months ago

On my side, the plan is just to detect the differences, make a list of the files that need updating, and then send the notification. I have time to work on this this week, as I have some vacation :) News soon.

sergioarmgpl commented 3 months ago

I looked into the Glossary and there is no automation for translation, so it's done by humans. I also think that human involvement encourages people to contribute in some way.

leonardpahlke commented 3 months ago

I also think that human involvement encourages people to contribute in some way.

Yes, translations are a low-barrier type of contribution aimed at parts of the community that are not yet represented in the TAG. However, in general, contributions should not become outdated; otherwise they lose their value.

thelooter commented 3 months ago

While I agree that AI gets better and better every day, it's IMO just not yet at the point where it can be reliably used for translations. Many languages rely on certain phrases that can't just be translated one to one, and reasoning about those phrases is still a big challenge for AI.

AI can maybe be used to create an initial translation that's refined by humans but I wouldn't just blindly let AI translate it.

I agree that binding ourselves to a closed-source tool is not an ideal solution; that's why I suggested Weblate. It's open source (https://github.com/WeblateOrg/weblate).

This also makes updating and tracking translations a lot easier since it's managed in smaller chunks and therefore easier to spot changes. As I stated before, it also allows for easier contribution, as one doesn't have to edit/translate a whole document but they can edit small snippets, e.g. while on the train.

nate-double-u commented 3 months ago

AI can maybe be used to create an initial translation that's refined by humans but I wouldn't just blindly let AI translate it.

This has been what I've heard from localization teams too. AI can be used for a first pass, but it doesn't quite reduce the human workload yet because of the amount of editing that is still required.

I'm curious to know how the Glossary team manages this...

@jihoon-seo & @seokho-son may have some insight on how the glossary localization teams manage this.

jihoon-seo commented 3 months ago

Hi all! In the Glossary project, to detect upstream (i.e., English) changes and report them in a GitHub issue, two GitHub workflows are enabled:

  1. https://github.com/cncf/glossary/blob/main/.github/workflows/check-outdated-content.yaml

    This workflow checks whether localized content is outdated by comparing the English content in the old branch and the latest branch, and then uploads the comparison results to a certain path of the GitHub workflow instance.

  2. https://github.com/cncf/glossary/blob/main/.github/workflows/post-outdated-content-report.yaml

    This workflow posts a report of outdated content as a GitHub issue, using data from previous workflows.
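
The comparison idea in step 1 can be sketched roughly as follows. This is a Python illustration under assumptions, not the Glossary's actual implementation: `find_outdated` and the sample page paths are hypothetical, and the two English snapshots are passed in as plain dictionaries instead of being read from Git branches.

```python
def find_outdated(old_en, new_en, localized_paths):
    """Classify pages for one language by comparing two English snapshots.

    old_en / new_en map a page path (relative to the language root) to its
    English text at the older and the newer revision. localized_paths is the
    set of page paths that already exist for the target language.
    """
    report = {"outdated": [], "missing": []}
    for path in sorted(new_en):
        if path not in localized_paths:
            # No translation exists yet for this English page.
            report["missing"].append(path)
        elif path in old_en and old_en[path] != new_en[path]:
            # English text changed after the translation was made.
            report["outdated"].append(path)
    return report


if __name__ == "__main__":
    old = {"_index.md": "Welcome", "about.md": "About us"}
    new = {"_index.md": "Welcome!", "about.md": "About us", "events.md": "Events"}
    print(find_outdated(old, new, {"_index.md", "about.md"}))
    # {'outdated': ['_index.md'], 'missing': ['events.md']}
```

The resulting report is the kind of data a second step could turn into an issue body.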

There has also been an attempt to add a Python script that creates a translation draft using Google Translator in the k8s/website repo. Please see these issues and PR:

seokho-son commented 3 months ago

Hello everyone, I'm Seokho Son, one of the maintainers of the Cloud Native Glossary (and also Kubernetes SIG-Docs Localization Subproject lead).

First of all, I'd like to share the status of the Glossary project as it relates to this discussion:

As for adopting an AI-based translation platform/service, here are my thoughts on the pros and cons:

(Pros)

(Cons)

For the Glossary project, since the number of documents to localize isn't too large and is manageable by humans, and since we provide methods for updating documents (workflow tools, etc.), there doesn't seem to be an urgent need to adopt an AI-based translation platform/service.

In conclusion, my opinion is that, as each project has its own goals, content structure, and contributor status, the approach should be tailored to each project's circumstances. As with the Glossary, I don't think the TAG ENV website has the volume of content that necessitates the adoption of an AI-based translation platform/service.

Returning to the original purpose of this discussion initiated by @leonardpahlke , it would be good to first discuss Automation to detect required website translations:

I hope my opinion and information are useful :)

sergioarmgpl commented 3 months ago

I will take a look, I had a similar idea.

sergioarmgpl commented 3 months ago

This one will work for us https://github.com/cncf/glossary/blob/main/.github/workflows/check-outdated-content.yaml @jihoon-seo thanks

sergioarmgpl commented 3 months ago

I have a call with @Dianmz today to work on this cc: @leonardpahlke

leonardpahlke commented 3 months ago

Awesome! Let me know how it goes & if you need support 🙌

Dianmz commented 3 months ago

@sergioarmgpl and I are currently working on the workflow in Bash, based on the above example. We found two issues: the first is handling spaces in file names, which we are fixing, and the second is handling new files, which we are still working on.
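
One possible way to sidestep both problems, offered as a sketch under assumptions and not as what the workflow in this thread does: `git diff --name-status -z` separates fields with NUL bytes and prints paths without quoting, so names containing spaces stay intact, and newly added files show up with status `A`. The helper name `parse_name_status` and the sample paths are hypothetical.

```python
def parse_name_status(raw):
    """Parse the output of `git diff --name-status -z` into (status, path) pairs.

    Renames and copies carry two paths (old, new); only the new path is kept.
    """
    fields = raw.split("\0")
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field after the trailing NUL
    changes = []
    i = 0
    while i < len(fields):
        status = fields[i][0]  # "R100" -> "R"
        if status in ("R", "C"):
            changes.append((status, fields[i + 2]))
            i += 3
        else:
            changes.append((status, fields[i + 1]))
            i += 2
    return changes


if __name__ == "__main__":
    sample = "\0".join([
        "M", "content/en/about.md",
        "A", "content/en/new page.md",  # file name with a space
        "R100", "content/en/old.md", "content/en/renamed.md",
    ]) + "\0"
    for status, path in parse_name_status(sample):
        print(status, path)
```

In a real run the input would come from something like `subprocess.run(["git", "diff", "--name-status", "-z", base, head], capture_output=True, text=True).stdout`, where `base` and `head` are the two revisions to compare.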

cc: @leonardpahlke @guidemetothemoon

sergioarmgpl commented 3 months ago
Screenshot 2024-04-04 at 22 19 18

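This is our progress: we have already detected the differences between the different languages. We found:

  • Some inconsistencies in a file called landscape/SustainabilityUseCasesAndLandscape2023.ko.md. We would like to remove the ko, just to standardize the files a little bit.

The other thing is that I will create an issue using a cron workflow in Actions, notifying the active people working on Spanish and Chinese. That's our idea.

Questions:

  • What do you think about this?
  • What do you think about calling a script from the workflow versus executing Bash directly in the workflow?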

@leonardpahlke @guidemetothemoon cc: @Dianmz

Dianmz commented 3 months ago

To follow up on the previous comment, how would you prefer to visualize the changed files?

Personally I prefer to create an issue but we want to know your opinion.

cc: @sergioarmgpl @leonardpahlke @guidemetothemoon

SamYuan1990 commented 3 months ago
Screenshot 2024-04-04 at 22 19 18

This is our progress: we have already detected the differences between the different languages. We found:

  • Some inconsistencies in a file called landscape/SustainabilityUseCasesAndLandscape2023.ko.md. We would like to remove the ko, just to standardize the files a little bit.

The other thing is that I will create an issue using a cron workflow in Actions, notifying the active people working on Spanish and Chinese. That's our idea.

Questions:

  • What do you think about this?
  • What do you think about calling a script from the workflow versus executing Bash directly in the workflow?

@leonardpahlke @guidemetothemoon cc: @Dianmz

for Chinese, please notify me :-)

leonardpahlke commented 3 months ago

Some inconsistencies in a file called landscape/SustainabilityUseCasesAndLandscape2023.ko.md. We would like to remove the ko, just to standardize the files a little bit.

We are currently in the process of rewriting the landscape document. This will take another month or two. We can remove the KO file and follow standards! (@seokho-son @sysnet4admin: we started to translate the TAG ENV website into different languages, like Spanish, see PR; we could do the same for Korean)

The other thing is that I will create an issue using a cron workflow in Actions, notifying the active people working on Spanish and Chinese. That's our idea.

Sounds good! We could create GitHub teams tag-env-translators-spanish, tag-env-translators-korean, tag-env-translators-german, tag-env-translators-chinese, … and tag them in the issue. This can be defined in the CNCF repository “people”, which has a manifest file https://github.com/cncf/people/blob/c7f8625ebd386574959bcb807de1b48ded6f4da2/config.yaml#L1160 (example team: tag-env-chairs)

leonardpahlke commented 3 months ago

To follow up on the previous comment, how would you prefer to visualize the changed files?

Personally I prefer to create an issue but we want to know your opinion.

I think issues are the best way for sure. One issue per language based on PR. So if we open a PR which makes changes to multiple files we just mirror that with one issue per language (not per file).
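
A minimal sketch of the "one issue per language, not per file" idea: every language gets a single issue draft whose body lists all English files changed by the PR as a checklist. The function name `build_issues`, the title wording, the PR number, and the team handles are hypothetical; the teams follow the naming suggested earlier in this thread and may not exist.

```python
def build_issues(pr_number, changed_files, teams):
    """Return one issue draft per language, each listing every changed English file."""
    checklist = "\n".join(f"- [ ] {path}" for path in sorted(changed_files))
    issues = []
    for lang, team in sorted(teams.items()):
        issues.append({
            "title": f"[{lang}] Update translation after PR #{pr_number}",
            "body": (
                f"PR #{pr_number} changed the English content below. "
                f"Please review the {lang} translation.\n\n"
                f"{checklist}\n\ncc @{team}"
            ),
        })
    return issues


if __name__ == "__main__":
    drafts = build_issues(
        123,  # hypothetical PR number
        ["about.md", "_index.md"],
        {
            "es": "cncf/tag-env-translators-spanish",  # hypothetical team
            "zh": "cncf/tag-env-translators-chinese",  # hypothetical team
        },
    )
    for draft in drafts:
        print(draft["title"])
        print(draft["body"])
        print("---")
```

Keeping the file list inside the body means a PR touching many files still produces only as many issues as there are languages.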

guidemetothemoon commented 3 months ago

Yes, I also vote for issues. Suggestion regarding GitHub teams for this also makes a lot of sense.

sergioarmgpl commented 3 months ago

Thank you, I'm picking this issue back up this week. Yesterday was my birthday and I left my job 🙃 cc: @leonardpahlke @guidemetothemoon

sysnet4admin commented 3 months ago

@leonardpahlke

We are currently in the process of rewriting the landscape document. This will take another month or two. We can remove the KO file and follow standards! (@seokho-son @sysnet4admin: we started to translate the TAG ENV website into different languages, like Spanish, see https://github.com/cncf/tag-env-sustainability/pull/340; we could do the same for Korean)

I will take a look after finishing my duty work 😭

sergioarmgpl commented 3 months ago

Thank you. @Dianmz and I have a deadline to finish this next week. I will create the issues as suggested. cc: @leonardpahlke @guidemetothemoon

guidemetothemoon commented 2 months ago

Thanks for keeping us updated @sergioarmgpl and happy belated birthday! 🥳 Don't hesitate to reach out if you need any support.

Dianmz commented 2 months ago

image

Take a look at our progress 👀

cc: @sergioarmgpl @leonardpahlke @guidemetothemoon

Dianmz commented 2 months ago

We are going to keep working on a general workflow for any language to avoid duplicating code.

cc: @leonardpahlke @guidemetothemoon

Dianmz commented 2 months ago

We ran into some trouble with the workflow, but we are basing it on the Kubernetes glossary.

cc: @leonardpahlke @guidemetothemoon @sergioarmgpl

Dianmz commented 2 months ago

We are currently testing the final configuration of the workflow!

Very close to reaching the goal 🎯

cc: @leonardpahlke @guidemetothemoon @sergioarmgpl

guidemetothemoon commented 2 months ago

Very exciting! Thanks for all your time and effort, and for sharing updates with the TAG @Dianmz @sergioarmgpl 💚

sysnet4admin commented 2 months ago

Some inconsistencies in a file called landscape/SustainabilityUseCasesAndLandscape2023.ko.md. We would like to remove the ko, just to standardize the files a little bit.

We are currently in the process of rewriting the landscape document. This will take another month or two. We can remove the KO file and follow standards! (@seokho-son @sysnet4admin: we started to translate the TAG ENV website into different languages, like Spanish, see PR; we could do the same for Korean)

The other thing is that I will create an issue using a cron workflow in Actions, notifying the active people working on Spanish and Chinese. That's our idea.

Sounds good! We could create GitHub teams tag-env-translators-spanish, tag-env-translators-korean, tag-env-translators-german, tag-env-translators-chinese, … and tag them in the issue. This can be defined in the CNCF repository “people”, which has a manifest file https://github.com/cncf/people/blob/c7f8625ebd386574959bcb807de1b48ded6f4da2/config.yaml#L1160 (example team: tag-env-chairs)

Hi @leonardpahlke, I committed and pushed the ko content. Some of the content is only partly translated, so it may not be of good quality ;) and it failed the quality check. Please consider checking it before merging.

Please let me know if I could contribute more for it. Have a nice weekend!

sergioarmgpl commented 2 months ago

This is the PR for this issue: https://github.com/cncf/tag-env-sustainability/pull/407

sergioarmgpl commented 2 months ago

Hi dear TAG ENV team, this is the final version of the work that @Dianmz and I have been doing.

It includes a workflow called Check outdated content, which currently generates new issues for the ES and ZH languages. The issues are currently assigned to Diana and me, but this should be changed to the groups you suggested or to new ones.

The issues will look like this:

Screenshot 2024-04-30 at 22 05 41

And the issue like this:

Screenshot 2024-04-30 at 22 05 53

The workflow is in .github/workflows/check-outdated-content.yml

Dianmz commented 2 months ago

@sergioarmgpl and I have just finished the first version of this issue! We will create a second issue to optimize the languages with a reusable workflow.

cc: @leonardpahlke @guidemetothemoon