anuraghazra / github-readme-stats

:zap: Dynamically generated stats for your github readmes
https://github-readme-stats.vercel.app
MIT License
66.48k stars, 21.55k forks

Migrate to automatic README translation #2053

Open rickstaa opened 1 year ago

rickstaa commented 1 year ago

Is your feature request related to a problem? Please describe.

Keeping the documentation up to date and managing the PRs would be more manageable if we switched from manual to automatic README translations (see https://github.com/dephraiim/translate-readme). The downside is that there might be some errors, but this shouldn't matter for understanding how to use GRS, since Google Translate has become quite good in the last few years. The upside is that we no longer need to review translation PRs, we can support more languages, and the translations stay up to date. We can add flags to the readme so people can choose their language.

Pranav2612000 commented 1 year ago

Hey @rickstaa I would like to work on this one. Is it fine?

rickstaa commented 1 year ago

@Pranav2612000 First of all, welcome to the community! Amazing that you want to help us improve the maintainability of the repository. I am unsure how hard it is to implement this feature and whether https://github.com/dephraiim/translate-readme serves our needs. The implementation found in translate-readme is quite basic (see https://github.com/dephraiim/translate-readme/blob/main/index.js), so it may not filter out the query parameters found in the code blocks.

It could therefore be that we need to improve this action or create a new action.

My original idea

🇳🇱 🇫🇷 🇺🇸 🇩🇪 🇮🇹

rickstaa commented 1 year ago

@Pranav2612000 It looks like https://github.com/dephraiim/translate-readme does not yet support HTML (see https://github.com/dephraiim/translate-readme/issues/1), so it may not suit our needs. In that case, we might need to improve this action or build our own. This would likely require us to add some regex to filter out the HTML and code blocks before translation and put them back in afterwards. If you want, you can run some tests to see how good the action is and decide if you still want to take on this challenge 👍🏻.

Pranav2612000 commented 1 year ago

Yeah. Took a look at https://github.com/dephraiim/translate-readme and I agree we'll need to modify this a bit. I'll see if I can come up with something so that we don't translate the query params and only translate the non-code text.

rickstaa commented 1 year ago

@Pranav2612000 I did some research, and the paid Google Translate API does handle HTML code (see https://cloud.google.com/translate/docs/advanced/translating-text-v3). However, it does not filter markdown code blocks and will likely translate the code in those blocks. These blocks therefore have to be filtered out before translation and injected again afterwards using regex. Further, Google charges $10 per million characters once the 500,000 free characters per month have been used up, and users have to set up an API key to get it to work.

In contrast, https://github.com/dephraiim/translate-readme uses https://github.com/iamtraction/google-translate/blob/master/src/tokenGenerator.js#L73, which simply makes a call to translate.google.com. The results are therefore less stable, limited to 5000 characters, and require more regex filtering before they can be used.
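A minimal sketch of how a README could be fed to a service with a 5000-character limit. Splitting at blank lines is an assumption made here for illustration, and `chunk_paragraphs` is a hypothetical helper, not part of translate-readme or google-translate.

```python
def chunk_paragraphs(text, limit=5000):
    """Group blank-line-separated paragraphs into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph longer than `limit` is kept whole in this sketch.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated separately and the results joined with blank lines again. Splitting mid-sentence is avoided because translation quality usually depends on sentence context.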

Therefore, I think this should be possible both with the free and paid versions, but it does require significant development time to filter out markdown code blocks and HTML.
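A rough sketch of the "filter out and put back" idea, written in Python for illustration only. The placeholder format `[[KEEP_n]]` and the helper names `mask` and `unmask` are assumptions, and a real translation service may alter such placeholders, so this would need testing against the actual engine.

```python
import re

FENCE = "`" * 3  # markdown code fence marker

# Fenced code blocks, inline code spans, and HTML tags are left untranslated.
PROTECTED = re.compile(FENCE + r".*?" + FENCE + r"|`[^`\n]+`|<[^>]+>", re.DOTALL)


def mask(markdown):
    """Replace protected spans with numbered placeholders and return both."""
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"[[KEEP_{len(saved) - 1}]]"

    return PROTECTED.sub(stash, markdown), saved


def unmask(text, saved):
    """Put the original spans back in place of the placeholders."""
    return re.sub(r"\[\[KEEP_(\d+)\]\]", lambda m: saved[int(m.group(1))], text)
```

The masked text is what would be sent to the translator, and `unmask` is applied to the translated result. The HTML pattern is deliberately simple and would also match prose such as `a < b > c`, so a proper markdown parser may be the safer choice.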

rickstaa commented 1 year ago

Still, feel free to tackle this if you think it can be done in the time you have set aside for implementing this feature. 🤔 I think both versions (paid and free) would require some parsing to ensure that markdown code blocks and HTML code are still valid. I have not searched yet, but there might be packages that already provide this ability.

andrii-bodnar commented 1 year ago

Hey everyone,

How about using a localization platform? I like Crowdin - a cloud-based solution that streamlines localization management for your team. It's free for open-source. Crowdin allows the community to collaborate on content translation and there is a possibility to set up an automatic translations synchronization using Crowdin's native GitHub integration or GitHub Actions.

Node.js CLI Apps Best Practices - an excellent example of a project using Crowdin for translating content by a community + GH Action for automatic synchronization.

I would be happy to help with the setup.

parinzee commented 1 year ago

Hey there! @rickstaa I think I can tackle this. If you could assign this to me, that would be great. Also, would you be willing to use paid solutions?

rickstaa commented 1 year ago

> Hey everyone,
>
> How about using a localization platform? I like Crowdin - a cloud-based solution that streamlines localization management for your team. It's free for open-source. Crowdin allows the community to collaborate on content translation and there is a possibility to set up an automatic translations synchronization using Crowdin's native GitHub integration or GitHub Actions.
>
> Node.js CLI Apps Best Practices - an excellent example of a project using Crowdin for translating content by a community + GH Action for automatic synchronization.
>
> I would be happy to help with the setup.

@andrii-bodnar Thanks for your message. I appreciate you trying to provide me with a solution. 👍🏻

I checked your profile and see you are a software engineer at Crowdin. I don't mind since you offer a valid solution, but some people might take issue with that. Maybe next time, add a disclaimer to your comment.

That said, I checked your documentation, videos and platform, and I have to say that I'm impressed by the tool you created. I think it is beneficial for streamlining translations for big projects. Thanks for bringing it to my attention. For our small project, however, I think it does not offer much improvement over the translations.js we are currently using.

The main thing I am trying to solve with #2053 is to eliminate the manual README translations we currently use, since these are often incorrect and outdated and clutter our PR backlog. I am therefore looking for an action that uses a service like the Google Translate API or the free Google Translate website to do the translation. I found https://github.com/dephraiim/translate-readme, but as explained above, it does not support our readme because of the HTML and markdown code blocks.

rickstaa commented 1 year ago

> Hey there! @rickstaa, I think I can tackle this. If you could assign this to me, that would be great. Also, you would be willing to use paid solutions, right?

@parinzee Thanks for offering to help implement this feature. Since https://github.com/anuraghazra/github-readme-stats is a free, open-source project, we unfortunately cannot rely on paid solutions. The reason I mention the Google Translate API is that it offers 500,000 free translation characters per month, which should be enough to translate the readme (which has 21,707 characters) into 23 languages every month.
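The quota claim can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this thread (21,707 characters, 23 languages, 500,000 free characters, $10 per million afterwards); actual pricing may differ, and the function names are made up for illustration.

```python
README_CHARS = 21_707      # README length quoted in this thread
LANGUAGES = 23             # number of target languages
FREE_QUOTA = 500_000       # free characters per month, as quoted above
PRICE_PER_MILLION = 10.0   # USD per million characters beyond the free quota


def monthly_usage(runs_per_month=1):
    """Characters sent to the API when the full README is translated `runs_per_month` times."""
    return README_CHARS * LANGUAGES * runs_per_month


def monthly_cost(runs_per_month=1):
    """Cost in USD for the characters that exceed the free quota."""
    billable = max(0, monthly_usage(runs_per_month) - FREE_QUOTA)
    return billable / 1_000_000 * PRICE_PER_MILLION
```

One full translation run uses 21,707 × 23 = 499,261 characters, which leaves only 739 characters of headroom. A second full run in the same month, or a modest growth of the README, would exceed the free quota, so translating only changed sections would be worth considering.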

rickstaa commented 1 year ago

@parinzee, @Pranav2612000 I just removed the hacktoberfest label since this issue is not self-contained (it requires building a new translation action) and therefore does not adhere to the Hacktoberfest maintainer guidelines. That does not mean that I will not accept pull requests for this issue as Hacktoberfest submissions, but simply that this issue is quite involved, and I want to prevent people from seeing it when they search for Hacktoberfest issues. If you still want to tackle this issue, feel free to let me know, and I will assign you.

rickstaa commented 1 year ago

> Hey everyone,
>
> How about using a localization platform? I like Crowdin - a cloud-based solution that streamlines localization management for your team. It's free for open-source. Crowdin allows the community to collaborate on content translation, and there is a possibility to set up an automatic translations synchronization using Crowdin's native GitHub integration or GitHub Actions.
>
> Node.js CLI Apps Best Practices - an excellent example of a project using Crowdin for translating content by a community + GH Action for automatic synchronization.
>
> I would be happy to help with the setup.

@andrii-bodnar I had some time to look at Crowdin and implemented it on one of my other open-source repositories. For the card translations, I think it is a significant improvement over the manual translation PRs. If @anuraghazra is okay with it, we can use Crowdin for the card translations (i.e. https://github.com/anuraghazra/github-readme-stats/blob/master/src/translations.js). Maybe we can also add the README translations later, as I'm still thinking about creating an automated solution using the Google Translate API.

If you could set it up, that would be great 🎉. We can then add a note to both the README.md and CONTRIBUTING.md to explain how users can add card translations. My Crowdin account is rickstaa, or do you need @anuraghazra's account to set it up?

TODOs

rickstaa commented 1 year ago

@anuraghazra, what are your thoughts about using Crowdin for our card translations? I think it improves the translation procedure, or do you think it is a bit overkill for only the https://github.com/anuraghazra/github-readme-stats/blob/master/src/translations.js file 🤔?

andrii-bodnar commented 1 year ago

Hi @rickstaa, happy to hear about your success with Crowdin implementation in the GitHub Emoji Picker project!

Just checked the translations.js and it seems like it requires some refactoring to be ready for automatic localization.

The main issue here is that all the languages are located inside a single file. It would be great to split these languages into separate files and ideally store them in JSON files.

From my perspective, Crowdin could be used here for translating both card texts and Readme. Readme files could be translated through the automatic workflows via MT engines. That will also give the possibility for translators to suggest better translations since MT engines might provide bad results.

rickstaa commented 1 year ago

> Hi @rickstaa, happy to hear about your success with Crowdin implementation in the GitHub Emoji Picker project!
>
> Just checked the translations.js and it seems like it requires some refactoring to be ready for automatic localization.
>
> The main issue here is that all the languages are located inside a single file. It would be great to split these languages into separate files and ideally store them in JSON files.
>
> From my perspective, Crowdin could be used here for translating both card texts and Readme. Readme files could be translated through the automatic workflows via MT engines. That will also give the possibility for translators to suggest better translations since MT engines might provide bad results.

I'm okay with splitting the files into multiple files as I did for the GitHub Emoji Picker. We can try it out for both the card translations and READMEs. :fire: I, however, will leave the ultimate decision to @anuraghazra, so let's wait for his thoughts on the change.

anuraghazra commented 1 year ago

Crowdin seems good! Yeah, I think storing locale files as JSON is the standard way to go.
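A sketch of the proposed split into one JSON file per locale, written in Python for illustration only. It assumes the translations are available as a mapping of string key to locale to text; the sample keys in the usage note and the helper names are hypothetical, and the real translations.js may need extra handling for strings that interpolate values.

```python
import json
from pathlib import Path


def split_by_locale(translations):
    """Turn {key: {locale: text}} into {locale: {key: text}}."""
    per_locale = {}
    for key, by_locale in translations.items():
        for locale, text in by_locale.items():
            per_locale.setdefault(locale, {})[key] = text
    return per_locale


def write_locale_files(translations, out_dir):
    """Write one <locale>.json file per language into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for locale, strings in split_by_locale(translations).items():
        payload = json.dumps(strings, ensure_ascii=False, indent=2) + "\n"
        (out / f"{locale}.json").write_text(payload, encoding="utf-8")
```

For example, `write_locale_files({"card.title": {"en": "Stats", "nl": "Statistieken"}}, "locales")` would produce `locales/en.json` and `locales/nl.json`, which is the per-language layout that localization platforms typically expect.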

rickstaa commented 1 year ago

@andrii-bodnar, does Crowdin also offer a way to automatically translate the README into other languages using third-party translators like the Google Translate API while keeping code blocks and HTML from being translated? 🤔

andrii-bodnar commented 1 year ago

@rickstaa Sure, the best option here is an automated workflow in Crowdin Enterprise. There is an MT Pre-translation step that can be configured to use an MT engine. New strings will be translated automatically in this case. In addition, it's possible to manually translate or correct strings. Crowdin Workflows are very flexible.

A similar flow is possible in crowdin.com also - Custom Workflows. It's simpler than Crowdin Enterprise Workflows but it also has an automatic MT Pre-Translation feature.

rickstaa commented 1 year ago

> @rickstaa sure, the best option here - is an automated workflow in Crowdin Enterprise. There is an MT Pre-translation step that can be configured to use some MT engine. New strings will be translated automatically in this case. In addition, it's possible to manually translate or correct strings. Crowdin Workflows are very flexible.
>
> A similar flow is possible in crowdin.com also - Custom Workflows. It's simpler than Crowdin Enterprise Workflows but it also has an automatic MT Pre-Translation feature.

@andrii-bodnar Amazing to hear that Crowdin Enterprise provides this possibility. Maybe we can arrange a partnership between your company and GRS if you and @anuraghazra are open to that.

Such a partnership can benefit both parties since it will give more exposure to your service and make the GRS repository easier to maintain. 🚀 I don't think the load on your systems would be extreme since we update the README.md or card translations maybe once every two months. 🤔

andrii-bodnar commented 1 year ago

@rickstaa Crowdin is free for Open-Source projects 🙂

It's very easy to apply for the Open-Source plan. First, the project owner needs to create a Crowdin or Crowdin Enterprise account.

And then, submit an Open-source project setup request form.

Of course, we would be extremely happy if you added a badge to your project Readme 🙂 (but it's up to you)

andrii-bodnar commented 1 year ago

@rickstaa The only thing I'm worried about is the upload of the existing translations to Crowdin.

The point here is that Crowdin uses ML technology to upload translations of HTML-based files. Sometimes it still requires some manual work. For more details see this article. As I can see, the Readme is already translated into a bunch of languages.

By the way, how is it going with the JS translation extraction into separate JSON files?

rickstaa commented 1 year ago

> @rickstaa the only thing I'm worried about - is the upload of the existing translation to Crowdin.
>
> The point here is that Crowdin uses ML technology to upload translations of HTML-based files. Sometimes it still requires some manual work to do. For more details see this article. As I can see, the Readme is already translated into a bunch of languages.
>
> By the way, how it's going with the JS translation extraction into separate JSON files?

@andrii-bodnar, unfortunately, I haven't had the time to perform the JS translation extraction.

I just discussed this with @anuraghazra, and if you are willing to implement the automatic README translations for us, that would be amazing! We are more than willing to put a Crowdin badge somewhere on the readme. 👍🏻 As explained above, this might be a very beneficial (symbiotic) partnership. 🚀

If you think these automatic README translations are currently impossible or you don't have resources available to implement them, no problem. 👍🏻 In that case, we will likely remove the translated READMEs since maintaining them manually is no longer doable, given the scale of this project. 😅

rickstaa commented 1 year ago

@andrii-bodnar Feel free to enter my discord server, which can be found on my GitHub README if you want an easier way to discuss 👍.

andrii-bodnar commented 1 year ago

@rickstaa @anuraghazra I'll try to prepare a demo Crowdin project and GH Actions Workflow for you 🙂

rickstaa commented 1 year ago

> @rickstaa @anuraghazra I'll try to prepare a demo Crowdin project and GH Actions Workflow for you 🙂

Amazing, thanks! I'm looking forward to seeing your solution. 🚀

andrii-bodnar commented 1 year ago

Hi @rickstaa @anuraghazra,

Just prepared a Demo Crowdin project and created a PR with integration - #2489

Please check it out 🙂

rickstaa commented 8 months ago

Related to #3364.