forrtproject / forrtproject.github.io

FORRT Website
https://forrt.org

Generate translated glossaries #130

Open LukasWallrich opened 4 days ago

LukasWallrich commented 4 days ago

Currently, only the English glossary exists, and translations are waiting to be added. For that, we need a script similar to summaries.py that parses the Google Docs and creates the glossary in /content/glossary/german and the other language folders.
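A minimal sketch of the output side of such a script follows. It assumes each parsed entry is a dict with `title` and `definition` keys. The helper names, the slug rule, and the front-matter fields (`title`, `type`) are hypothetical and not taken from summaries.py, and the Hugo Blox glossary layout may expect different fields.

```python
import json
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lower-case the title and collapse runs of non-word characters into hyphens."""
    # Python's regex word class is Unicode-aware, so umlauts and Arabic letters survive
    return re.sub(r"\W+", "-", title.lower()).strip("-")


def render_entry(entry: dict) -> str:
    """Build the Markdown file contents: YAML front matter followed by the definition."""
    # Hypothetical front-matter fields; adjust to whatever the site's layout reads
    front_matter = {"title": entry["title"], "type": "glossary"}
    lines = ["---"]
    # json.dumps emits double-quoted strings, which are also valid YAML scalars
    lines += [f"{key}: {json.dumps(value, ensure_ascii=False)}" for key, value in front_matter.items()]
    lines += ["---", "", entry["definition"], ""]
    return "\n".join(lines)


def write_glossary(entries: list, language: str, root: Path = Path("content/glossary")) -> list:
    """Write one .md file per entry into <root>/<language>/, e.g. content/glossary/german/."""
    out_dir = root / language
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for entry in entries:
        path = out_dir / f"{slugify(entry['title'])}.md"
        path.write_text(render_entry(entry), encoding="utf-8")
        written.append(path)
    return written
```

Keeping non-ASCII characters in slugs (rather than folding them to ASCII) matters once Arabic is added, since ASCII folding would reduce an Arabic title to an empty file name.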

Links to the German docs (these currently also contain English text, which needs to be removed during parsing): https://drive.google.com/drive/u/0/folders/1PrX97lGjRGHvvUJgTZYGtZoC6GHqTU7p

So, the tasks are:

LukasWallrich commented 4 days ago

@flavioazevedo can you check and add your email with the other relevant links?

flavioazevedo commented 4 days ago

Yes, thanks for this, @LukasWallrich! I am super happy you will be able to help us with this! 🎉 Please let me know if I can be useful somehow!

Re: the language selector: if possible, we'd like large buttons on the page that people can click to go directly to the desired language. Below is a picture of the buttons implemented on the Curated Resources page of our website (https://forrt.org/resources/):

[Image: language buttons on the Curated Resources page]

Below is the text I shared previously; I hope it is useful:

Our community defined 350 open science terms in a series of Google Docs, which we edited collaboratively. We then used a Python script to scrape the information from the Google Docs and transform it into a JSON file, from which .md files were generated; these are read and included as individual entries on our website. Our website is built with Hugoblox (https://hugoblox.com/).

Our issue is that we lost the code we used for this operation. Thankfully, a big part of this work was also done for another project, the summaries (https://forrt.org/summaries/), where we used the same procedure. So we still have something very close to what we did for the glossary (code and output), which should give you a leg up:

- Python script: https://github.com/forrtproject/forrtproject.github.io/blob/master/content/summaries/summaries.py
- JSON file: https://github.com/forrtproject/forrtproject.github.io/blob/master/data/summaries.json

Here are two of the Google Docs we want to extract information from:

1. https://docs.google.com/document/d/1Br-tqLh_nOXnjmddBmKCmTFLdDFmbhD5FA6T9v8GatU/edit?usp=drive_link
2. https://docs.google.com/document/d/196z4wBqjQAuNg3I8dwZY-di6S1sF_hwx1Y5AFwikKHI/edit?usp=drive_link
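A rough sketch of the "scrape into JSON" step is shown below. It assumes a doc has been downloaded as plain text and that each entry is a title line directly followed by labelled fields such as `Definition:` and `Related terms:`. These labels and the function name are hypothetical; the real docs (and summaries.py) may be structured differently, and the English text mixed into the German docs would still need separate filtering.

```python
import json
import re

# Hypothetical field labels; the real docs may use other (or translated) labels
FIELD_PATTERN = re.compile(r"^(Definition|Related terms|References)\s*:\s*(.*)$")


def _field(line):
    """Return (label, value) if the line is a labelled field, else None."""
    match = FIELD_PATTERN.match(line)
    return (match.group(1), match.group(2).strip()) if match else None


def parse_glossary_text(text: str) -> list:
    """Split a plain-text export into entry dicts.

    A non-field line immediately followed by a 'Definition:' line is treated as a
    new entry title; unlabelled lines after a field are appended to that field.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    entries = []
    key = None
    for i, line in enumerate(lines):
        parsed = _field(line)
        following = _field(lines[i + 1]) if i + 1 < len(lines) else None
        if parsed is None and following is not None and following[0] == "Definition":
            entries.append({"title": line})
            key = None
        elif parsed is not None and entries:
            key = parsed[0].lower().replace(" ", "_")
            entries[-1][key] = parsed[1]
        elif key is not None:
            # Continuation line of a multi-line field
            entries[-1][key] = (entries[-1][key] + " " + line).strip()
    return entries


def to_json(entries: list) -> str:
    """Serialise entries, keeping non-ASCII characters readable."""
    return json.dumps(entries, ensure_ascii=False, indent=2)
```

Lines before the first title (for example a doc's introduction) are ignored, because no entry has been opened yet.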

LukasWallrich commented 4 days ago

I have now created a master file that needs to contain the links to the published versions of all glossary entries. This is done for German, but @flavioazevedo, you can add Arabic etc. here as well when they are ready (it would be good to have more than one language in place for final testing, but there is no rush).
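If the master file were a CSV with `language` and `url` columns (a hypothetical layout), a script could group the Google Doc IDs by language roughly as follows. The regex accepts both edit links (`/document/d/<id>/edit`) and "Publish to the web" links (`/document/d/e/<id>/pub`); note that the two kinds of link carry different IDs for the same doc.

```python
import csv
import io
import re
from typing import Dict, List, Optional

# Matches edit links and "Publish to the web" links (the latter have an extra "e/")
DOC_ID_PATTERN = re.compile(r"/document/d/(?:e/)?([A-Za-z0-9_-]+)")


def extract_doc_id(url: str) -> Optional[str]:
    """Return the document ID embedded in a Google Docs URL, or None."""
    match = DOC_ID_PATTERN.search(url)
    return match.group(1) if match else None


def read_master(csv_text: str) -> Dict[str, List[str]]:
    """Group doc IDs by language; rows without a recognisable link are skipped."""
    docs: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc_id = extract_doc_id(row["url"])
        if doc_id:
            docs.setdefault(row["language"].strip().lower(), []).append(doc_id)
    return docs
```

Lower-casing the language name gives folder names like `german`, matching /content/glossary/german.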

LukasWallrich commented 4 days ago

[I have now got this largely working with the data as is. Some entries would require manual fixes, but imho they are not worth the effort.]

@flavioazevedo, how much do we care about links to related concepts? They are currently broken in the English version as well (#132). If they are important, we need to add the original English title to each translated glossary entry (I cannot extract these titles automatically, as some of them already include brackets).
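A sketch of how the links could be resolved if each translated entry carried an `english_title` field and listed its related terms by their English titles (both assumptions, following the suggestion above). The `/glossary/<language>/<slug>/` URL pattern and the `related_links` field are also guesses; unresolved terms are kept as plain text rather than turned into broken links.

```python
import re


def slugify(title: str) -> str:
    """Lower-case the title and collapse runs of non-word characters into hyphens."""
    return re.sub(r"\W+", "-", title.lower()).strip("-")


def link_related_terms(entries: list, language: str) -> list:
    """Add a 'related_links' list to each entry, resolved via English titles."""
    # Index translated entries by the slug of their original English title
    by_english = {slugify(entry["english_title"]): entry for entry in entries}
    for entry in entries:
        links = []
        for term in entry.get("related_terms", []):
            target = by_english.get(slugify(term))
            if target is None:
                links.append(term)  # no matching entry: keep the bare term
            else:
                url = f"/glossary/{language}/{slugify(target['title'])}/"
                links.append(f"[{target['title']}]({url})")
        entry["related_links"] = links
    return entries
```

Matching on slugs rather than raw strings makes the lookup tolerant of differences in case and punctuation between the related-terms lists and the titles.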

Given that users can simply scroll to the right entry in the navigation, I don't think this is crucial, but it would obviously be nice to have.