PauseAI / pauseai-website

Website for PauseAI.info
https://pauseai.info

Offer translations #157

Open Wituareard opened 7 months ago

Wituareard commented 7 months ago

Discord discussion: https://discord.com/channels/1100491867675709580/1226136907046588476

joepio commented 4 months ago

This is a very big and important question, as little AI risk material exists in most non-English languages. We need to get this right.

Wants

Technical approaches

sveltekit-i18n

Seems great for non-markdown content. We'll need something custom for dealing with pages. It may need quite a bit of logic to accommodate all of the above.

Split some folders, manual management

Translate articles on the fly

Use an external CMS with built-in drafting / versioning

Questions

joepio commented 3 months ago

@imotaru

Imotaru commented 3 months ago

From my call with Mia:

joepio commented 3 months ago

@Imotaru Great insights.

joepio commented 2 months ago

Current idea

anthonybailey commented 2 months ago

OK. So as you can see from the above note, the plan Joep had been evolving in the background and the one I'd arrived at independently were pretty similar and even complementary.

My evolving ideas are captured in the Discord Project Babel thread and at the time of writing conclude with a "Sonnet 3.5, tell me why I'm stupid about this" interaction linked from here.

Joep showed me some of the Svelte machinery in a 1:1 chat just now and bestowed the welcome revelation that Netlify is already ultimately serving static content from a CDN - all my worrying about how our main PauseAI website scales to extraordinary load during the most important hours of its life is pretty much handled. As expected, we do then have to thread the localization needle appropriately (decisions about whether to capture locale in route URL / fragment / cookie etc.). Similarly, we might want to special-case localization for some really valuable dynamic components (or tweak their design), and the built-in search indexing will need some localization tweaks.
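To make the route URL versus cookie decision concrete, here is a minimal sketch of one possible resolution order: path prefix first, then a cookie value, then a default. The locale list, the function names, and the precedence are all assumptions for illustration, not a decision recorded in this thread.

```typescript
// Hypothetical locale list; the thread does not say which locales will be offered.
const SUPPORTED_LOCALES = ["en", "nl", "de"] as const;
type Locale = (typeof SUPPORTED_LOCALES)[number];
const DEFAULT_LOCALE: Locale = "en";

// Type guard: is this string one of the supported locales?
function isLocale(value: string | undefined): value is Locale {
  return value !== undefined && SUPPORTED_LOCALES.some((l) => l === value);
}

// Resolve the locale from the first path segment, then the cookie, then the default.
// Also returns the path with any locale prefix stripped, for looking up the page.
function resolveLocale(
  pathname: string,
  cookieLocale?: string
): { locale: Locale; path: string } {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const first = segments[0];
  if (isLocale(first)) {
    return { locale: first, path: "/" + segments.slice(1).join("/") };
  }
  const locale = isLocale(cookieLocale) ? cookieLocale : DEFAULT_LOCALE;
  return { locale, path: "/" + segments.join("/") };
}
```

With this ordering, `resolveLocale("/nl/learn")` yields the `nl` locale and the path `/learn`, while `resolveLocale("/learn", "de")` falls through to the cookie. Putting the locale in the path keeps every localized page a distinct static URL, which suits CDN-served content.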

But ultimately it all looks essentially flexible and workable, with nice degradation properties when we mess up or things fail (very low-effort, LLM-powered l10n coverage at reasonable quality suffices for us). Even though it has a worrying whiff of "invent it here, bespoke glue" developer enthusiasm, it truly might be simplest to just do it ourselves, rather than locate and early-adopt some cutting-edge, truly future-looking AI start-up's reimagining of localization, or make do with existing l10n frameworks that have an incentive to wrongly emphasize highest possible quality and humans in the loop. We can pair at his convenience to share understanding of existing Svelte design choices and mitigate a very clear potential bus problem.
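One way to read "nice degradation properties" is that a missing translation simply falls back to the English source. The sketch below assumes a hypothetical in-memory store keyed by `<locale>/<slug>`; the real site would read files, and none of these names come from the codebase.

```typescript
type Page = { locale: string; body: string; isFallback: boolean };

// Look up a page in the requested locale, falling back to the source locale.
// `store` is a hypothetical stand-in for however the site loads content.
function loadPage(
  store: Record<string, string>,
  slug: string,
  locale: string,
  sourceLocale: string = "en"
): Page | undefined {
  const translated: string | undefined = store[locale + "/" + slug];
  if (translated !== undefined) {
    return { locale: locale, body: translated, isFallback: false };
  }
  const source: string | undefined = store[sourceLocale + "/" + slug];
  if (source !== undefined) {
    // Serve the source text and flag it, so the UI can show a notice if desired.
    return { locale: sourceLocale, body: source, isFallback: locale !== sourceLocale };
  }
  return undefined; // The page does not exist in any locale.
}
```

The `isFallback` flag is there so a template could tell readers they are seeing the untranslated original rather than failing the request.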

anthonybailey commented 1 month ago

Here are some more specific notes.

The website wasn't designed to be l10nable, so there are scattered text edge-cases. Experiments with Claude suggest I can translate markdown posts directly. And most other parts of the site are post-like and probably convertible to that form.
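As a rough sketch of translating a markdown post directly, the helper below separates any front matter from the body and builds a prompt for the body only. It assumes posts may start with a `---` delimited front matter block and leaves the actual LLM call to the caller; the function names and the prompt wording are hypothetical, not the script used in the experiments.

```typescript
interface PreparedPost {
  frontMatter: string; // Kept verbatim, including its delimiters.
  prompt: string;      // Text to send to whichever model does the translation.
}

// Split an optional leading front matter block from the markdown body.
function splitFrontMatter(markdown: string): { frontMatter: string; body: string } {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(markdown);
  if (match === null) {
    return { frontMatter: "", body: markdown };
  }
  return { frontMatter: match[0], body: markdown.slice(match[0].length) };
}

// Build the translation prompt for the body only.
function prepareTranslation(markdown: string, targetLocale: string): PreparedPost {
  const parts = splitFrontMatter(markdown);
  const prompt =
    "Translate the following markdown into " + targetLocale + ". " +
    "Keep headings, links, and code blocks intact.\n\n" + parts.body;
  return { frontMatter: parts.frontMatter, prompt: prompt };
}

// Reattach the untouched front matter to the translated body.
function assembleTranslation(frontMatter: string, translatedBody: string): string {
  return frontMatter + translatedBody;
}
```

A caller would pass `prompt` to its model of choice and feed the reply into `assembleTranslation`. Note that this sketch leaves front matter fields such as titles untranslated, which would need separate handling.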

Trivial and ignorable short messages (errors, special cases, understandable in context):

This is opaque to me but almost text-free:

There is l10n-worthy text in these, which I suspect can become markdown posts:

As above, but we'd want to factor out some non-l10ned innards:

Plausibly become markdown, but they are patterns over static text data:

As above, but said data is drawn from AirTable:

These are simple aggregators that just need the l10n choice pushed through:

These special cases look too difficult to factor trivially. Perhaps leave as en-US only in round one?

joepio commented 1 month ago

I played around a bit with a script and got it to automatically generate translations based on a configuration and add subfolders.
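The script itself is not shown in this thread, so the following is only a guess at its general shape: a config lists the target locales, and each source file gets one output path per locale in a locale-named subfolder. The config fields and the directory layout are assumptions.

```typescript
// Hypothetical config shape; the real script's configuration is not shown here.
interface TranslationConfig {
  sourceDir: string;  // Where the English markdown lives.
  locales: string[];  // Target locales, one subfolder each.
}

interface TranslationJob {
  locale: string;
  source: string; // Path of the file to translate.
  target: string; // Path where the translated file would be written.
}

// Plan one job per (locale, file) pair without touching the file system.
function planTranslations(config: TranslationConfig, files: string[]): TranslationJob[] {
  const jobs: TranslationJob[] = [];
  for (const locale of config.locales) {
    for (const file of files) {
      jobs.push({
        locale: locale,
        source: config.sourceDir + "/" + file,
        target: config.sourceDir + "/" + locale + "/" + file,
      });
    }
  }
  return jobs;
}
```

Keeping the planning step pure makes it easy to preview what a run would generate before any translation calls are made.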