Wituareard opened this issue 7 months ago
This is a very big and important question, as there exists little AI risk knowledge in most non-English languages. We need to get this right.
Seems great for non-markdown content. We'll need something custom for dealing with pages, which may need quite a bit of logic to accommodate all of the above: `/en`, `/nl`, etc. folders.
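A minimal sketch of how per-locale folders could degrade gracefully. It assumes a hypothetical layout of `posts/<locale>/<slug>.md` and a precomputed set of existing file paths (neither is the site's actual structure): look for the requested locale first, then fall back to English.

```typescript
// Hypothetical layout: posts/<locale>/<slug>.md (an assumption, not the repo's real structure).
type Locale = string;

interface ResolvedPage {
  path: string;
  locale: Locale;
  isFallback: boolean; // true when the page is served in the default locale instead
}

// Returns the best available markdown path for a slug, or null if no version exists.
function resolvePagePath(
  slug: string,
  requested: Locale,
  available: ReadonlySet<string>,
  defaultLocale: Locale = "en",
): ResolvedPage | null {
  // Try the requested locale first, then the default locale.
  const candidates: Locale[] =
    requested === defaultLocale ? [defaultLocale] : [requested, defaultLocale];
  for (const locale of candidates) {
    const path = `posts/${locale}/${slug}.md`;
    if (available.has(path)) {
      return { path, locale, isFallback: locale !== requested };
    }
  }
  return null;
}

const files = new Set(["posts/en/faq.md", "posts/nl/faq.md", "posts/en/learn.md"]);
console.log(resolvePagePath("learn", "nl", files)); // path "posts/en/learn.md", isFallback: true
```

A missing translation then shows the English page rather than an error, which is the kind of degradation a low-effort first pass can live with.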
`buttona.svelte` component
@imotaru
From my call with Mia:
@Imotaru Great insights.
`v1.2ai` to indicate if this has been AI generated, whereas `v1.2m` is a manually checked one. I think we could put this in the markdown metadata.

OK. So as you can see from the above note, Joep's evolving background plan and the one I'd arrived at independently were pretty similar and even complementary.
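The `v1.2ai` / `v1.2m` version convention could be read back from the markdown metadata with a small parser. This sketch assumes the tag is `v<major>.<minor>` followed by either `ai` or `m`; the field it would live in and the exact format are assumptions, not a settled spec.

```typescript
// Assumed tag format: v<major>.<minor><suffix>, where suffix is "ai" or "m".
interface L10nVersion {
  major: number;
  minor: number;
  aiGenerated: boolean; // "ai" suffix: machine translation, not yet reviewed
  manuallyChecked: boolean; // "m" suffix: a human has checked it
}

const VERSION_PATTERN = /^v(\d+)\.(\d+)(ai|m)$/;

// Returns null for anything that does not match the assumed format.
function parseL10nVersion(tag: string): L10nVersion | null {
  const match = VERSION_PATTERN.exec(tag.trim());
  if (!match) return null;
  const [, major, minor, suffix] = match;
  return {
    major: Number(major),
    minor: Number(minor),
    aiGenerated: suffix === "ai",
    manuallyChecked: suffix === "m",
  };
}
```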
My evolving ideas are captured in the Discord Project Babel thread and at the time of writing conclude with a "Sonnet 3.5, tell me why I'm stupid about this" interaction linked from here.
Joep showed me some of the Svelte machinery in a 1:1 chat just now and bestowed the welcome revelation that Netlify is already ultimately serving static content from a CDN: all my worrying about how our main PauseAI website scales to extraordinary load during the most important hours of its life is pretty much handled. As expected, we do then have to thread the localization needle appropriately (decisions about whether to capture the locale in the route URL, fragment, cookie, etc.), and similarly, we might want to special-case localization for some really valuable dynamic components (or tweak their design), and the built-in search indexing will need some localization tweaks.
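One way to thread the locale choice, sketched under assumptions: the supported locales, the default `en`, and the `locale` cookie name are illustrative only. The URL prefix is checked first because each localized page then has its own static, cacheable address, which suits prebuilt pages served from a CDN; a cookie can only steer requests that carry no prefix (for example a redirect from `/`).

```typescript
// Illustrative values only; not the site's configuration.
const SUPPORTED = ["en", "nl", "de"] as const;
type SupportedLocale = (typeof SUPPORTED)[number];
const DEFAULT_LOCALE: SupportedLocale = "en";

function isSupported(value: string | undefined): value is SupportedLocale {
  return value !== undefined && (SUPPORTED as readonly string[]).includes(value);
}

// Resolution order: locale prefix in the URL path, then a "locale" cookie, then the default.
function resolveLocale(pathname: string, cookieHeader: string | null): SupportedLocale {
  // "/nl/faq".split("/") gives ["", "nl", "faq"], so index 1 is the first path segment.
  const firstSegment = pathname.split("/")[1];
  if (isSupported(firstSegment)) return firstSegment;

  if (cookieHeader) {
    for (const part of cookieHeader.split(";")) {
      const [name, value] = part.trim().split("=");
      if (name === "locale" && isSupported(value)) return value;
    }
  }
  return DEFAULT_LOCALE;
}
```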
But ultimately it all looks essentially flexible and workable, with nice degradation properties when we mess up or things fail (very low-effort LLM-powered l10n coverage at reasonable quality suffices for us). Even though it has a worrying whiff of "invent it here, bespoke glue" developer enthusiasm, it truly might be simplest to just do it ourselves, rather than locate and early-adopt some cutting-edge, truly forward-looking AI start-up's reimagining of localization, or make do with existing l10n frameworks that have an incentive to wrongly emphasize the highest possible quality and humans in the loop. We can pair at his convenience to share understanding of the existing Svelte design choices and mitigate a very clear potential bus problem.
Here are some more specific notes.
The website wasn't designed to be l10n-able, so there are scattered text edge cases. Experiments with Claude suggest I can translate markdown posts directly. And most other parts of the site are post-like and probably convertible to that form.
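Translating posts directly is simpler if the metadata block is kept out of the model's input and only the body is sent. A minimal sketch, assuming posts start with a `---`-fenced YAML block; the prompt wording is a hypothetical example, and the actual model call (e.g. to Claude) is left out.

```typescript
interface MarkdownParts {
  frontmatter: string; // raw YAML between the --- fences, without the fences
  body: string;
}

// Splits a leading ----fenced block from the markdown body.
function splitFrontmatter(markdown: string): MarkdownParts {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(markdown);
  if (!match) return { frontmatter: "", body: markdown };
  return { frontmatter: match[1] ?? "", body: markdown.slice(match[0].length) };
}

// Reassembles a post (with \n line endings around the fences).
function joinFrontmatter(parts: MarkdownParts): string {
  return parts.frontmatter ? `---\n${parts.frontmatter}\n---\n${parts.body}` : parts.body;
}

// Hypothetical prompt: asks the model to leave markdown syntax, URLs and code untouched.
function buildTranslationPrompt(body: string, targetLanguage: string): string {
  return [
    `Translate the following markdown into ${targetLanguage}.`,
    "Preserve all markdown syntax, link URLs, HTML tags and code blocks exactly; translate only human-readable text.",
    "Return only the translated markdown.",
    "",
    body,
  ].join("\n");
}
```

Metadata fields that readers see (such as a title) would still need translating separately; this sketch only protects the block from being mangled.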
Trivial and ignorable short messages (errors, special cases, understandable in context):
This is opaque to me but almost text-free:
There is l10n-worthy text in these, which I suspect can become markdown posts:
As above, but we'd want to factor out some non-l10ned innards:
Plausibly become markdown, but they are patterns over static text data:
As above, but said data is drawn from AirTable:
These are simple aggregators that just need the l10n choice pushed through:
These special cases look too difficult to factor trivially. Perhaps leave as en-US only in round one?
I played around a bit with a script and got it to automatically generate translations based on a configuration, adding them in subfolders.
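A config-driven script of that kind could start by planning which files to generate. This sketch assumes a hypothetical config with a content directory, a source locale and a list of target locales, and writes each translation to a `<contentDir>/<locale>/` subfolder. It skips targets that already exist, so existing (possibly hand-checked) translations are never overwritten.

```typescript
// Hypothetical config shape; the real script's options are not shown in this thread.
interface TranslationConfig {
  contentDir: string; // e.g. "src/posts"
  sourceLocale: string; // e.g. "en"
  targetLocales: string[]; // e.g. ["nl", "de"]
}

interface TranslationJob {
  sourcePath: string;
  targetPath: string;
  locale: string;
}

// One job per (source file, target locale) whose output file does not exist yet.
function planTranslations(
  sourceFiles: string[], // file names relative to `${contentDir}/${sourceLocale}`
  existing: ReadonlySet<string>, // paths that already exist on disk
  config: TranslationConfig,
): TranslationJob[] {
  const jobs: TranslationJob[] = [];
  for (const file of sourceFiles) {
    for (const locale of config.targetLocales) {
      const targetPath = `${config.contentDir}/${locale}/${file}`;
      if (!existing.has(targetPath)) {
        jobs.push({
          sourcePath: `${config.contentDir}/${config.sourceLocale}/${file}`,
          targetPath,
          locale,
        });
      }
    }
  }
  return jobs;
}

const jobs = planTranslations(
  ["faq.md"],
  new Set(["src/posts/nl/faq.md"]),
  { contentDir: "src/posts", sourceLocale: "en", targetLocales: ["nl", "de"] },
);
console.log(jobs.map((job) => job.targetPath)); // [ 'src/posts/de/faq.md' ]
```

Keeping planning separate from the (slow, paid) model calls makes it easy to dry-run the script and see what would be generated.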
Discord discussion: https://discord.com/channels/1100491867675709580/1226136907046588476