OoriData / OgbujiPT

Client-side toolkit for using large language models, including where self-hosted
Apache License 2.0

Enhancements to Word Loom—item metadata #77

Closed: uogbuji closed this issue 4 months ago

uogbuji commented 4 months ago

Having spent some time working with Word Loom lately, I have a few enhancements in mind. @chimezie, would love your thoughts.

First of all, I'd like to formalize the name "language item" (just "item" for short) for each entity that's specified with a TOML table.

I'd like to add support for language item metadata. In effect, any other key/value pairs within the TOML table for an item would be passed in as metadata associated with that item to the Word Loom implementation. There is actually an example of this already in the Word Loom "spec" ([i18n_context] / source), but I completely omitted any discussion of that. I would just add the formalization that it would be preserved as item metadata. Such metadata would support e.g. control features for how language items are selected & processed for LLM prompts.

Then, probably the most controversial bit: in order to lower the footprint of the English language in Word Loom, and also free up a couple of other metadata keys, I would deprecate the `text` key in favor of just `_` (inspired by gettext) and deprecate the `markers` key in favor of `_m`.

uogbuji commented 4 months ago

Oh, I would also say that all keys starting with `_` are reserved for (possibly future) Word Loom-specced use.

uogbuji commented 4 months ago

Furthermore, alternate languages would be specified using `_` followed by an IETF language code, e.g. `_fr`.

uogbuji commented 4 months ago

The commit referenced above implements the changes discussed so far (on a branch until agreed & ready 😊)

uogbuji commented 4 months ago

OK a proposed update of the Word Loom spec is here. For diff comparison with previous versions, click here.

chimezie commented 4 months ago

TOML table metadata would be a welcome addition for declarative prompt composition using Word Loom. Deprecating the `text` and `markers` keys in that way makes sense given the proposal for indicating metadata, and seems justified as a breaking change.

uogbuji commented 4 months ago

OK. The implementation on that branch pretty completely implements this spec, I think, so feel free to try it out & LMK if you have any problems. I'll probably look to land the PR tomorrow.