Closed (L3G5 closed this 2 months ago)
Oops, I honestly spent about half an hour trying to make it work, but it turns out that, for now, just using html.unescape
will work for my purposes.
To be clear,
import html
example_phrase = Phrase(phrase="こんにちは、お元気にお過ごしでしょうか。")
html.unescape(f"""Rephrase the phrase {Phrase.xml_tags()} in ten different ways. Each way should be between {Rephrase.xml_tags()}. {example_phrase.to_pretty_xml()}""")
produces just what I want to send to the model 🫠
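For anyone landing here later, a minimal standard-library sketch of why html.unescape helps. It assumes the escaping comes from serializing XML with an ASCII encoding, which is what xml.etree.ElementTree.tostring does by default; whether rigging serializes this way internally is an assumption, not something confirmed in this thread. The `phrase` element below is a stand-in, not the rigging model.

```python
import html
import xml.etree.ElementTree as ET

# Stand-in element; this is NOT the rigging Phrase model.
elem = ET.Element("phrase")
elem.text = "こんにちは"

# Default encoding is us-ascii, so every non-ASCII character is written
# as a numeric character reference (&#NNNNN;).
escaped = ET.tostring(elem).decode()

# encoding="unicode" returns a str and keeps the characters as-is.
readable = ET.tostring(elem, encoding="unicode")

# html.unescape turns the numeric references back into characters.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

One caveat: html.unescape also decodes `&lt;`, `&gt;` and `&amp;`, so a phrase that contains literal angle brackets or ampersands will no longer be escaped after the call.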
I haven't spent much time looking at multilanguage support, so this seems like a good opportunity to track down any issues. I'll dig in soon, thanks for the report!
I have the following toy example:
Unfortunately, the current implementation of .to_pretty_xml() escapes non-English characters and results in '<phrase>こんにちは、お元気にお過ごしでしょうか。</phrase>', which degrades the performance of some LLMs. This seems to force using something like Phrase.xml_start_tag() + example_phrase.phrase + Phrase.xml_end_tag() instead of example_phrase.to_pretty_xml() for simple examples in a multilingual setting, and it discourages complex queries (where rigging absolutely shines, from my experience with English models).
Did I miss the right way to do this? Or, if I didn't, are there any plans to add multilanguage support?
Anyway, thank you for this amazing project!
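If a blanket html.unescape is too broad for the escaping problem described above, a narrower option is to decode only the numeric character references that stand for non-ASCII characters and leave markup escapes such as `&lt;` and `&amp;` alone. This is only a sketch: `decode_non_ascii_refs` is a hypothetical helper written for illustration and is not part of rigging.

```python
import re

# Matches decimal (&#12371;) and hexadecimal (&#x3053;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_non_ascii_refs(xml_text: str) -> str:
    """Hypothetical helper: decode numeric references above ASCII only."""

    def _replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave ASCII references (e.g. &#60; for "<") and out-of-range
        # values untouched so the markup stays escaped.
        if code_point < 128 or code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(_replace, xml_text)


sample = "<phrase>&#12371;&#x3093; &amp; &lt;b&gt; &#60;</phrase>"
print(decode_non_ascii_refs(sample))
```

Named entities are left untouched, so the result should still parse as XML when the input did.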