Closed JoshLee0915 closed 5 months ago
Just a passerby (I have no idea what BOM is) but I looked at the official spec and it has this to say
File Format Yarn files must be a UTF-8 text file and should not have the BOM set. File extension should be .yarn.
Ah, thanks! I was not aware of that doc (its been a bit since I worked with Yarn Spinner). In that case the code checking to ensure the file is UTF-8 just needs to be updated to also check that it does not have a BOM
Yeah sounds reasonable :)
@stargazing-dino thanks for answering the question :) FYI, a BOM (Byte Order Mark) is a magic number at the very beginning of a file to say whether a UTF-16 or UTF-32 file uses little or big endian. Software that only accepts UTF-8 usually uses the BOM as a cheap detection to see if the input is in a wrong format.
Oh shoot, I forgot the main reason I asked how best to handle this. The bad yarn file in it is from the Yarn-Godot repo (the VN example file to be exact) which uses the same lib as the unity plugin does. It does not mess with them bc C# strips the BOM from strings when they are read in, but Rust does not: https://github.com/rust-lang/rfcs/issues/2428
So, while their docs say UTF-8 only its not a hard requirement and the core Yarn Lib does accept files with and without the BOM. To maintain parity with the library we should actually be stripping the BOM out of the string when detected.
@JoshLee0915 Ooooh, I see. Good point!
Overview
I am working on a GDExtension for Yarn using the Rust lib since it looks to be more actively maintained and supported then the C++ one in the unreal project. While working on it I was unable to get some files to compile. I tracked it to if the yarn file is encoded with UTF-8 with BOM.
If a BOM is present in the file the compile would fail with either a panic:
[panic] called 'Option::unwrap()' on a 'None' value
or a compile error ofMissing title header for node
.I created a branch for the error here with two test yarn files.
HellWorld.yarn
will throw the compile error if set with UTF-8 with BOM andVNExampleDialogue.yarn
throws the panic.https://github.com/JoshLee0915/YarnSpinner-GDExtension/tree/bug/utf-8-bom-issue
Steps to replicate:
compilation_type
:FullCompilation
b.library
:Library::standard_library()
read_file
to load the file with the UTF-8 with BOM encodingcompile
to compile the yarn program. This should either cause a panic or throw a compile errorIt seems like it should be a simple error to fix and would be happy to do it myself but would like to know first if its preferred to be throwing an error like with if you try to load a UTF-16 encoding or if we should just convert it to UTF-8 with no BOM on the fly if it is detected?