UltraStar-Deluxe / format

UltraStar Format Specification
https://usdx.eu/format
MIT License

[Spec] Specify if UTF-8 encoding should have BOM #42

Closed Tuupertunut closed 9 months ago

Tuupertunut commented 9 months ago

Suggestion

Specify in the Ultrastar format specification whether files should use UTF-8 BOM or UTF-8 without BOM.

Use case

The Ultrastar format specification specifies that the encoding should be UTF-8, but does not distinguish between UTF-8 with BOM and UTF-8 without BOM. There are currently tools in the Ultrastar ecosystem that accept or produce only one of these.

Examples:

Therefore, if you save a song in the Ultrastar Play song editor and try to open it in Performous Composer, it will fail to open, even though both use UTF-8.

Extra info/examples/attachments

No response

Baklap4 commented 9 months ago

As far as I recall from previous discussions, I think we settled on UTF-8 without BOM.

bohning commented 9 months ago

A BOM only really makes sense for UTF-16 and UTF-32. Afaik it was only introduced by Vocaluxe to detect UTF-8 and distinguish it from CP1252 (as an alternative to the Vocaluxe-specific #ENCODING tag). The Vocaluxe reasoning is: if there is no #ENCODING tag, the file is UTF-8 if there is a BOM, otherwise it's CP1252. But there are definitely better ways to implement encoding detection, and if the standard prescribes UTF-8, it should no longer be necessary to detect encodings at all.

I strongly suggest that some Vocaluxe developers change the logic to default to UTF-8 (and use other encodings via the #ENCODING tag, as long as anything other than UTF-8 is not deprecated yet).
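
For illustration, the detection rule described in this comment could be sketched roughly as follows. This is a minimal Python sketch of the rule as stated here, not Vocaluxe's actual code; the function name `detect_encoding`, the header-scanning approach, and the returned labels are assumptions made for the example.

```python
import codecs


def detect_encoding(raw: bytes) -> str:
    """Hypothetical helper: pick an encoding using the rule described above."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    body = raw[len(codecs.BOM_UTF8):] if has_bom else raw

    # Scan the header tags (lines starting with "#") for an explicit #ENCODING tag.
    for line in body.splitlines():
        if not line.startswith(b"#"):
            break  # assumption: the header ends at the first non-tag line
        tag, _, value = line[1:].partition(b":")
        if tag.strip().upper() == b"ENCODING":
            # Return the tag value as written; mapping it to a codec is left out.
            return value.strip().decode("ascii", errors="replace")

    # No #ENCODING tag: a BOM means UTF-8, otherwise fall back to CP1252.
    return "utf-8" if has_bom else "cp1252"
```

Under this rule a UTF-8 file without a BOM and without an #ENCODING tag is read as CP1252, which is why defaulting to UTF-8 would require changing only the last line of the sketch.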

basisbit commented 9 months ago

I also vote for (and strongly suggest) UTF-8 without BOM. There is no need for a BOM.

Baklap4 commented 9 months ago

@marwin89 can you make a pr to add this :)?

marwin89 commented 9 months ago

Here is the pull request. Please approve and merge 👋

FYI: there is an estimated issue for implementing support for UTF-8 (without BOM) in the Vocaluxe repository.

I'm closing this issue. Thanks @Tuupertunut for the discussion and for refining the spec.

codello commented 4 months ago

The current version of the formal specification has slightly more relaxed phrasing with respect to the BOM, acknowledging that applications may ignore a BOM if one is present. Is this phrasing in line with the result of this discussion?

https://github.com/UltraStar-Deluxe/format/blob/23bf9307609f8320b6c6dd13cfef75b95ba33e6e/spec.md?plain=1#L64-L67
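
One way to combine both positions from this thread (write without a BOM, tolerate one when reading) is sketched below in Python. The function names `read_song` and `write_song` are hypothetical and only serve the example; the sketch relies on Python's standard `utf-8-sig` codec, which drops a leading BOM when decoding.

```python
def read_song(raw: bytes) -> str:
    """Decode song file bytes as UTF-8, ignoring a leading BOM if present."""
    # "utf-8-sig" strips a leading BOM and otherwise behaves like "utf-8".
    return raw.decode("utf-8-sig")


def write_song(text: str) -> bytes:
    """Encode song text as UTF-8 without a BOM."""
    # The plain "utf-8" codec never emits a BOM.
    return text.encode("utf-8")
```

A file saved with a BOM and the same file saved without one decode to the same text with `read_song`, while `write_song` always produces BOM-free output.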