FireFox2000000 / Moonscraper-Chart-Editor

BSD 3-Clause "New" or "Revised" License

Support for Special Characters (i.e. -, ", Unicode) #44

Open SpookySquidward opened 3 years ago

SpookySquidward commented 3 years ago

(This issue is in response to pull request #43)

The Value of Special Characters

In Moonscraper and the .chart file type, certain characters should not currently be used in global or track events. These most notably include the hyphen (-) and plain quotation marks ("), though I will also mention Unicode characters later. These characters are illegal because of how a lyric event is stored in the .chart file format:

<lyric tick> = E "lyric hello!"

Or more commonly, with syllables separated:

<lyric tick 1> = E "lyric Hel-"
<lyric tick 2> = E "lyric lo!"

The quotation marks encapsulate the content of each event, which in this case is lyric Hel- and lyric lo!; Moonscraper always removes them when importing a .chart file. The hyphens separate the syllables of a word, as in lyric Hel-; they are removed by whichever program plays the .chart (or .mid) file, and Clone Hero always removes them. If you want to use these characters in your lyrics to add style, you currently need a workaround, such as typing two apostrophes ('') instead of proper quotation marks, or you can simply avoid the characters altogether.

Potential Solutions

1. Parse More Intelligently

Take, for example, the following lyric event:

960 = E "lyric "What?""

We want our lyric event to display as "What?", but instead we simply get What?. Where did our quotes go? Well, we need to remember that quotation marks are automatically removed when our chart file is read. So our reading program (such as Clone Hero) only sees this:

960 = E lyric What?

It is not surprising, then, that our quotes have disappeared. Notice, however, that we have enough information to figure out exactly where we should and should not have quotation marks. Let's simply remove the first and last quotation mark in our event:

960 = E "lyric "What?"" becomes 960 = E lyric "What?"

We can break our line down into parts (960, =, E, lyric, and "What?"), and we will have our event back just as we typed it.
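As a rough illustration of the idea, here is a minimal Python sketch that strips only the outermost pair of quotation marks from an event line. The function name parse_event_line is made up for this example, and the code assumes every event line has the form <tick> = E "<text>"; it is not how Moonscraper or Clone Hero actually parse the file.

```python
from typing import Tuple


def parse_event_line(line: str) -> Tuple[int, str]:
    """Split '<tick> = E "<text>"' into (tick, text).

    Hypothetical helper: only the first and last quotation marks are
    removed, so any quotes inside the event text survive.
    """
    # The first '=' separates the tick from the event; later ones are text.
    tick_part, _, rest = line.partition("=")
    tick = int(tick_part.strip())
    rest = rest.strip()
    if not rest.startswith("E"):
        raise ValueError("not an event line: " + line)
    body = rest[1:].strip()
    first = body.find('"')
    last = body.rfind('"')
    if first != -1 and last > first:
        # Drop only the enclosing pair of quotation marks.
        body = body[:first] + body[first + 1:last] + body[last + 1:]
    return tick, body


# parse_event_line('960 = E "lyric "What?""') -> (960, 'lyric "What?"')
```

Because the tick is split off at the first equals sign and the quotes are located from both ends of the line, the text in between can contain any number of quotation marks without confusing the parser.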

A similar process can be used for hyphens, where we only remove the last hyphen in a lyric event:

768 = E "lyric out--"
816 = E "lyric of--"
864 = E "lyric this--"
912 = E "lyric world"

can give us the syllables out-, of-, this-, and world without losing the information that these events are syllables.
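The hyphen rule can be sketched the same way. The helper below (split_syllable is a name invented for this example) treats only the final hyphen as the syllable joiner and keeps every other hyphen as part of the displayed text. It assumes the leading "lyric " keyword has already been stripped from the event.

```python
from typing import Tuple


def split_syllable(text: str) -> Tuple[str, bool]:
    """Return (display_text, joins_next) for one lyric syllable.

    Hypothetical helper: a single trailing hyphen marks a syllable that
    joins the next one; any hyphen before it is kept for display.
    """
    if text.endswith("-"):
        return text[:-1], True
    return text, False


syllables = ["out--", "of--", "this--", "world"]
parsed = [split_syllable(s) for s in syllables]
# parsed == [("out-", True), ("of-", True), ("this-", True), ("world", False)]
```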

Technical Note: Quotation marks are actually legal characters in Clone Hero in the .mid file format, and they are correctly saved to both the .chart and .mid formats from Moonscraper. However, if you reopen a .chart file that has extra quotation marks, they will be removed by Moonscraper and will need to be retyped. Hyphens are also saved correctly to the .chart and .mid formats, but they are currently removed in Clone Hero.

2. Add Escape Characters

Escape characters give us another solution to this problem by letting us explicitly say that we want to use a character as-is. In C#, for example, the backslash (\) is an escape character: \" is interpreted as a quotation mark, \n is a newline character, and \\ codes for the backslash itself. We could use a similar system in the .chart specification, where \- would code for a hyphen, \" for a quotation mark, and \\ for a backslash. This is a more robust solution than character substitution, such as using = to mean -: with substitution we can use hyphens in our lyrics, but we can no longer use equals signs, so we have just shifted the problem.
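To make the proposal concrete, here is a small Python sketch of what escaping and unescaping could look like. Nothing like this exists in the .chart specification today; the set of escapable characters and the names escape and unescape are assumptions taken straight from the paragraph above.

```python
# Hypothetical escape scheme: only these three characters can be escaped.
ESCAPABLE = {"-", '"', "\\"}


def escape(text: str) -> str:
    """Prefix every escapable character with a backslash."""
    return "".join("\\" + ch if ch in ESCAPABLE else ch for ch in text)


def unescape(text: str) -> str:
    """Turn \\-, \\" and \\\\ back into the literal characters."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPABLE:
            out.append(text[i + 1])  # keep the escaped character as-is
            i += 2
        else:
            out.append(ch)  # ordinary character, or a lone backslash
            i += 1
    return "".join(out)


# unescape(r'\"What?\"') -> '"What?"'
# unescape(r'well\-known') -> 'well-known'
```

A reader using this scheme would have to check for an unescaped trailing hyphen (the syllable joiner) before unescaping, since afterwards an escaped hyphen and a joiner look identical.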

3. Use Other Characters Instead

This is not one of my preferred solutions, but it is worth mentioning because it's what many charters are currently doing. Instead of using plain quotation marks, we can use opening (“, U+201C) and closing (”, U+201D) quotation marks to accomplish a similar style. For hyphens, we can substitute the similar en dash (–, U+2013). These characters are distinct from the hyphen (-, U+002D) and plain quotation marks (", U+0022), so they aren't parsed out when reading the file and should display as expected.

Not only does this approach still prevent you from using the original characters you wanted to, it also assumes that every program that will read a chart file understands Unicode characters; if a program doesn't accept Unicode characters, the effects can range from unexpected to totally game-breaking. Moonscraper doesn't support Unicode fully, as shown below:

[Image: Unicode characters shown as square outlines]

While Unicode could reasonably be implemented into newer applications, it is not likely to make its way into old code. A Unicode implementation would also be incomplete without giving its characters more real use by supporting multiple languages for a song, a feature that is beyond the scope of this issue.

Verdict

I am of the opinion that Moonscraper will eventually need to support special characters better than it currently does. While an implementation of any of the above solutions comes with its challenges, it will be better for the community in the long run if special characters become officially supported in Moonscraper and the .chart file format.

FireFox2000000 commented 3 years ago

So for hyphens (-), Clone Hero will substitute any equals sign characters (=) with proper hyphens. This most likely is due to hyphens having actual gameplay effects in other games like Rock Band etc. Technically it has nothing to do with the actual .chart format on that one.

The current solution I'm running with is to add an export option that swaps these special chars for the correct CH equivalents. In the case that a lyric or an event isn't compatible with the .chart format, the chart would be saved in a different format, and users would be warned to finalise it via the exporter to make it playable.
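For illustration only, an export-time swap for hyphens might look something like the Python sketch below. It relies on the Clone Hero behaviour described above (equals signs are displayed as hyphens) and assumes that a hyphen at the very end of a lyric is always the syllable joiner. The function name to_clone_hero_lyric is invented here and is not Moonscraper's actual exporter code.

```python
def to_clone_hero_lyric(text: str) -> str:
    """Swap literal hyphens for '=' so Clone Hero displays them as hyphens.

    Hypothetical export step: a single trailing hyphen is assumed to be
    the syllable joiner and is left untouched.
    """
    joiner = "-" if text.endswith("-") else ""
    body = text[:-1] if joiner else text
    return body.replace("-", "=") + joiner


# to_clone_hero_lyric("well-known") -> "well=known"
# to_clone_hero_lyric("out--")      -> "out=-"
```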

Moonscraper does technically support Unicode already, it's just an issue with the font/font atlas being used not having an actual glyph registered. Supporting every possible glyph is gonna be too much work to really be worth it, but it still technically writes everything into the save files correctly.

TheNathannator commented 3 years ago

old issue, but I still wish to add info:

Hyphens aren't the only characters that get stripped out in CH, but it doesn't strip them out for no reason. Either they're used in Rock Band charts and need to be stripped out in order to display those charts correctly, or they have some other function.

Other special characters work just fine.

Also, the CH Public Test Build doesn't strip out quotation marks in .chart like v.23 does. You can use them there and they'll show up properly. The PTB can also properly parse TextMeshPro formatting tags, though it strips out any that don't match a whitelist. This means that anything between <angle brackets> that does not match this list will be stripped out, including the brackets.
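As a rough picture of that whitelist behaviour, here is a Python sketch that keeps allowed tags and drops everything else in angle brackets, brackets included. The ALLOWED_TAGS set is a placeholder made up for this example, not the PTB's real whitelist, and the regular expression is only an approximation of how tags might be detected.

```python
import re

# Placeholder whitelist for illustration; the real list is not reproduced here.
ALLOWED_TAGS = {"b", "i", "color"}

# Anything between a '<' and the next '>' is treated as a tag.
TAG_PATTERN = re.compile(r"<([^<>]*)>")


def strip_unlisted_tags(text: str, allowed=ALLOWED_TAGS) -> str:
    """Remove every <...> span whose tag name is not in the whitelist."""

    def keep_or_drop(match):
        inner = match.group(1).strip()
        # Tag name: ignore a leading '/', stop at '=' or whitespace.
        name = re.split(r"[=\s]", inner.lstrip("/"), maxsplit=1)[0].lower()
        return match.group(0) if name in allowed else ""

    return TAG_PATTERN.sub(keep_or_drop, text)


# strip_unlisted_tags("<b>Hey</b> <angle brackets>!") -> "<b>Hey</b> !"
```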