Support/preserve escaped char strings (such as "lowercase" letters)

Liquidream commented 7 years ago

Hi,

I am using the below helper function to display text using the secret(?) "lowercase" font in PICO-8.

However, it appears that the luamin output converts the escaped character sequence to its literal UPPERCASE equivalents (and makes some other changes as well)?

Orig Code:

function smallcaps(s)
    local d=""
    local l,c,t=false,false
    for i=1,#s do
        local a=sub(s,i,i)
        if a=="^" then
            if(c) then d=d..a end
            c=not c
        elseif a=="~" then
            if(t) then d=d..a end
            t,l=not t,not l
        else
            if c==l and a>="a" and a<="z" then
                for j=1,26 do
                    if a==sub("abcdefghijklmnopqrstuvwxyz",j,j) then
                        a=sub("\65\66\67\68\69\70\71\72\73\74\75\76\77\78\79\80\81\82\83\84\85\86\87\88\89\90\91\92",j,j)
                        break
                    end
                end
            end
            d=d..a
            c,t=false,false
        end
    end
    return d
end

Minified Code:

function ia(hh) local e=""local id,kf,kg=false,false for jo=1,#hh do local kh=sub(hh,jo,jo) if kh=="^"then
if(kf) then e=e..kh end
kf=not kf elseif kh=="~"then if(kg) then e=e..kh end
kg,id=not kg,not id else if kf==id and kh>="a"and kh<="z"then
for ki=1,26 do if kh==sub("abcdefghijklmnopqrstuvwxyz",ki,ki) then
kh=sub("ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\",ki,ki) break end end end e=e..kh kf,kg=false,false end end return e end

You can see the output has converted one of the code strings to: ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\

It's not a major issue, as I can just keep copy+pasting the orig string back over the minified output. But it is a bit of a pain :wink:

Thanks

dansanderson commented 7 years ago

Accepted.

Picotool's parser exposes string literals to Python as their string values, so "\65" is a one-character string of byte 65 = "A". When Picotool writes it back out, it assumes it's equivalent to write "A", but Pico-8's reader tries to "help" by converting "A" to "a" under the assumption that you used an external editor and habitually used uppercase letters.
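To make the information loss concrete, here is a minimal Python sketch. It is not picotool's actual lexer code: `decode_decimal_escapes` is a hypothetical helper that handles only Lua-style decimal escapes (`\ddd`, one to three digits). It shows that the escaped and literal spellings decode to the same value, so a writer that sees only the value can't know which spelling the author used.

```python
import re

# Hypothetical helper, not picotool's real lexer: decode Lua-style decimal
# escapes (a backslash followed by one to three digits) in a literal's body.
_DECIMAL_ESCAPE = re.compile(rb"\\(\d{1,3})")


def decode_decimal_escapes(literal_body: bytes) -> bytes:
    """Replace each decimal escape with the single byte it names."""
    return _DECIMAL_ESCAPE.sub(lambda m: bytes([int(m.group(1))]), literal_body)


escaped = rb"\65\66\67"  # what the author wrote
plain = b"ABC"           # what a value-only writer would emit

# Both spellings produce the same three bytes once parsed.
print(decode_decimal_escapes(escaped))
print(decode_decimal_escapes(escaped) == decode_decimal_escapes(plain))
```

The tail of the string in the original report behaves the same way: `\90\91\92` decodes to `Z`, `[`, and a backslash, which is why the minified output ends in `Z[\\`.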

One fix would be to change the writing of string literal tokens so that everything outside the Pico-8 63 character set is written with Pico-8 string escape codes instead of bytes. This has the disadvantage that if the original cart was written with an external editor and the author's intent was for letters in strings to be converted to lowercase automatically, they won't be, i.e. if I typed "A" it'll become "\65" when I may have meant "a". But that's probably an acceptable tradeoff.
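A rough sketch of that kind of writer, in Python. This is an illustration and not picotool's code: `is_safe` and `write_string_literal` are hypothetical names, and the "safe" set used here (printable ASCII minus uppercase letters, the double quote, and the backslash) is an assumption standing in for whatever character set the real fix would treat as safe.

```python
def is_safe(byte: int) -> bool:
    """Assumed safe set: printable ASCII except A-Z, '"' and '\\'."""
    if not 32 <= byte <= 126:
        return False
    if 65 <= byte <= 90:          # uppercase letters would be case-folded
        return False
    return byte not in (0x22, 0x5C)  # quote and backslash need escaping


def write_string_literal(value: bytes) -> bytes:
    """Emit a double-quoted literal, escaping every byte outside the safe set."""
    out = bytearray(b'"')
    for byte in value:
        if is_safe(byte):
            out.append(byte)
        else:
            # Three-digit zero-padded decimal escape, so a following digit
            # character can never be absorbed into the escape.
            out += b"\\%03d" % byte
    out += b'"'
    return bytes(out)


print(write_string_literal(b"Ab1"))
```

With this writer, a value containing byte 65 always comes out as `\065`, which is exactly the tradeoff described above: an author who typed a literal "A" in an external editor and expected it to be folded to "a" would get an escape instead.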

Feneric commented 6 years ago

This is one of the fixes provided in my pull request. If you get the chance, please try it and see if it works for you.

dansanderson commented 3 years ago

This issue is partially resolved by 73619f767a7ab695b0e2cccf45fefa4efdc4e1e0.

All 256 characters in the P8SCII character set can be represented in a string literal with numeric escape codes. All but three of these can be represented another way: as the literal P8SCII character, or as a special escape sequence. The remaining three must be numeric escapes: \0, \14, and \15. As I mentioned in my previous comment, picotool parses string literals into P8SCII bytestrings internally. When it does so, it loses knowledge of how the characters were represented in the original code. picotool reconstructs the string literal its own way, and as of the most recent change it does so accurately for all 256 P8SCII characters using special escape sequences for codes 0-15 and literal P8SCII characters for 16-255.
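For illustration, the writing rule described above could look roughly like the Python sketch below. It is not the code from the commit. `encode_p8scii` is a hypothetical name, the escape table for codes 1 through 13 is my assumption based on PICO-8's documented control codes and should be checked against the manual, and the quote and backslash handling is the usual Lua requirement for a double-quoted literal.

```python
# Assumed special escape sequences for P8SCII codes 1-13 (to be verified
# against the PICO-8 manual); codes 0, 14 and 15 have no special form.
SPECIAL_ESCAPES = {
    1: b"\\*", 2: b"\\#", 3: b"\\-", 4: b"\\|", 5: b"\\+", 6: b"\\^",
    7: b"\\a", 8: b"\\b", 9: b"\\t", 10: b"\\n", 11: b"\\v", 12: b"\\f",
    13: b"\\r",
}
NUMERIC_ONLY = (0, 14, 15)


def encode_p8scii(value: bytes) -> bytes:
    """Encode a P8SCII bytestring as the body of a double-quoted literal."""
    out = bytearray()
    for byte in value:
        if byte in SPECIAL_ESCAPES:
            out += SPECIAL_ESCAPES[byte]
        elif byte in NUMERIC_ONLY:
            # Zero-padded so a following digit cannot extend the escape.
            out += b"\\%03d" % byte
        elif byte in (0x22, 0x5C):
            out += b"\\" + bytes([byte])  # escape '"' and '\'
        else:
            out.append(byte)  # codes 16-255 are written as literal bytes
    return bytes(out)


print(encode_p8scii(bytes([0, 10, 65, 14])))
```

Note that byte 65 falls in the 16-255 range and is therefore written as a literal character, which is the behavior the rest of this comment is about.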

This is impolite behavior because it changes how the user chose to write their code without permission, but it's especially impolite due to a usability gap in how PICO-8 handles lower ASCII letters (codes 65 through 90). PICO-8 pretends to have a single-case typeface, both because of its low resolution and because it generally tries to look like a vintage computer. When you type letters into the code editor, PICO-8 emits upper ASCII letters, and they appear uppercase. When writing a .p8 file, PICO-8 converts these to lowercase ASCII so that you don't have to read or type in all caps when you manipulate the file in an external editor. When you type in uppercase in an external editor, PICO-8 swaps these for lower ASCII letters, which appear in a small-caps typeface in PICO-8.

You can type most P8SCII character literals in the PICO-8 code editor, including most ASCII, the typeable symbol range, and Japanese characters via the host OS's character entry method. You can also copy-paste these characters from an external document as the Unicode equivalents of the P8SCII characters (see wiki: P8SCII). This is not true of lower ASCII letters. They cannot be typed in the code editor: Shift + a letter produces a typeable symbol, not a letter. They also cannot be pasted into the code editor from an external document: PICO-8 assumes you intend the letters to be in the single-case typeface and converts lower ASCII letters to upper ASCII letters on paste. (Copy-paste within the PICO-8 code editor preserves all 240 printable characters, but you'd need an external editor to get them there in the first place.) There is no way to enter a literal lower ASCII letter in the PICO-8 editor. You can only use numeric escape codes in a string literal (\65 through \90).

Technically, picotool's current behavior does not modify the behavior of the code: converting from numeric escape to a P8SCII character in a string literal shouldn't change behavior. (@Liquidream your small caps conversion routine should work in its transformed state, as far as I can tell.) But it's especially rude to do this with lower ASCII letters, because it's converting from the only way to type them into the PICO-8 editor (numeric escapes) to a way that cannot be typed (P8SCII literals). A more polite solution would be for TokString to remember the original representation. When it has to write out its value again, it can compare the original representation to its current value. If they are the same, it should emit the original representation, instead of a reinterpretation of the value. (If they differ, as when some fancy tool updates string literal values on the AST node, it's safe to reinterpret. As far as I know nobody has used this feature, but it covers the intent of the original API.) https://github.com/dansanderson/picotool/blob/master/pico8/lua/lexer.py#L136
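A minimal Python sketch of the "remember the original representation" idea follows. This is not picotool's TokString: `TokStringSketch`, `reinterpret`, and the attribute names are hypothetical, and `reinterpret` is a deliberately simple stand-in for the real writer.

```python
from typing import Optional


def reinterpret(value: bytes) -> bytes:
    """Stand-in writer: quote the value, escaping only '\\' and '"'."""
    body = value.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + body + b'"'


class TokStringSketch:
    """Hypothetical string token that remembers how it was spelled."""

    def __init__(self, value: bytes, original_source: Optional[bytes] = None):
        self.value = value                     # parsed value, may be edited
        self._original_value = value           # value at parse time
        self._original_source = original_source  # literal as written

    def to_source(self) -> bytes:
        # Untouched since parsing: emit exactly what the author wrote.
        if (self._original_source is not None
                and self.value == self._original_value):
            return self._original_source
        # A tool changed the value (or there was no source text):
        # fall back to reinterpreting the current value.
        return reinterpret(self.value)


tok = TokStringSketch(b"AB", rb'"\65\66"')
print(tok.to_source())   # original spelling is preserved
tok.value = b"CD"
print(tok.to_source())   # edited value is rewritten from scratch
```

Comparing against the parse-time value, rather than setting a "dirty" flag in a setter, keeps plain attribute assignment working for any tool that updates string values on the AST node.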

I'm inclined to not fix this further for now. (It probably took me longer to write this explanation than to just implement the TokString change. :) ) This only affects carts using small caps, and only results in a cosmetic change to the code (albeit an important one for those users). Please let me know if I've got any of this wrong, or if I'm missing a use case.

Liquidream commented 3 years ago

I can confirm that I no longer seem to need to restore the codes myself. The latest picotool (at least the one I tried the other day) preserved the small caps. Thanks!

dansanderson / picotool

Support/preserve escaped char strings (such as "lowercase" letters) #19