fadden / 6502bench

A workbench for developing 6502 code
https://6502bench.com/
Apache License 2.0

HiBit terminated strings #102

Closed BacchusFLT closed 3 years ago

BacchusFLT commented 3 years ago

In my experience, the most common kinds of strings are those that store their length and those that are zero-terminated. There is also a third kind: strings whose last character has bit 7 set.

See the .shift pseudo-op: http://turbo.style64.org/docs/turbo-macro-pro-tmpx-syntax


The expected handling would be to list the block as separate strings, with a line break after each character that has bit 7 set.

As it is now, it becomes quite unreadable.


Please also note that in the example, the block is zero-terminated.
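The handling described above (break the block after each byte with bit 7 set, and stop at the $00 that ends the block) can be sketched in a few lines of Python. This is a minimal illustration, not 6502bench's code: `split_dci_block` and `dci_text` are hypothetical names, and decoding as plain ASCII is a simplification that only holds for PETSCII letters, digits, and basic punctuation. The sample bytes are the "old" and "on" entries from the listing later in this thread.

```python
def split_dci_block(data: bytes) -> list[bytes]:
    """Split a block of DCI (bit-7-terminated) strings.

    A string ends at each byte with bit 7 set. A $00 byte ends the
    whole block and is not part of any string.
    """
    strings = []
    current = bytearray()
    for b in data:
        if b == 0x00:
            break  # block terminator
        current.append(b)
        if b & 0x80:  # bit 7 set: last character of this string
            strings.append(bytes(current))
            current = bytearray()
    if current:
        raise ValueError("block ends in the middle of a string")
    return strings


def dci_text(s: bytes) -> str:
    """Clear bit 7 on the last byte and decode as plain ASCII (a simplification)."""
    return (s[:-1] + bytes([s[-1] & 0x7F])).decode("ascii")


# "OLD" and "ON", followed by the $00 block terminator.
block = bytes([0x4F, 0x4C, 0xC4, 0x4F, 0xCE, 0x00])
for s in split_dci_block(block):
    print(dci_text(s))  # prints OLD, then ON
```

Stopping at $00 keeps the terminator out of the last string, which matches the advice in the thread not to include the $00 when formatting the block.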

fadden commented 3 years ago

It's called "Dextral Character Inverted" (DCI) in the string format UI. (https://retrocomputing.stackexchange.com/q/20463/56) A couple of Apple II assemblers called it that, and I haven't heard a better name. (You're not "shifting" the characters, except maybe in the C64 character set sense.)

You should be able to format the whole set in bulk. Don't select the $00 since that's not part of the DCI chunk.

fadden commented 3 years ago

If you select +00250e through +00256e (97 bytes), the data operand editor shows 21 DCI strings.

+00250c 850c: c0                    .....                 .dd1  $c0
+00250d 850d: a6                    .....                 .dd1  $a6
+00250e 850e: 4f 4c c4              .....                 .dstr pet:“old”
+002511 8511: 44 45 4c 45+          .....                 .dstr pet:“delete”
+002517 8517: 4c 49 4e 45+          .....                 .dstr pet:“linesave”
+00251f 851f: 4d 45 52 47+          .....                 .dstr pet:“merge”
+002524 8524: 41 55 54 cf           .....                 .dstr pet:“auto”
+002528 8528: 4d 4f 4e 49+          .....                 .dstr pet:“monitor”
+00252f 852f: 41 50 50 45+          .....                 .dstr pet:“append”
+002535 8535: 43 4f 50 d9           .....                 .dstr pet:“copy”
+002539 8539: 42 4f 4f d4           .....                 .dstr pet:“boot”
+00253d 853d: 5a 41 d0              .....                 .dstr pet:“zap”
+002540 8540: 42 41 43 4b+          .....                 .dstr pet:“backup”
+002546 8546: 50 4c 49 53+          .....                 .dstr pet:“plist"”
+00254c 854c: 53 4c 49 53+          .....                 .dstr pet:“slist"”
+002552 8552: 4f ce                 .....                 .dstr pet:“on”
+002554 8554: 4f 46 c6              .....                 .dstr pet:“off”
+002557 8557: 46 49 4e c4           .....                 .dstr pet:“find”
+00255b 855b: 52 45 4e 55+          .....                 .dstr pet:“renum”
+002560 8560: 49 4e 46 cf           .....                 .dstr pet:“info”
+002564 8564: 54 41 53 d3           .....                 .dstr pet:“tass”
+002568 8568: 4e 45 d4              .....                 .dstr pet:“net”
+00256b 856b: 54 4f 4f cc           .....                 .dstr pet:“tool”
+00256f 856f: 00                    .....                 .dd1  $00

If you select the 64tass pseudo-op set in the app settings, the listing will show them as ".shift" instead of ".dstr". I don't think ACME or cc65 has a pseudo-op for these.

BacchusFLT commented 3 years ago

That also worked fine - thanks!

The project I use as a sample is a utility cartridge. It also has a number of "one-byte strings", which naturally also have bit 7 set, but the program won't let me select them. Formally, the table starts at $8507, beginning with seven bytes that are seven one-byte strings with bit 7 set.

BasicCommands .dd1  $af
              .dd1  $a4
              .dd1  $a5
              .dd1  $de
              .dd1  $dc
              .dd1  $c0
              .dd1  $a6
              .dstr pet:“old”
              .dstr pet:“delete”
              .dstr pet:“linesave”
              .dstr pet:“merge”
              .dstr pet:“auto”
              .dstr pet:“monitor”
              .dstr pet:“append”
              .dstr pet:“copy”
              .dstr pet:“boot”
              .dstr pet:“zap”
              .dstr pet:“backup”
              .dstr pet:“plist"”
              .dstr pet:“slist"”
              .dstr pet:“on”
              .dstr pet:“off”
              .dstr pet:“find”
              .dstr pet:“renum”
              .dstr pet:“info”
              .dstr pet:“tass”
              .dstr pet:“net”
              .dstr pet:“tool”
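If 1-byte DCI strings were allowed, the same "split after each bit-7 byte" rule would treat each of the seven leading bytes as its own string. The sketch below rebuilds the whole table from $8507 to check that. It is a hypothetical reconstruction: the seven single bytes are copied from the listing, but the multi-byte entries are re-encoded from their text, assuming letters are PETSCII $41-$5A and the trailing `"` is $22 (which agrees with the hex bytes shown earlier, e.g. `4f 4c c4` for "old" and the $a2 final byte of "plist""). `to_dci` and `split_dci_block` are illustrative names.

```python
# Seven one-byte entries at $8507-$850D, copied from the listing.
ONE_BYTE = [0xAF, 0xA4, 0xA5, 0xDE, 0xDC, 0xC0, 0xA6]

# Multi-byte entries re-encoded from their text (assumption: letters are
# PETSCII $41-$5A and '"' is $22, matching the hex bytes in the thread).
WORDS = ["OLD", "DELETE", "LINESAVE", "MERGE", "AUTO", "MONITOR", "APPEND",
         "COPY", "BOOT", "ZAP", "BACKUP", 'PLIST"', 'SLIST"', "ON", "OFF",
         "FIND", "RENUM", "INFO", "TASS", "NET", "TOOL"]


def to_dci(word: str) -> bytes:
    """Encode a word with bit 7 set on its last byte."""
    raw = word.encode("ascii")
    return raw[:-1] + bytes([raw[-1] | 0x80])


def split_dci_block(data: bytes) -> list[bytes]:
    """Split at each byte with bit 7 set; stop at the $00 terminator."""
    strings, current = [], bytearray()
    for b in data:
        if b == 0x00:
            break
        current.append(b)
        if b & 0x80:
            strings.append(bytes(current))
            current = bytearray()
    return strings


table = bytes(ONE_BYTE) + b"".join(to_dci(w) for w in WORDS) + b"\x00"
strings = split_dci_block(table)
print(len(table), len(strings))  # prints 105 28
```

The 105 bytes span $8507 through the $00 at $856F, and the 21 multi-byte strings account for the 97 bytes from $850E to $856E mentioned earlier in the thread.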

/Pontus Berg Bergatrollet AB CEO & Owner Tel/SMS: +46 735 082860 www.bergatrollet.se

(In reply to Andy McFadden, Mon 2 Aug 2021, 05:21.)


fadden commented 3 years ago

That's intentional -- with only one character, it's not a DCI string, just a character. (You can't select a bunch of $00 and declare them to be a list of zero-length null-terminated strings or L1 strings.) However, I see the problem: with PETSCII the UI won't let you mark it as a character, because it's outside the normal range. (For ASCII text you can select "high ASCII", which is very common on the Apple II.)

So either we need to allow single-character DCI "strings" for PETSCII, or the UI needs to be more generous about accepting PETSCII characters with the high bit set. Hmm.

BacchusFLT commented 3 years ago

We can get into the philosophical question if a string cannot also be a character, but let's stay clear of that one ;-)

I did notice that you also have a string-length threshold in the project properties, but setting it to "None (Disable)" didn't change anything. Given that you already support a string-length threshold, why not also make it apply to this case?

(In reply to Andy McFadden, Mon 2 Aug 2021, 16:43.)


fadden commented 3 years ago

The value in the project properties only controls the automatic string finder in the data scanner. The minimum DCI string length exists for a technical reason involving low vs. high ASCII: there are two kinds of DCI, "lo lo lo hi" and "hi hi hi lo". In either case the high bit of the last byte differs from the others, but you need at least two bytes to tell that it's a DCI string. "hi hi hi hi" could be four 1-byte DCI strings, or one 4-byte high-ASCII string.

With PETSCII there's only one mode (lo ... hi), so "hi hi hi hi" is unambiguously four strings.
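The ambiguity can be made concrete with a small classifier that lists every reading of a run of bytes. This is a hypothetical sketch, not SourceGen's auto-detection logic: `is_dci` and `interpretations` are illustrative names, "high ASCII" means Apple II style text with bit 7 set on every byte, and for PETSCII only the "lo ... hi" form is considered, as described above.

```python
def is_dci(run: bytes, last_hi: bool) -> bool:
    """True if `run` is a single DCI string: every byte except the last
    has one high-bit state and the last byte has the opposite state.
    last_hi=True is "lo lo lo hi"; last_hi=False is "hi hi hi lo"."""
    if not run:
        return False
    *body, last = run
    return (bool(last & 0x80) == last_hi
            and all(bool(b & 0x80) != last_hi for b in body))


def interpretations(run: bytes, petscii: bool) -> list[str]:
    """List plausible readings of `run`. ASCII allows both DCI polarities
    plus high-ASCII text; PETSCII only allows "lo ... hi"."""
    options = []
    if is_dci(run, last_hi=True):
        options.append("one lo...hi DCI string")
    if not petscii and is_dci(run, last_hi=False):
        options.append("one hi...lo DCI string")
    if len(run) > 1 and all(b & 0x80 for b in run):
        # Each byte on its own is a 1-byte "lo...hi" DCI string.
        options.append(f"{len(run)} one-byte DCI strings")
        if not petscii:
            options.append(f"one {len(run)}-byte high-ASCII string")
    return options


hi4 = bytes([0xC1] * 4)  # "hi hi hi hi"
print(interpretations(hi4, petscii=False))  # two readings: ambiguous
print(interpretations(hi4, petscii=True))   # one reading: four strings
```

With ASCII the all-high run has two readings, so a detector needs a second byte of differing polarity to decide. With PETSCII the high-ASCII reading never applies, which is why "hi hi hi hi" is unambiguous there.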

I don't think there's any reason to block 1-byte DCI strings regardless of character set, but I need to dig around a bit and make sure all the low vs. high auto-detection stuff will work right.

BacchusFLT commented 3 years ago

Many thanks and eagerly awaiting the results.

(In reply to Andy McFadden, Mon 2 Aug 2021, 20:47.)


fadden commented 3 years ago

Fixed in https://github.com/fadden/6502bench/releases/tag/v1.7.5-dev4

fadden commented 3 years ago

Unhandled edge case: non-ASCII or string-delimiter char in last byte. It gets emitted as raw value, with high bit set, which 64tass doesn't like:

        .shift  $de
        .shift  $dc
        .shift  "plist",$a2
        .shift  "slist",$a2

fadden commented 3 years ago

That's better:

850a: de                           .dstr pet:$5e
850b: dc                           .dstr pet:$5c
8546: 50 4c 49 53+                 .dstr pet:“plist"”
854c: 53 4c 49 53+                 .dstr pet:“slist"”
        .shift  $5e
        .shift  $5c
        .shift  "plist",$22
        .shift  "slist",$22
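The corrected 64tass output above can be approximated as follows. This is a hypothetical sketch, not 6502bench's source generator: `shift_operand` is an illustrative name. It relies on `.shift` setting bit 7 on the last byte itself, so the last byte is emitted with bit 7 cleared (emitting $a2 directly is what 64tass rejected in the earlier output). The "safe" printable range $20-$5A is an assumption; the real conversion depends on the 64tass text encoding in use, and letters are lowercased only to mirror the listing.

```python
def shift_operand(s: bytes) -> str:
    """Format one PETSCII DCI string as a 64tass .shift operand.

    The last byte is emitted with bit 7 cleared. Bytes outside the
    assumed safe range ($20-$5A), and the '"' delimiter, are emitted
    as hex values instead of inside the quoted text.
    """
    values = list(s[:-1]) + [s[-1] & 0x7F]
    parts, text = [], ""
    for c in values:
        if 0x20 <= c <= 0x5A and c != 0x22:
            # Letters shown in lowercase, as in the listing above.
            text += chr(c).lower()
        else:
            if text:
                parts.append(f'"{text}"')
                text = ""
            parts.append(f"${c:02x}")
    if text:
        parts.append(f'"{text}"')
    return ",".join(parts)


for s in (bytes([0xDE]), bytes([0xDC]), b"PLIST\xa2", b"SLIST\xa2"):
    # prints the four .shift lines shown above
    print(f"        .shift  {shift_operand(s)}")
```

Splitting the operand into a quoted run plus hex values keeps the delimiter and non-ASCII characters out of the string literal, while `.shift` still produces the same bytes as the original data.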

BacchusFLT commented 3 years ago

Looks fully correct!


(In reply to Andy McFadden, Tue 10 Aug 2021, 23:20.)


fadden commented 3 years ago

Fixed in v1.7.5.