Format Address tables outside the binary

fadden / 6502bench

A workbench for developing 6502 code

https://6502bench.com/

Apache License 2.0

169 stars 32 forks source link

Format Address tables outside the binary #130

Closed BacchusFLT closed 2 years ago

BacchusFLT commented 2 years ago

The current project I work with is a c64 paint program. The binary is $0800 to $bfff (thereabout), It uses $2000 for the bitmap (inside the scope of the binary), $0400 is the screen memory (which is one of the colour matrixes in c64 multicolour bitmap mode) and the colour memory is at the hardcoded address of $d800. The program I am disassembling has table to point to the first address of every row of the bitmap, one table for the screen and one for the colour memory.

Making an Address table of this is perfectly fine as it is pointing to a segment of the actual binary:

Bitmap_Lo $00,$40,$80 ... Bitmap_Hi $20,$21,$22 ...

However the two other ones will not accept this:

Screen_Lo $00,$28,$50 ... Screen_Hi $04,$04,$04

Colour_Lo $00,$28,$50 ... Colour_Hi $d8,$d8,$d8

I take it as they are outside the scope of the binary. The references inside the scope of the binary turn to Labels, but if outside, isn't it "just" to make them Project symbols? (I know the use of "just" is totally unfair as suggesting is so easy and implementing could be a nuance, so pardon the use of the word).

fadden commented 2 years ago

I'm going to address this in two parts: how external addresses could be handled by the address table formatter, and how I think you should handle this specific situation with the current implementation.

The Format Address Table feature only works with "internal" addresses. Part of the reason for this is that, if you've set it up wrong (e.g. the low/high bytes are reversed), you'll get a bunch of incorrect addresses, and clicking "OK" will create tons of junk in your project symbol table. The other part is that sometimes tables of addresses have "special" entries in them (like $0000 or $ff00) that tell the code to do something special. Creating labels for those would be wrong, and working around the "junk" entries would be annoying.

One possibility would be to have the formatter look for existing labels to match to. External locations are generally either known addresses (like ROM entry points) or memory regions with an explicit size, so having those declared in the project symbol table or in a .SYM65 file would make sense. This would allow the formatter to apply external labels to table entries while avoiding the problems associated with auto-generating external symbols.

Let's consider a chunk of memory that looks like this:

4af8: 00 28 50 78+ ColMemLo        .bulk $00,$28,$50,$78,$a0,$c8,$f0,$18,$40,$68,$90,$b8,$e0,$08,$30,$58
                                    +    $80,$a8,$d0,$f8,$20,$48,$70,$98,$c0
4b11: f8 20 48                     .bulk $f8,$20,$48
4b14: d8 d8 d8 d8+ ColMemHi        .bulk $d8,$d8,$d8,$d8,$d8,$d8,$d8,$d9,$d9,$d9,$d9,$d9,$d9,$da,$da,$da
                                    +    $da,$da,$da,$da,$db,$db,$db,$db,$db
4b2d: 0d 0e 0e                     .bulk $0d,$0e,$0e

25 sets of pointers to a contiguous range of 40-byte entries, spanning $d800-dbe7. (Followed by another 3 bytes of pointers to $0df8/0e20/$0e48, which I'm ignoring.) Start by creating a project symbol "ColorMemArea" at $d800 with a width of 25*40=1000 (hit F6, click New Symbol):

bug130-1

Save that and go back to the code. Select the ColMemLo line, then Edit Operand. Set the format to Single Bytes, and in the Symbolic Reference field type "ColorMemArea". Make sure the "Low" radio button is selected.

bug130-2

Do the same on the ColMemHi line, but this time select the "High" radio button.

You should end up with:

4af8: 00           ColMemLo        .dd1  <ColorMemArea
4af9: 28                           .dd1  <ColorMemArea+40
4afa: 50                           .dd1  <ColorMemArea+80
4afb: 78                           .dd1  <ColorMemArea+120
4afc: a0                           .dd1  <ColorMemArea+160
[...]
4b10: c0                           .dd1  <ColorMemArea+192
4b11: f8 20 48                     .bulk $f8,$20,$48
4b14: d8           ColMemHi        .dd1  >ColorMemArea
4b15: d8                           .dd1  >ColorMemArea
4b16: d8                           .dd1  >ColorMemArea
4b17: d8                           .dd1  >ColorMemArea
4b18: d8                           .dd1  >ColorMemArea
4b19: d8                           .dd1  >ColorMemArea
4b1a: d8                           .dd1  >ColorMemArea
4b1b: d9                           .dd1  >ColorMemArea+1
[...]

(The default display settings will show that last line as (>ColorMemArea)+1. If you prefer the cc65 syntax shown above, Edit > Settings > Display Format, change Expression Style to cc65, or just use the Quick Set at the bottom to configure the values in that tab for cc65 format. The difference is because the operator precedence rules in cc65 allow the expression to be written without parenthesis, and SourceGen tries to output the simplest form possible.)

FWIW, you don't actually need to specify a width on the ColorMemArea to make this work, because you're setting the symbol explicitly, but if you format a pointer somewhere else as an address then having the range would allow it to be resolved automatically. If Format Address Table were changed to look for external addresses, it would also use the range to know that it should use that symbol for all entries.

Use the same approach for the screen pointers... define the region (addr=$0400, len=1000), then format the pointer table bytes as low/high parts of a pointer to a single symbol.

If you actually want separate symbols for each address, you will have to do a lot of typing. You can create the symbols pretty quickly in a .SYM65 file with a text editor, but you'd have to reference them manually on every byte with the data operand editor. If Format Address Table matched against existing external address symbols, you'd still have to do the first (easy) part, but the second (painful) part would be automated.

BacchusFLT commented 2 years ago

Many thanks - that was an option I wasn't aware of.

The main topic

I must however say it's a bit of a "dirty work-around".

If adding project symbols automatically is adding them statically, and thereby cluttering the table, seems like a nuance of course. But to me that should be much like the automatically generated labels. They should only live if there is a reference, but it seems the project labels work differently.- if added they would be there until you manually delete them. Isn't it a reasonable though to have the approach of being able to add them "dynamically" to the Project Symbols, like the labels assigned to internal addresses? One part of the Project Symbols the program generates internally.

Also; the approach you suggested doesn't work fully. Whereas the high byte is properly incremented in the HighByte table, the lowbytes are always referencing the first page. Item 8 says ScreenMem+24, but the correct reference is ScreenMem+280. Only if properly associated with the highbyte will it be correct. One could argue that the assembly will yield the correct result, but it really doesn't feel right.

Point_ScreenLow .dd1 <ScreenMem .dd1 <ScreenMem+40 .dd1 <ScreenMem+80 .dd1 <ScreenMem+120 .dd1 <ScreenMem+160 .dd1 <ScreenMem+200 .dd1 <ScreenMem+240 .dd1 <ScreenMem+24

Having said this, the HighByte

Point_ScreenHigh .dd1 >ScreenMem .dd1 >ScreenMem .dd1 >ScreenMem .dd1 >ScreenMem .dd1 >ScreenMem .dd1 >ScreenMem .dd1 >ScreenMem .dd1 (>ScreenMem)+1

My view: There are two ways to reference this.

1) One is the way it's done internally, as per the Bitmap table in the example we have discussed:

Point_BitmapLow .dd1 <T2000 .dd1 <T2140 .dd1 <T2280 .dd1 <T23C0 .dd1 <T2500 .dd1 <T2640

Point_BitmapHigh .dd1 >T2000 .dd1 >T2140 .dd1 >T2280 .dd1 >T23C0 .dd1 >T2500 .dd1 >T2640 .dd1 >T2780

2) The other is a relative

Point_ScreenLow .dd1 <ScreenMem .dd1 <ScreenMem+$28 .dd1 <ScreenMem+$50

Point_ScreenHigh .dd1 >ScreenMem .dd1 >ScreenMem+$28 .dd1 >ScreenMem$50

The mode suggested means that the two bytes are unaware that it's part of a word, which means that they are formally wrong, even if the result assembles correct. In match class you got an error if you result was correct but the computation was wrong ;-)

A second part of the same topic;

[image: image.png]

The program I look at is a paint program and I guess it used the three last entries of the bitmap, screen and colourmemory to have a buffer for the area covered by the menu. If I select the last three bytes of the high and low table for the colour memory, the Generated Addresses are correct, the OK button is there but pressing OK nothing happens. These are locations inside the binary. One though would be that it couldn't handle hi/low that aren't adjacent, but the requester could at least detect them properly.

Den lör 7 maj 2022 18:54Andy McFadden @.***> skrev:

I'm going to address this in two parts: how external addresses could be handled by the address table formatter, and how I think you should handle this specific situation with the current implementation.

The Format Address Table feature only works with "internal" addresses. Part of the reason for this is that, if you've set it up wrong (e.g. the low/high bytes are reversed), you'll get a bunch of incorrect addresses, and clicking "OK" will create tons of junk in your project symbol table. The other part is that sometimes tables of addresses have "special" entries in them (like $0000 or $ff00) that tell the code to do something special. Creating labels for those would be wrong, and working around the "junk" entries would be annoying.

One possibility would be to have the formatter look for existing labels to match to. External locations are generally either known addresses (like ROM entry points) or memory regions with an explicit size, so having those declared in the project symbol table or in a .SYM65 file would make sense. This would allow the formatter to apply external labels to table entries while avoiding the problems associated with auto-generating external symbols.

Let's consider a chunk of memory that looks like this:

4af8: 00 28 50 78+ ColMemLo .bulk $00,$28,$50,$78,$a0,$c8,$f0,$18,$40,$68,$90,$b8,$e0,$08,$30,$58

$80,$a8,$d0,$f8,$20,$48,$70,$98,$c0 4b11: f8 20 48 .bulk $f8,$20,$48 4b14: d8 d8 d8 d8+ ColMemHi .bulk $d8,$d8,$d8,$d8,$d8,$d8,$d8,$d9,$d9,$d9,$d9,$d9,$d9,$da,$da,$da

$da,$da,$da,$da,$db,$db,$db,$db,$db 4b2d: 0d 0e 0e .bulk $0d,$0e,$0e

25 sets of pointers to a contiguous range of 40-byte entries, spanning $d800-dbe7. (Followed by another 3 bytes of pointers to $0df8/0e20/$0e48, which I'm ignoring.) Start by creating a project symbol "ColorMemArea" at $d800 with a width of 25*40=1000 (hit F6, click New Symbol):

[image: bug130-1] https://user-images.githubusercontent.com/6185696/167263133-d511b292-dc78-42f2-8113-7fad033367b5.png

Save that and go back to the code. Select the ColMemLo line, then Edit Operand. Set the format to Single Bytes, and in the Symbolic Reference field type "ColorMemArea". Make sure the "Low" radio button is selected.

[image: bug130-2] https://user-images.githubusercontent.com/6185696/167263137-376d9bd1-6578-4856-94c6-e631b6d0097c.png

Do the same on the ColMemHi line, but this time select the "High" radio button.

You should end up with:

4af8: 00 ColMemLo .dd1 <ColorMemArea 4af9: 28 .dd1 <ColorMemArea+40 4afa: 50 .dd1 <ColorMemArea+80 4afb: 78 .dd1 <ColorMemArea+120 4afc: a0 .dd1 <ColorMemArea+160 [...] 4b10: c0 .dd1 <ColorMemArea+192 4b11: f8 20 48 .bulk $f8,$20,$48 4b14: d8 ColMemHi .dd1 >ColorMemArea 4b15: d8 .dd1 >ColorMemArea 4b16: d8 .dd1 >ColorMemArea 4b17: d8 .dd1 >ColorMemArea 4b18: d8 .dd1 >ColorMemArea 4b19: d8 .dd1 >ColorMemArea 4b1a: d8 .dd1 >ColorMemArea 4b1b: d9 .dd1 >ColorMemArea+1 [...]

(The default display settings will show that last line as (>ColorMemArea)+1. If you prefer the cc65 syntax shown above, Edit > Settings > Display Format, change Expression Style to cc65, or just use the Quick Set at the bottom to configure the values in that tab for cc65 format. The difference is because the operator precedence rules in cc65 allow the expression to be written without parenthesis, and SourceGen tries to output the simplest form possible.)

FWIW, you don't actually need to specify a width on the ColorMemArea to make this work, because you're setting the symbol explicitly, but if you format a pointer somewhere else as an address then having the range would allow it to be resolved automatically. If Format Address Table were changed to look for external addresses, it would also use the range to know that it should use that symbol for all entries.

Use the same approach for the screen pointers... define the region (addr=$0400, len=1000), then format the pointer table bytes as low/high parts of a pointer to a single symbol.

If you actually want separate symbols for each address, you will have to do a lot of typing. You can create the symbols pretty quickly in a .SYM65 file with a text editor, but you'd have to reference them manually on every byte with the data operand editor. If Format Address Table matched against existing external address symbols, you'd still have to do the first (easy) part, but the second (painful) part would be automated.

— Reply to this email directly, view it on GitHub https://github.com/fadden/6502bench/issues/130#issuecomment-1120240448, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGZWZSXIETNYCF7MK7OYD5TVI2N3JANCNFSM5VKIKDBA . You are receiving this because you authored the thread.Message ID: @.***>

fadden commented 2 years ago

Whereas the high byte is properly incremented in the HighByte table, the lowbytes are always referencing the first page. Item 8 says ScreenMem+24, but the correct reference is ScreenMem+280.

If you try to write ".byte <label+280", cc65 will report Error: Range error because it's not an 8-bit value. You have to write ".byte <(label+280)", which seems kind of ugly, though this varies by assembler. In any event, the disassembler doesn't remember that it's part of a 16-bit value, so it doesn't have the option of doing it the other way.

Auto-generated project symbols could be a thing. You'd still edit them as project symbols, but doing so could "promote" them to regular project symbols, the same way it works when you edit an auto-generated label. Some amount of un-formatting might be needed to clean up situations where a project symbol wasn't actually wanted.

I don't think they'd help here though. You'll notice that the labels generated by the table formatter are "Txxx" rather than "Lxxx"; that's because they're permanent "user" labels rather than auto-labels. This is necessary because the disassembler doesn't remember that two bytes in two different places are meant to be combined to form a 16-bit address... the structure of the table isn't stored. (This is why you can't have FOO+$118 instead of FOO+$18... you need to know which high byte the $18 is supposed to be paired with, and the table formatter doesn't record this relationship.) Without that knowledge, we aren't able to auto-generate symbols, and so would need to generate permanent project symbols for each table entry. The situation could be different if the table were formed from a series of 16-bit values, since we could just mark those as address references, but I don't want to have different behavior in different situations.

So, to really be able to use auto-generated project symbols, we'd need to record the address table's structure in the disassembly. This can get messy if the user wants to edit parts of the table to be different, e.g. some entries have special values rather than address parts. Things are significantly simpler if table formatting is a one-time operation rather than one step in layers of changes that must be applied every time (like "flattening" changes in a photo editor vs. keeping multiple layers).

I think, to get the kind of symbols you want, we'd need to add a "generate external address labels" checkbox that would populate the project symbol table with symbols (maybe "ETxxx").

the OK button is there but pressing OK nothing happens

It looks like you sent an image but github ate it... I think you need to drag & drop images into the issue reporting mechanism to include them, e-mailing them doesn't work. However, I think I see the problem.

There's a button in the bottom-right corner of the SourceGen main window that says something like "40 messages (40 warning/error)". Click that if you're not already seeing the message window. In my copy of your project, the first few messages tell you that the labels you've generated are hidden and being ignored.

bug130-3

If you double-click the first message, it'll take you to this line:

0a00: 00 00 00 00+ L0A00           .ds   1728

The labels don't appear because there's a large explicitly-defined region that spans them. If you create labels and then define a region, the Data Operand Editor will carefully break the region into smaller pieces so the labels don't get swallowed. The Address Table Formatter works the other way around, though, so it's possible to create labels inside formatted regions. (It really ought to give you a less subtle warning when this happens.)

Select the line at $0a00 and use the operand editor to change the format back to Default. Your labels will appear and some of the warnings will vanish.

The other warnings are mostly about formatting that appears in the middle of instructions. It looks like some things that used to be formatted as single-byte data items turned into code. You can clean these up by double-clicking the warning and then using Actions > Remove Formatting on the line in question. (It also removes hidden labels, so don't use it on $0a00 or you'll clear the labels embedded in the .ds ... https://6502bench.com/sgmanual/mainwin.html#remove-formatting .)

BacchusFLT commented 2 years ago

I get a bit of the complexity, even if of course that without any insight of the wiring behind the scene, I well understand that this is just the tip of the iceberg. I wasn't aware that not all assemblers could to calculations with more than 256. That sounds utterly restrictive.

Key is that I got the message through and shared a real life case. If external tables solve it then I am all for it.

I do indeed have a number of labels hidden, that I will dig into following your suggestion. Need to be in front of the project to validate if that solves the situation.

Den sön 8 maj 2022 18:35Andy McFadden @.***> skrev:

Whereas the high byte is properly incremented in the HighByte table, the lowbytes are always referencing the first page. Item 8 says ScreenMem+24, but the correct reference is ScreenMem+280.

If you try to write ".byte <label+280", cc65 will report Error: Range error because it's not an 8-bit value. You have to write ".byte <(label+280)", which seems kind of ugly, though this varies by assembler. In any event, the disassembler doesn't remember that it's part of a 16-bit value, so it doesn't have the option of doing it the other way.

Auto-generated project symbols could be a thing. You'd still edit them as project symbols, but doing so could "promote" them to regular project symbols, the same way it works when you edit an auto-generated label. Some amount of un-formatting might be needed to clean up situations where a project symbol wasn't actually wanted.

I don't think they'd help here though. You'll notice that the labels generated by the table formatter are "Txxx" rather than "Lxxx"; that's because they're permanent "user" labels rather than auto-labels. This is necessary because the disassembler doesn't remember that two bytes in two different places are meant to be combined to form a 16-bit address... the structure of the table isn't stored. (This is why you can't have FOO+$118 instead of FOO+$18... you need to know which high byte the $18 is supposed to be paired with, and the table formatter doesn't record this relationship.) Without that knowledge, we aren't able to auto-generate symbols, and so would need to generate permanent project symbols for each table entry. The situation could be different if the table were formed from a series of 16-bit values, since we could just mark those as address references, but I don't want to have different behavior in different situations.

So, to really be able to use auto-generated project symbols, we'd need to record the address table's structure in the disassembly. This can get messy if the user wants to edit parts of the table to be different, e.g. some entries have special values rather than address parts. Things are significantly simpler if table formatting is a one-time operation rather than one step in layers of changes that must be applied every time (like "flattening" changes in a photo editor vs. keeping multiple layers).

I think, to get the kind of symbols you want, we'd need to add a "generate external address labels" checkbox that would populate the project symbol table with symbols (maybe "ETxxx").

the OK button is there but pressing OK nothing happens

It looks like you sent an image but github ate it... I think you need to drag & drop images into the issue reporting mechanism to include them, e-mailing them doesn't work. However, I think I see the problem.

There's a button in the bottom-right corner of the SourceGen main window that says something like "40 messages (40 warning/error)". Click that if you're not already seeing the message window. In my copy of your project, the first few messages tell you that the labels you've generated are hidden and being ignored.

[image: bug130-3] https://user-images.githubusercontent.com/6185696/167304997-4155c2d2-4821-4a2e-af1e-4718588287c8.png

If you double-click the first message, it'll take you to this line:

0a00: 00 00 00 00+ L0A00 .ds 1728

The labels don't appear because there's a large explicitly-defined region that spans them. If you create labels and then define a region, the Data Operand Editor will carefully break the region into smaller pieces so the labels don't get swallowed. The Address Table Formatter works the other way around, though, so it's possible to create labels inside formatted regions. (It really ought to give you a less subtle warning when this happens.)

Select the line at $0a00 and use the operand editor to change the format back to Default. Your labels will appear and some of the warnings will vanish.

The other warnings are mostly about formatting that appears in the middle of instructions. It looks like some things that used to be formatted as single-byte data items turned into code. You can clean these up by double-clicking the warning and then using Actions > Remove Formatting on the line in question. (It also removes hidden labels, so don't use it on $0a00 or you'll clear the labels embedded in the .ds ... https://6502bench.com/sgmanual/mainwin.html#remove-formatting .)

— Reply to this email directly, view it on GitHub https://github.com/fadden/6502bench/issues/130#issuecomment-1120449230, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGZWZSQ7SM5FAB5FNDTNIRLVI7UMPANCNFSM5VKIKDBA . You are receiving this because you authored the thread.Message ID: @.***>

fadden commented 2 years ago

Summing up the proposed changes to the Format Address Table feature:

Look for matches with platform/project symbols for external addresses. Show these in the preview, and generate symbolic references to them when the table is formatted.
Add another checkbox (yikes!) that causes new project symbols ("ETxxxx") to be generated for external addresses that don't have an associated symbol.
Show a warning (probably in the label preview window) when the labels we're about to generate would end up hidden inside an existing multi-byte data area. Maybe don't generate those labels?
- Need to explore how this interacts with the "tag as code start point" feature.

Edit: for external symbol references to work without being annoying we'd also want project symbol renames to refactor all existing references. (I think this is already on the to-do list.) Remember to check for overlapping user labels.

BacchusFLT commented 2 years ago

Den tis 10 maj 2022 kl 02:34 skrev Andy McFadden @.***>:

Summing up the proposed changes to the Format Address Table feature:

Look for matches with platform/project symbols for external addresses. Show these in the preview, and generate symbolic references to them when the table is formatted.

Add another checkbox (yikes!) that causes new project symbols ("ETxxxx") to be generated for external addresses that don't have an associated symbol.

Show a warning (probably in the label preview window) when the labels we're about to generate would end up hidden inside an existing multi-byte data area. Maybe don't generate those labels?

Need to explore how this interacts with the "tag as code start point" feature.

The last one would mean that if I have defined "ColourMemory" at $d800, and being 1000 bytes big, you would warn if I wanted to have a reference to that 1000 byte space? I would surely want those labels in the current case. In a game there would be LOADS of cases where I see a point in defining the screen memory as something and then when the program updates the content of the screen, still want these references. I basically fail to see why not.

BacchusFLT commented 2 years ago

the OK button is there but pressing OK nothing happens

It looks like you sent an image but github ate it... I think you need to drag & drop images into the issue reporting mechanism to include them, e-mailing them doesn't work. However, I think I see the problem.

Bummer - restrictions on emails being sent so you have to use the GUI.

There's a button in the bottom-right corner of the SourceGen main window that says something like "40 messages (40 warning/error)". Click that if you're not already seeing the message window. In my copy of your project, the first few messages tell you that the labels you've generated are hidden and being ignored.

You were right - references to a block in which there are auto generated labels will make the labels suppressed and a warning presented. A bit confusing. But thanks!

Suggestion: The size of the error segment seems fixed. It would be desirable to be able to select size.

The other warnings are mostly about formatting that appears in the middle of instructions. It looks like some things that used to be formatted as single-byte data items turned into code. You can clean these up by double-clicking the warning and then using Actions > Remove Formatting on the line in question. (It also removes hidden labels, so don't use it on $0a00 or you'll clear the labels embedded in the .ds ... https://6502bench.com/sgmanual/mainwin.html#remove-formatting .)

Indeed also correct. The program in question has structures for menus, and they contain pointers. Depending on a magic byte these could be code or another structure. I erroneously defined one as a structure where it should have been code. Thanks!

fadden commented 2 years ago

The last one would mean that if I have defined "ColourMemory" at $d800, and being 1000 bytes big, you would warn if I wanted to have a reference to that 1000 byte space?

If you defined a project/platform symbol for an external address at $d800 with width 1000, the formatter would generate an appropriate symbolic reference (e.g. "ColourMemory+40"). If the program itself spanned $d800 and you created a .bulk that covered 1000 bytes, you'd get a warning, because you can't tell the assembler that you have a single 1000-byte item and also define a label in the middle of it. (The alternative would be to have the address formatter try to split up the .bulk declaration, but that gets messy in a hurry, e.g. what if the target is the middle of a 3-byte instruction.) If the area is covered by an auto-generated .string or .fill, rather than an explicit value, the byte scanner will automatically split the region around the label, like it does when you change the area at $0a00 back to Default format.

SourceGen tries really hard to prevent you from doing confusing or conflicting things, like formatting only the middle byte of an instruction or putting a label on an address in the middle of a string, but there are ways to do it. (Even if the UI were bulletproof, the project files are JSON and can be edited by hand.) The philosophy for dealing with these situations is to leave them in place but warn you that something you told SourceGen to do is being modified or ignored.

Suggestion: The size of the error segment seems fixed. It would be desirable to be able to select size.

I don't understand what you mean. Are you referring to the window size?

BacchusFLT commented 2 years ago

Den tis 10 maj 2022 kl 17:02 skrev Andy McFadden @.***>:

The last one would mean that if I have defined "ColourMemory" at $d800, and being 1000 bytes big, you would warn if I wanted to have a reference to that 1000 byte space?

If you defined a project/platform symbol for an external address at $d800 with width 1000, the formatter would generate an appropriate symbolic reference (e.g. "ColourMemory+40"). If the program itself spanned $d800 and you created a .bulk that covered 1000 bytes, you'd get a warning, because you can't tell the assembler that you have a single 1000-byte item and also define a label in the middle of it. (The alternative would be to have the address formatter try to split up the .bulk declaration, but that gets messy in a hurry, e.g. what if the target is the middle of a 3-byte instruction.) If the area is covered by an auto-generated .string or .fill, rather than an explicit value, the byte scanner will automatically split the region around the label, like it does when you change the area at $0a00 back to Default format.

I get a sense of the complexity, but I don't really understand that splitting a string would be any different than splitting a bulk segment. A code instruction cannot be spilt - agree there. For self modifying code it has to be address+1 and address+2 to capture that, but this is also something you support. So as I understand it, you do graceful splitting in almost every case. Just not bulk. And I don't understand the differnece.

SourceGen tries really hard to prevent you from doing confusing or conflicting things, like formatting only the middle byte of an instruction or putting a label on an address in the middle of a string, but there are ways to do it. (Even if the UI were bulletproof, the project files are JSON and can be edited by hand.) The philosophy for dealing with these situations is to leave them in place but warn you that something you told SourceGen to do is being modified or ignored.

Poking in the middle of a string is something I see a lot of. Like a string that contains the cursor position or a setting. Some people poke that directly to the string storage and then have a general copy routine of that string to the area where it's displayed. So labels inside strings and other data structures is a fully valid case for me. Suggestion: The size of the error segment seems fixed. It would be

desirable to be able to select size.

I don't understand what you mean. Are you referring to the window size?

Yes. If I wanted an overview of the 35 issues I had, I could see five of them as the view is static. I can view them by scrolling that window, but I cannot dynamically see the delimiter between the code window and the messages windows, as I can with all the other windows. References and notes is for example fully dynamic.

fadden commented 2 years ago

I get a sense of the complexity, but I don't really understand that splitting a string would be any different than splitting a bulk segment.

Suppose SourceGen loads a file that says there is a formatted string at $1000 that is 50 bytes long, and also says that there's a label at $1010. SourceGen can:

Ignore the label, and format the string like it was told to do.
Ignore the string directive, and output the label like it was told to do.
Do fancy things to resolve it into two separate strings. It was not told to do this.

My approach has been to do 1 or 2, and let you resolve the conflict, rather than try to resolve it in a way that you might not expect. Conflicts can become arbitrarily complex -- imagine there are 3 or 4 overlapping directives -- and the odds of SourceGen doing something undesirable are fairly high for all but the simplest cases.

The "graceful splitting" mechanism is performed by the editor at the time the edit is made. If you highlight 50 bytes with a label in the middle and open the data operand editor, at the top it will say "50 bytes in two regions". When you format it as a bulk region, it splits it at the label, generating two independent .bulk directives.

(When a region is set to default format, the data scanner looks for runs of identical bytes and character strings, and generates .fill/.str pseudo-ops. These are discarded and regenerated when you add or remove a label. This is why setting the bytes in the region at $0a00 to "default" format allows the labels to appear... the data analyzer re-scans everything, and breaks things up when labels are found. The auto-generated formatting isn't even stored in the project file.)

Poking in the middle of a string is something I see a lot of.

In the normal case, you want a single string without a label in the middle, e.g.:

          lda  MyString
          lda  MyString+8
MyString  .str "this is a nice string"

If you want to provide an explicit label to the assembler, the string has to be split into separate lines:

          lda  MyString
          lda  SubString
MyString  .str "this is "
SubString .str "a nice string"

If the project file has both the string format and the label, SourceGen can't know which was wanted, so when it sees a label in the middle of a string it tells you there's a conflict and asks you to resolve it. The default behavior is to ignore the label, because that's nicer than ignoring the string directive.

In source code you could write something like SubString = MyString+8, giving you a nice symbol without breaking up the string. SourceGen doesn't provide a way to define arbitrary equates.

If I wanted an overview of the 35 issues I had, I could see five of them as the view is static.

I can't remember if there was an issue with the layout that caused me to omit the resize or if I just didn't bother. There's an argument for splitting this into a stand-alone window with "fix it" buttons that remove formatting and hidden labels. (I think it's too cramped for buttons in its current layout unless your monitor is fairly wide.) In any event, the goal should be to get to zero warnings so the window disappears entirely.

Update: according to comments in MainWindow.xaml, I was having trouble making the WPF UI work correctly. The splitter behaved strangely when the window was hidden and un-hidden.

BacchusFLT commented 2 years ago

Thanks! A lot clearer.

Is there a way for me to know the discarded labels?

Den tis 10 maj 2022 kl 20:19 skrev Andy McFadden @.***>:

I get a sense of the complexity, but I don't really understand that splitting a string would be any different than splitting a bulk segment.

Suppose SourceGen loads a file that says there is a formatted string at $1000 that is 50 bytes long, and also says that there's a label at $1010. SourceGen can:

Ignore the label, and format the string like it was told to do.

Ignore the string directive, and output the label like it was told to do.

Do fancy things to resolve it into two separate strings. It was not told to do this.

My approach has been to do 1 or 2, and let you resolve the conflict, rather than try to resolve it in a way that you might not expect. Conflicts can become arbitrarily complex -- imagine there are 3 or 4 overlapping directives -- and the odds of SourceGen doing something undesirable are fairly high for all but the simplest cases.

The "graceful splitting" mechanism is performed by the editor at the time the edit is made. If you highlight 50 bytes with a label in the middle and open the data operand editor, at the top it will say "50 bytes in two regions". When you format it as a bulk region, it splits it at the label, generating two independent .bulk directives.

(When a region is set to default format, the data scanner looks for runs of identical bytes and character strings, and generates .fill/.str pseudo-ops. These are discarded and regenerated when you add or remove a label. This is why setting the bytes in the region at $0a00 to "default" format allows the labels to appear... the data analyzer re-scans everything, and breaks things up when labels are found. The auto-generated formatting isn't even stored in the project file.)

Poking in the middle of a string is something I see a lot of.

In the normal case, you want a single string without a label in the middle, e.g.:
      lda  MyString
      lda  MyString+8
MyString .str "this is a nice string"

If you want to provide an explicit label to the assembler, the string has to be split into separate lines:
      lda  MyString
      lda  SubString
MyString .str "this is " SubString .str "a nice string"

If the project file has both the string format and the label, SourceGen can't know which was wanted, so when it sees a label in the middle of a string it tells you there's a conflict and asks you to resolve it. The default behavior is to ignore the label, because that's nicer than ignoring the string directive.

In source code you could write something like SubString = MyString+8, giving you a nice symbol without breaking up the string. SourceGen doesn't provide a way to define arbitrary equates.

If I wanted an overview of the 35 issues I had, I could see five of them as the view is static.

I can't remember if there was an issue with the layout that caused me to omit the resize or if I just didn't bother. There's an argument for splitting this into a stand-alone window with "fix it" buttons that remove formatting and hidden labels. (I think it's too cramped for buttons in its current layout unless your monitor is fairly wide.) In any event, the goal should be to get to zero warnings so the window disappears entirely.

— Reply to this email directly, view it on GitHub https://github.com/fadden/6502bench/issues/130#issuecomment-1122721783, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGZWZSV22QXPKJEPDDMURRTVJKSE7ANCNFSM5VKIKDBA . You are receiving this because you authored the thread.Message ID: @.***>

fadden commented 2 years ago

Is there a way for me to know the discarded labels?

For hidden labels, the symbol is shown in the "Context" column of the warning message.

fadden commented 2 years ago

Action items moved to https://github.com/fadden/6502bench/wiki/TO-DO-List .