eclipse-cdt-cloud / vscode-memory-inspector

vscode memory inspector
https://open-vsx.org/extension/eclipse-cdt/memory-inspector
Eclipse Public License 2.0
6 stars 10 forks source link

Byte-addressable vs word-addressable memory architectures and terminology #100

Closed jreineckearm closed 6 months ago

jreineckearm commented 7 months ago

Description

The recently introduced Bytes per Word setting isn't considered when calculating the values for the address column. This causes incorrect addresses to be displayed.

UPDATE: What was initially considered as a bug showed the need to consider byte-addressable memory architectures that use the term word for a data unit consisting of multiple directly addressable bytes. We need to ensure that

The solution may require the introduction of a new configuration option.

Below the original "bug reproducer" to allow understanding of the evolution of this Github issue.

How to reproduce:

Expected behavior

The addresses should be incremented by the effective number of Bytes displayed per previous row. At the moment, it looks like the "Bytes per Word" is not taken into account.

Environment

Additional information

Correct address values: image

Address values not taking "Bytes per Word" into account: image

colin-grant-work commented 7 months ago

@jreineckearm, shouldn't each address cover one word, so that the current behavior is correct? Since you've changed the bytes per word, but not the words/group or groups/row values, the number of words/row hasn't changed, and the address difference between rows should be the same.

jreineckearm commented 7 months ago

@colin-grant-work , interesting. I think we are hitting here a CPU architecture terminology problem challenge. :-) In general, the Arm architecture (yes, I may be a little biased :-) ) works with byte-addressable memory. But uses the term word for example for the M-class flavors as a 32-bit unit. See for example: https://developer.arm.com/documentation/ddi0419/c/Application-Level-Architecture/ARM-Architecture-Memory-Model/Address-space I realize only now that other CPU architectures can be word-addressable. This is may become more than just a quick calculation fix and require a little more thinking....

colin-grant-work commented 7 months ago

Yes... it's not very nice. I was thinking through some similar issues recently and found this sentence:

A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.

So we can't even technically use the term 'byte' with any definite meaning!

jreineckearm commented 7 months ago

So we can't even technically use the term 'byte' with any definite meaning!

Hm....I am sure we can come up with a good story around it. At the end of the day, each memory window instance is used in the context of a specific CPU architecture. But we may need to carefully think about how to feed in such a-priori-knowledge: A DAP should know how to interpret an address based on the targeted CPU. So, it should be able to return the right amount of bits based on address and length (sorry, if wrong, don't have the MS DAP spec at hand). We just may need more meta-information about to how many bits/digits a byte refers. And if a CPU is byte- or word-addressable. All of that without the need to make a user an architecture expert to configure the window correctly for their targeted HW. Let me think this a little through over the weekend (it's getting late over here in Germany). But I believe we are not that far off with what we have already now...

jreineckearm commented 7 months ago

Just curious, @colin-grant-work , do we hit the 1 byte != 8 bit problem with your use cases already? Just curious where we can take shortcuts for now and tackle the additional complexity when the demand is there.

colin-grant-work commented 7 months ago

Just curious, @colin-grant-work , do we hit the 1 byte != 8 bit problem with your use cases already?

No, byte != 8 bits hasn't come up (thankfully), but word-addressability is important to us. Though I'm not sure that those aren't just paraphrasings of the same problem.

jreineckearm commented 7 months ago

No, byte != 8 bits hasn't come up (thankfully), but word-addressability is important to us. Though I'm not sure that those aren't just paraphrasings of the same problem.

OK, thanks! Let me digest this a little. But I am confident we'll find a solution for this.

@colin-grant-work , @thegecko , @planger , could one of you please remove the bug label? I think this is no longer a defect but an enhancement.

colin-grant-work commented 7 months ago

This probably fits together with the issues raised in #77, as well.

jreineckearm commented 7 months ago

This probably fits together with the issues raised in #77, as well.

Yes, it potentially does.

Thinking this further through, you are absolutely right with this statement:

Though I'm not sure that those aren't just paraphrasings of the same problem.

I believe what we are really facing is a "just" a terminology problem. And we may get this solved by "just" updating option names and descriptions. This would then be closer to the Theia Memory Inspector options which I now understand better by this exercise here. The implemented logic wouldn't need to change.

The problem we tried to solve with adding the "Bytes per Word" option was to express what, I believe to understand now, the group was supposed to express. In terms of an M-profile Arm CPU, the group would be one of BYTE (8-bit), H(alf)WORD (16-bit), WORD (32-bit). Other Arm architecture profiles have comparable terminology and include 64-bit representations.

[Action] What we effectively need to change is the introduced "Bytes per Word" to a "Bytes per Unit" (or similar). Where "Unit" is the minimum addressable unit. Open for suggestions for a better name.

[Action] "Words per Group" would change to "Units per Group". Though we may want to rename "Group" to something expressing the above relation more easily understandable for users of our existing debug tools. No good idea yet, but I'll keep thinking about this. Again, open for suggestions.

The rest would stay the same.

We could consider additionally to change the "Bytes per to Unit" to a "Bits per Unit". But how realistic would it be today to that there are units that are not a multiple of 8 Bits? My worry really is the involved effort to make that change and the resulting testing.

What do you think?

jreineckearm commented 7 months ago

@colin-grant-work , I spent a bit more time thinking about this topic and discussed this with others internally. And I still believe that this is "only" a terminology problem.

Current situation

We have the following options (I leave Groups per Row out of the equation, it's not relevant for the problem):

For byte-addressable machines, a word as per above terminology is always 1 byte. For word-addressable machines, a word as per above terminology is a value greater than 1 byte. Unless an architecture specifies the meaning of word to be one byte.

The term word can have various meanings across architectures. In the Arm architecture, it for example means the natural data word size for CPU instructions. For Arm M-profile CPUs (32-bit), it is 4 bytes. However, there also exist instructions working on other sized data units like bytes or half-words. Which is possible because of the byte-addressable character.

Due to the ambiguity that I see for the term word, I don't think we can use it in our displays/documentation except from descriptions of specific examples and how they map to what we have.

Proposal

I have explored various alternatives for word:

I believe MAU could work with appropriate tooltips, in-tool descriptions, and documentation. It would leave us with renamed options:

Is this something that would work for your use cases as well? I sense there is more to discuss from this comment: https://github.com/eclipse-cdt-cloud/vscode-memory-inspector/pull/108#discussion_r1525557954

The resulting actions to make this change are:

jreineckearm commented 7 months ago

@colin-grant-work , @planger , thanks again for joining today's call! This was a useful discussion to agree on a couple of topics. Below, the essence of the call and decisions. Please correct or amend if I missed to write something down. And apologies if this is partly repeating information from above. I'll update other PRs and Issues accordingly.

Resulting actions

We agreed on making a new Memory Inspector release after items 1. - 3. have been addressed (FYI for @thegecko )