Closed jreineckearm closed 6 months ago
@jreineckearm, shouldn't each address cover one word, so that the current behavior is correct? Since you've changed the bytes per word, but not the words/group or groups/row values, the number of words/row hasn't changed, and the address difference between rows should be the same.
@colin-grant-work , interesting. I think we are hitting here a CPU architecture terminology problem challenge. :-)
In general, the Arm architecture (yes, I may be a little biased :-) ) works with byte-addressable memory. But uses the term word
for example for the M-class flavors as a 32-bit unit. See for example: https://developer.arm.com/documentation/ddi0419/c/Application-Level-Architecture/ARM-Architecture-Memory-Model/Address-space
I realize only now that other CPU architectures can be word-addressable. This is may become more than just a quick calculation fix and require a little more thinking....
Yes... it's not very nice. I was thinking through some similar issues recently and found this sentence:
A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.
So we can't even technically use the term 'byte' with any definite meaning!
So we can't even technically use the term 'byte' with any definite meaning!
Hm....I am sure we can come up with a good story around it. At the end of the day, each memory window instance is used in the context of a specific CPU architecture. But we may need to carefully think about how to feed in such a-priori-knowledge:
A DAP should know how to interpret an address based on the targeted CPU. So, it should be able to return the right amount of bits based on address
and length
(sorry, if wrong, don't have the MS DAP spec at hand). We just may need more meta-information about to how many bits/digits a byte
refers. And if a CPU is byte- or word-addressable. All of that without the need to make a user an architecture expert to configure the window correctly for their targeted HW.
Let me think this a little through over the weekend (it's getting late over here in Germany). But I believe we are not that far off with what we have already now...
Just curious, @colin-grant-work , do we hit the 1 byte != 8 bit
problem with your use cases already? Just curious where we can take shortcuts for now and tackle the additional complexity when the demand is there.
Just curious, @colin-grant-work , do we hit the 1 byte != 8 bit problem with your use cases already?
No, byte != 8 bits
hasn't come up (thankfully), but word-addressability is important to us. Though I'm not sure that those aren't just paraphrasings of the same problem.
No, byte != 8 bits hasn't come up (thankfully), but word-addressability is important to us. Though I'm not sure that those aren't just paraphrasings of the same problem.
OK, thanks! Let me digest this a little. But I am confident we'll find a solution for this.
@colin-grant-work , @thegecko , @planger , could one of you please remove the bug
label? I think this is no longer a defect but an enhancement.
This probably fits together with the issues raised in #77, as well.
This probably fits together with the issues raised in #77, as well.
Yes, it potentially does.
Thinking this further through, you are absolutely right with this statement:
Though I'm not sure that those aren't just paraphrasings of the same problem.
I believe what we are really facing is a "just" a terminology problem. And we may get this solved by "just" updating option names and descriptions. This would then be closer to the Theia Memory Inspector options which I now understand better by this exercise here. The implemented logic wouldn't need to change.
The problem we tried to solve with adding the "Bytes per Word" option was to express what, I believe to understand now, the group
was supposed to express. In terms of an M-profile Arm CPU, the group
would be one of BYTE
(8-bit), H(alf)WORD
(16-bit), WORD
(32-bit). Other Arm architecture profiles have comparable terminology and include 64-bit representations.
[Action] What we effectively need to change is the introduced "Bytes per Word" to a "Bytes per Unit" (or similar). Where "Unit" is the minimum addressable unit. Open for suggestions for a better name.
[Action] "Words per Group" would change to "Units per Group". Though we may want to rename "Group" to something expressing the above relation more easily understandable for users of our existing debug tools. No good idea yet, but I'll keep thinking about this. Again, open for suggestions.
The rest would stay the same.
We could consider additionally to change the "Bytes per to Unit" to a "Bits per Unit". But how realistic would it be today to that there are units that are not a multiple of 8 Bits? My worry really is the involved effort to make that change and the resulting testing.
What do you think?
@colin-grant-work , I spent a bit more time thinking about this topic and discussed this with others internally. And I still believe that this is "only" a terminology problem.
We have the following options (I leave Groups per Row
out of the equation, it's not relevant for the problem):
Bytes per Word
: A size in Bytes of the finest granular data unit in memory. The number of bytes returned for a single address. Words per Group
: Number of above word
s combined into a value. Group
s are separated by spaces. Endianess affects the order of those word
s in the group
display.For byte-addressable machines, a word
as per above terminology is always 1 byte
.
For word-addressable machines, a word
as per above terminology is a value greater than 1 byte
. Unless an architecture specifies the meaning of word
to be one byte.
The term word
can have various meanings across architectures. In the Arm architecture, it for example means the natural data word size for CPU instructions. For Arm M-profile CPUs (32-bit), it is 4 bytes. However, there also exist instructions working on other sized data units like bytes or half-words. Which is possible because of the byte-addressable character.
Due to the ambiguity that I see for the term word
, I don't think we can use it in our displays/documentation except from descriptions of specific examples and how they map to what we have.
I have explored various alternatives for word
:
unit
: IMO too generic and not meaningful enough.addressable unit
: A little better, but still leaves room for interpretation.memory cell
: Sounds a little too "physical" for SW developers which is our main user base.minimum addressable unit
: IMO the most scalable, yet still specific term I can think of. And it covers the address aspect. I've seen it being used in software APIs for the very same purpose before. Though the name is a bit lengthy for option names.MAU
: Short form of the above.I believe MAU
could work with appropriate tooltips, in-tool descriptions, and documentation. It would leave us with renamed options:
Bytes per MAU
: Number of bytes of a Minimum Addressable Unit in memory. Always 1
for byte-addressable architectures. Values greater than 1
for word-addressable architectures.MAUs per Group
: Number of MAU
s to form a value representation. Displayed group
s are separated by spaces. Endianess affects the order of MAU
s within a displayed group
.Is this something that would work for your use cases as well? I sense there is more to discuss from this comment: https://github.com/eclipse-cdt-cloud/vscode-memory-inspector/pull/108#discussion_r1525557954
The resulting actions to make this change are:
word
with mau
and similar.@colin-grant-work , @planger , thanks again for joining today's call! This was a useful discussion to agree on a couple of topics. Below, the essence of the call and decisions. Please correct or amend if I missed to write something down. And apologies if this is partly repeating information from above. I'll update other PRs and Issues accordingly.
byte
as a data unit of 8 bit. We revisit once we hit use cases where this assumption no longer holds.word
with Minimum Addressable Unit (MAU)
. This helps to abstract away the meaning of word
which can mean different things across CPU/memory architectures.
group
for a grouped number of MAUs. While the term group
is rather generic, it is a means to transform a series of MAUs into a hexadecimal value that considers the selected endianess.
group
level.group
s. Further investigation on enhancing in future with an "edit-by-selection" in the sense of selecting a range via mouse and turn the selection into an in-place edit.
Resulting actions
word
with MAU
(and mau
, respectively) in both user visible text (options) and source code. Also add tooltips/descriptions to clarify the meaning of MAU
within the tool.
README
with a terminology section to clarify the most important terms we agreed on: byte
, mau
, group
. This shall be complemented by a screenshot which highlights each of the three in a specific setup.
edit-by-selection
approach. Probably best to capture in a separate issue. See https://github.com/eclipse-cdt-cloud/vscode-memory-inspector/pull/110group
with a more meaningful value display selection. See https://github.com/eclipse-cdt-cloud/vscode-memory-inspector/issues/114We agreed on making a new Memory Inspector release after items 1. - 3. have been addressed (FYI for @thegecko )
Description
The recently introducedBytes per Word
setting isn't considered when calculating the values for the address column. This causes incorrect addresses to be displayed.UPDATE: What was initially considered as a bug showed the need to consider byte-addressable memory architectures that use the term
word
for a data unit consisting of multiple directly addressable bytes. We need to ensure thatThe solution may require the introduction of a new configuration option.
Below the original "bug reproducer" to allow understanding of the evolution of this Github issue.
How to reproduce:
Expected behavior
The addresses should be incremented by the effective number of Bytes displayed per previous row. At the moment, it looks like the "Bytes per Word" is not taken into account.
Environment
Additional information
Correct address values:
Address values not taking "Bytes per Word" into account: