Open mahaloz opened 2 months ago
if you don't want to have to use the edm_t.cmt and udm_t.cmt attributes to enumerate or serialize complex field comments, you can also unpack/save them from the result of tinfo_t.serialize(), which was the pre-8.4 method anyways ("fields" are similar).
decoding the bytes returned by tinfo_t.serialize into a list of comments is basically: consume a byte, determine whether it's an 8-bit or 16-bit length, decode said length, use the length to extract the comment, utf-8 decode those bytes, and repeat until done.
    def decode_bytes(data):
        '''Decode the given `data` into a list containing the length and the bytes for each encoded string.'''
        results, iterable = [], iter(bytearray(data))
        for length_plus_one in iterable:
            # values below 0x80 fit in a single byte, otherwise a second byte (expected to be 1) follows
            one = 1 if length_plus_one < 0x80 else next(iterable, None)
            assert one == 1 and length_plus_one > 0
            # using zip to clamp bytes consumed
            encoded = bytearray(octet for _, octet in zip(range(length_plus_one - 1), iterable))
            results.append((length_plus_one - 1, encoded))
        return results
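For anyone who wants the utf-8 step from the description as well, here is a more explicit sketch of the same loop. It only mirrors the layout implied by the snippet above (a length-plus-one byte, optionally followed by a byte equal to 1), so treat it as an illustration and not as a reference for IDA's format. The name `decode_comments` is made up for this example.

```python
def decode_comments(data):
    """Split a length-prefixed byte stream into a list of utf-8 decoded strings."""
    data, position, comments = bytes(data), 0, []
    while position < len(data):
        length_plus_one = data[position]
        position += 1
        if length_plus_one == 0:
            raise ValueError('zero is not a valid length prefix')
        if length_plus_one >= 0x80:
            # Two-byte prefix: like the snippet above, only a second byte of 1 is accepted.
            if position >= len(data) or data[position] != 1:
                raise ValueError('unsupported two-byte length prefix')
            position += 1
        length = length_plus_one - 1
        # Slicing clamps to the end of the data if the stream is truncated.
        chunk = data[position:position + length]
        position += length
        comments.append(chunk.decode('utf-8'))
    return comments
```

An empty comment is a lone `0x01` byte, so fields without a comment still occupy a slot and the list stays aligned with the field order.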
encoding the string passed to tinfo_t.deserialize(til, type, fields, cmts=None) requires encoding the length for each utf-8 encoded comment, and concatenating back into a stream of bytes. apologies for the unreadability of the following.. "encode_length" is all that is relevant
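    import itertools

    def encode_bytes(cls, strings):
        '''Encode the list of `strings` with their lengths and return them as bytes.'''
        # length + 1 fits in one byte below 0x80, otherwise it is followed by a 1 (values above 0xff are not handled)
        encode_length = lambda integer: bytearray([integer + 1] if integer + 1 < 0x80 else [integer + 1, 1])
        # utf-8 encode anything that is not already bytes
        iterable = (bytes(string) if isinstance(string, (bytes, bytearray)) else string.encode('utf-8') for string in strings)
        pairs = ((len(chunk), chunk) for chunk in iterable)
        # interleave each encoded length with its chunk and join everything into one stream
        return bytes(bytearray().join(itertools.chain(*((encode_length(length), bytearray(chunk)) for length, chunk in pairs))))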
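Since the one-liner above is admittedly dense, here is the same idea written as a plain loop. It is a sketch under the same assumptions as the snippet: the prefix is length plus one, a single byte when below 0x80, and otherwise that byte followed by 1. Anything longer is outside what the snippet covers, so this version raises instead of guessing. The name `encode_comments` is hypothetical.

```python
def encode_comments(strings):
    """Encode a list of comments into a single length-prefixed byte stream."""
    stream = bytearray()
    for string in strings:
        # Accept raw bytes as-is, utf-8 encode everything else.
        chunk = string if isinstance(string, (bytes, bytearray)) else string.encode('utf-8')
        length_plus_one = len(chunk) + 1
        if length_plus_one < 0x80:
            stream.append(length_plus_one)
        elif length_plus_one <= 0xff:
            # Two-byte form as in the snippet above: the value, then a 1.
            stream.extend([length_plus_one, 1])
        else:
            raise ValueError('comment too long for this sketch')
        stream.extend(chunk)
    return bytes(stream)
```

The resulting bytes are what would go into the `cmts` argument of `tinfo_t.deserialize`, with one entry per field, using an empty string for fields that have no comment.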
however, it's worth confirming that the performance of serializing/deserializing them at scale is actually relevant in binsync. minsc creates an index for all commentable "things" so that they can be tagged for searching and (mis-)used to store nearly-arbitrary data, so being able to check if a tinfo_t even has comments, or to distinguish what exactly was updated (name/comment/other) in response to events (w/o having to iterate through all the fields one-by-one), made a difference.
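To make that concrete, here is a rough sketch of both checks working directly on the serialized bytes. It assumes `tinfo_t.serialize()` hands back a `(type, fields, cmts)` triple in the same order as the `deserialize` arguments above, and that comments use the length-plus-one prefix from the earlier snippets. The helper names `has_comments` and `classify_change` are made up for illustration and are not part of minsc, binsync, or IDA.

```python
def has_comments(cmts):
    """Return True if the serialized comment stream holds at least one non-empty comment."""
    # An empty comment is a lone 0x01 prefix, so a stream made only of 0x01 bytes has no content.
    return any(octet != 1 for octet in bytearray(cmts or b''))

def classify_change(old, new):
    """Return which parts differ between two (type, fields, cmts) triples."""
    labels = ('type', 'fields', 'comments')
    # Treat a missing component (None) the same as an empty byte string.
    return {label for label, before, after in zip(labels, old, new)
            if (before or b'') != (after or b'')}

# Example triples (made-up bytes): only the comment stream differs.
before = (b'=\x04', b'\x02a\x02b', b'\x01\x01')
after = (b'=\x04', b'\x02a\x02b', b'\x01\x03hi')
changed = classify_change(before, after)
```

Comparing whole byte strings like this avoids walking the members one by one, which is the point being made above. Whether it is fast enough at scale would still need measuring.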
...i'm literally praying that they don't try to retrofit repeatable/non-repeatable comments into this btw.
Background
In most decompilers, like IDA Pro, you can have types that have comments in them, like:
Which libbs does not currently support. An ideal solution would look like this:
Implementation
To support this type of commenting, we'll need to do a few things:
comment
attributeFunction
to support comments