binref / refinery

High Octane Triage Analysis
Other
618 stars 62 forks source link

Extend NSItem Equality check #52

Closed Squiblydoo closed 1 week ago

Squiblydoo commented 1 week ago

This suggested improvement originated from a PR request for Debloat's parsing of NSIS files. In short, this executable (https://www.virustotal.com/gui/file/39d17f33b47863330d407ea0e4b2fe7b676c3708dd00ae65432ad21a1ceef997) cannot be parsed by xtnsis due to a duplicate item. The duplicate item is found based off of a duplicate offset. The error is raised because the 2nd item is a different python object even though all other elements are the same.

With extension of equality, no error is raised. Since the two objects are identical in every way, the items dictionary is going to keep the first object.

For more details see Debloat's PR here: https://github.com/Squiblydoo/debloat/pull/43 See error here: https://github.com/Squiblydoo/debloat/issues/42

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.24%. Comparing base (3966e53) to head (41de02d).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #52 +/- ## ======================================= Coverage 87.24% 87.24% ======================================= Files 342 342 Lines 30606 30605 -1 ======================================= Hits 26702 26702 + Misses 3904 3903 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jhhcs commented 1 week ago

Unfortunately, I don't like the proposed fix. While this makes the problem go away, the semantics don't seem right at all: We would be doctoring the equality of a certain type because in reality we do not want to compare these objects at all - we are only interested in two of their properties, namely path and offset. Hence, I think we really want to replace this code:

for item in self._read_items():
    if items.setdefault((item.path, item.offset), item) != item:
        raise ValueError(F'Two different items with the same position {item.offset} and the same path: {item.path}.')

with this code:

for item in self._read_items():
     items.setdefault((item.path, item.offset), item)

We can also remove the eq=True in the dataclass decorator because we do not need to compare these items at all.

In fact, there isn't even any need to store all the metadata I have on NSItem, but I did not know that when I wrote the code that parses it out and I decided to keep it just in case. Anyway, I think deduplicating by path and offset without any more checks is the correct fix for this.

If you would adjust your PR to reflect that change, I would absolutely merge it =).

Squiblydoo commented 1 week ago

Makes sense to me.