gaasedelen / lighthouse

A Coverage Explorer for Reverse Engineers
MIT License
2.23k stars 308 forks source link

Module name matching is too fuzzy #63

Closed taviso closed 4 years ago

taviso commented 5 years ago

I've been trying out lighthouse for coverage browsing, it's very polished, thanks!

I ran into one issue, the module name matching is too relaxed. If I have coverage for module "foo", but a module is listed earlier in the list called "foobar", then it matches and lighthouse claims there is no coverage.

I believe this is because you search for modules with fuzzy matching by default like this:

            # attempt lookup using case-insensitive filename
            for module in self.modules:
                if module_name.lower() in module.filename.lower():
                    return module

I manually edited my coverage file so that unrelated modules didn't contain my module as a substring, and then it worked perfectly.

>>> "foo".lower() in "foobar".lower()
True

I think fuzzy should be the fallback, not the default, WDYT?

gaasedelen commented 5 years ago

Hi Tavis! Thanks for the kind words.

I agree that this is an issue. The fuzzy lookup flow is admittedly... pretty cheap. I am kind of surprised it has lasted this long without someone raising your example.

I have been meaning to improve this code, but also add a bailout dialog to list the modules for human selection. A lot of the coverage loading code is starting to get retooled on the develop branch, so this is probably a good time to address it.

In the meantime, I would recommend you tweak the fuzzy function logic to serve your needs. I'll close this issue once my changes make their way onto dev, maybe in the next week or two.

cregnec commented 5 years ago

Hi, I'm trying to use lighthouse with dynamorio. The result is pretty cool.

But I had some problem with the get_module too. when there are several segments, dynamorio generates multiple modules with the same name.

Columns: id, containing_id, start, end, entry, offset, path

2,   2, 0x00007f5d4b26b000, 0x00007f5d4b278000, 0x00007f5d4b278170, 0000000000000000, module
3,   2, 0x00007f5d4b278000, 0x00007f5d4b29b000, 0x00007f5d4b278170, 000000000000d000, module
4,   2, 0x00007f5d4b29b000, 0x00007f5d4b2a8000, 0x00007f5d4b278170, 0000000000030000, module
5,   2, 0x00007f5d4b2a8000, 0x00007f5d4b2ea000, 0x00007f5d4b278170, 000000000003c8a8, module

currently, only one module is returned:

module = self.get_module(module_name)

It should probably return all matching modules and then merge their IDs.

gaasedelen commented 4 years ago

Hi, I'm trying to use lighthouse with dynamorio. The result is pretty cool.

But I had some problem with the get_module too. when there are several segments, dynamorio generates multiple modules with the same name.

Columns: id, containing_id, start, end, entry, offset, path

2,   2, 0x00007f5d4b26b000, 0x00007f5d4b278000, 0x00007f5d4b278170, 0000000000000000, module
3,   2, 0x00007f5d4b278000, 0x00007f5d4b29b000, 0x00007f5d4b278170, 000000000000d000, module
4,   2, 0x00007f5d4b29b000, 0x00007f5d4b2a8000, 0x00007f5d4b278170, 0000000000030000, module
5,   2, 0x00007f5d4b2a8000, 0x00007f5d4b2ea000, 0x00007f5d4b278170, 000000000003c8a8, module

currently, only one module is returned:

module = self.get_module(module_name)

It should probably return all matching modules and then merge their IDs.

Sorry for the late reply, I am only now catching up on lighthouse maintenance 😬.

The drcov module splitting is unrelated to the main issue reported by taviso, but I just pushed a fix for it in commit f6902baf387ef2dac5fac4c26fe494f1414c3f36. The lighthouse development branch has a bunch of other updates & fixes, I would recommend updating to it.

Regarding the main issue around fuzzy name matching, it is the next item on my agenda. I'll have something on the dev branch in the next 24 hours.

gaasedelen commented 4 years ago

Long overdue, but there is now a fallback dialog that will pop-up if lighthouse cannot automatically match your database/binary name to the ones in a coverage file:

image

This is available on the development branch right now.

I think the fuzzy-name lookup was tweaked to be stricter when this issue first came in, but I will double check and make sure before the upcoming release.