the-soloist opened 1 month ago
I think we can avoid walking the local database directory a second time here altogether. When we find a match in the local libc-database, we already know the `id`, and therefore the filename, of the libc we want to return. Maybe allow the `id` to be searched in `search_by_hash` and special-case it in the `local_database` provider.
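A minimal sketch of the idea, assuming the local libc-database keeps each library at `db/<id>.so` (the layout used by libc-database); `lookup_local` and `_digest` are hypothetical helpers, not the actual `local_database` provider. An `id` resolves to a path directly, while any real hash type still has to hash every candidate file:

```python
import hashlib
import os


def _digest(path, hash_type):
    """Hash a whole file; walking the database pays this cost for every candidate."""
    h = hashlib.new(hash_type)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def lookup_local(db_dir, hash_type, hash_value):
    """Return the path of a matching library in a local libc-database, or None."""
    db = os.path.join(db_dir, "db")
    if hash_type == "id":
        # Special case: the id is the file name, so no directory walk is needed.
        if os.path.basename(hash_value) != hash_value:
            return None  # reject path components such as "../"
        path = os.path.join(db, hash_value + ".so")
        return path if os.path.isfile(path) else None
    # Any real hash type: hash every candidate until one matches.
    for name in sorted(os.listdir(db)):
        if name.endswith(".so"):
            path = os.path.join(db, name)
            if _digest(path, hash_type) == hash_value.lower():
                return path
    return None
```

The `id` branch costs one `stat`, while the fallback reads every `.so` in the database, which is exactly the repeated walk the comment above suggests skipping.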
I agree that handling `id` separately within the providers is a good approach, since it lets us use libcdb's caching feature. However, some variable names will lose their original meaning (an `id` is not a hash type). I've tried writing some code; could you give me some suggestions?
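A small in-memory sketch of why routing `id` through the providers helps with caching. `CachedSearch` and the provider callables are hypothetical, the method only mirrors the general shape of libcdb's `search_by_hash`, and pwntools' real on-disk cache is not shown here. Once `id` is just another `hash_type` a provider may answer, repeated `id` lookups go through the same cache as build-ID or md5 lookups:

```python
class CachedSearch:
    """Memoize successful provider lookups, keyed by (hash_type, hash_value)."""

    def __init__(self, providers):
        # Each provider is a callable (hash_type, hash_value) -> bytes or None.
        self.providers = list(providers)
        self.cache = {}

    def search_by_hash(self, hash_value, hash_type="build_id"):
        key = (hash_type, hash_value)
        if key in self.cache:
            return self.cache[key]
        for provider in self.providers:
            # A provider that does not understand hash_type (e.g. "id") returns None.
            result = provider(hash_type, hash_value)
            if result is not None:
                self.cache[key] = result  # only hits are cached; misses are retried
                return result
        return None
```

In this sketch the naming concern is visible too: the parameter is still called `hash_type` even when its value is `"id"`.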
I'm not sure I like `hash_type="id"` (maybe `hash_type="filename"` would be better?). I think the build ID should remain the default; it just needs to be parsed faster. Maybe we can have a separate function for extracting the build ID (ideally at C speed). But come on, reading only the first page of a file should be quicker than reading all of it, especially on HDDs. Also, the build ID does not change if you strip/unstrip the file or move it around. If our ELF implementation is the bottleneck, we can resort to implementing separate functionality just for turbofast build-ID extraction.
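A rough pure-Python sketch of such a fast path (not C speed, and not pwntools code). The hypothetical `read_build_id` reads the first page, parses only the ELF header and program headers, and walks the `PT_NOTE` segments for the `NT_GNU_BUILD_ID` note, seeking further only when a header or note lies past that page. It assumes a well-formed ELF file and ignores section headers entirely:

```python
import struct

PT_NOTE = 4
NT_GNU_BUILD_ID = 3
FIRST_PAGE = 4096


def _align(n, a):
    return (n + a - 1) & ~(a - 1)


def read_build_id(f, first_chunk=FIRST_PAGE):
    """Return the GNU build ID of the binary file object `f` as hex, or None."""
    f.seek(0)
    head = f.read(first_chunk)

    def read_at(offset, size):
        # Serve reads from the first chunk when possible; seek only if needed.
        if offset + size <= len(head):
            return head[offset:offset + size]
        f.seek(offset)
        return f.read(size)

    if len(head) < 16 or head[:4] != b"\x7fELF":
        return None
    is64 = head[4] == 2                    # EI_CLASS: 1 = ELF32, 2 = ELF64
    end = "<" if head[5] == 1 else ">"     # EI_DATA: 1 = little, 2 = big endian
    ehdr_fmt = end + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    phdr_fmt = end + ("IIQQQQQQ" if is64 else "IIIIIIII")
    ehdr = struct.unpack(ehdr_fmt, read_at(16, struct.calcsize(ehdr_fmt)))
    e_phoff, e_phentsize, e_phnum = ehdr[4], ehdr[8], ehdr[9]

    for i in range(e_phnum):
        raw = read_at(e_phoff + i * e_phentsize, struct.calcsize(phdr_fmt))
        if is64:
            p_type, _, p_offset, _, _, p_filesz, _, p_align = struct.unpack(phdr_fmt, raw)
        else:
            p_type, p_offset, _, _, p_filesz, _, _, p_align = struct.unpack(phdr_fmt, raw)
        if p_type != PT_NOTE:
            continue
        notes = read_at(p_offset, p_filesz)
        a = 8 if p_align == 8 else 4       # note name/desc padding follows segment alignment
        pos = 0
        while pos + 12 <= len(notes):
            namesz, descsz, n_type = struct.unpack_from(end + "III", notes, pos)
            name_off = pos + 12
            desc_off = name_off + _align(namesz, a)
            if n_type == NT_GNU_BUILD_ID and notes[name_off:name_off + namesz] == b"GNU\x00":
                return notes[desc_off:desc_off + descsz].hex()
            pos = desc_off + _align(descsz, a)
    return None
```

Usage would be `with open(path, "rb") as f: read_build_id(f)`. In typical linked binaries the build-ID note sits near the start of the file, so this usually touches a single page, and because it depends only on the note's bytes it is unaffected by stripping or moving the file.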
While using `search_by_symbol_offsets`, I found that searching by `build_id` was significantly slower than by the other hash types. The reason is that `ELF` loads too many things. I tried replacing it with `ELFFile`, which noticeably improved the speed but duplicated existing functionality. I couldn't think of a simple way around that, so I added a `hash_type` parameter to `search_by_symbol_offsets`, defaulting to `md5`, to speed up `search_by_symbol_offsets` and give users control over the choice. I'm testing with the following code:
and found another issue: https://github.com/Gallopsled/pwntools/issues/2414
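A sketch of how a `hash_type` parameter with an `md5` default could select which hash of a libc.rip match is handed on to `search_by_hash`. The result field names (`buildid`, `md5`, `sha1`, `sha256`) are assumed from libc.rip's JSON results and should be checked against the live API; `pick_hash` and `RESULT_FIELDS` are hypothetical names:

```python
# Hypothetical mapping from hash_type names to libc.rip result fields (assumed names).
RESULT_FIELDS = {
    "build_id": "buildid",
    "md5": "md5",
    "sha1": "sha1",
    "sha256": "sha256",
}


def pick_hash(match, hash_type="md5"):
    """Return the hash of `match` (one libc.rip result dict) selected by hash_type."""
    try:
        field = RESULT_FIELDS[hash_type]
    except KeyError:
        raise ValueError(
            f"unsupported hash_type {hash_type!r}; choose one of {sorted(RESULT_FIELDS)}"
        ) from None
    return match[field]
```

Validating `hash_type` up front gives users a clear error for a typo, and keeping `build_id` in the table leaves the slower but strip-independent option available when they ask for it.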