andymeneely / attack-surface-metrics

Scripts for collecting metrics of the attack surface
MIT License

Develop UID scheme for functions in the call graph #39

Closed nuthanmunaiah closed 9 years ago

nuthanmunaiah commented 9 years ago

We have to develop a scheme to uniquely identify functions in the call graph. There are functions with the same name in multiple files. For instance:

libavutil\bswap.h       bswap_32(uint32_t x)
libavutil\arm\bswap.h   bswap_32(uint32_t x)
libavutil\avr32\bswap.h bswap_32(uint32_t x)
libavutil\bfin\bswap.h  bswap_32(uint32_t x)
libavutil\sh4\bswap.h   bswap_32(uint32_t x)
libavutil\x86\bswap.h   bswap_32(uint32_t x)

We could hash the function name and the relative path of the file containing the function. A function would then be uniquely identified by its hash.
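
A minimal sketch of that idea, assuming SHA-256 as the hash and a `path::name` key format. The helper name `function_uid` and the separator are hypothetical, not something already in the project:

```python
import hashlib
from pathlib import PureWindowsPath


def function_uid(function_name, relative_path):
    """Return a hex digest identifying a function by its name and file path.

    Hypothetical helper: the key format and the choice of SHA-256 are
    assumptions made for illustration.
    """
    # Normalize separators so "libavutil\arm\bswap.h" and
    # "libavutil/arm/bswap.h" produce the same identifier.
    normalized = PureWindowsPath(relative_path).as_posix()
    key = f"{normalized}::{function_name}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# The same function name in two different files gets two different UIDs.
generic_uid = function_uid("bswap_32", r"libavutil\bswap.h")
arm_uid = function_uid("bswap_32", r"libavutil\arm\bswap.h")
```

The digest is fixed-length, which is convenient as a key, but it is no longer human-readable, so the original path and name would still have to be stored alongside it.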

megakevin commented 9 years ago

The identity property of the Call class is what we have been using to uniquely identify functions. This property is only a concatenation of the name of the function and the name of the file where that function is defined. I think it would be great to add the path to the file into the mix.

However, the professor and I have previously discussed implementing a hash like the one you suggest. As he pointed out, hashes are not guaranteed to be unique, so there is a chance that we could run into collisions. That's why we have been using the string itself as the unique identifier.

andymeneely commented 9 years ago

Those files should be considered different. We should be at least concatenating the full path to distinguish those nodes.
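
As a rough sketch of the path-qualified identity, assuming a simplified stand-in for the Call class (the constructor arguments, attribute names, and the space-separated identity format below are hypothetical and may not match the real class):

```python
from pathlib import PureWindowsPath


class Call:
    """Simplified stand-in for a call graph node, for illustration only."""

    def __init__(self, function_name, file_path):
        self.function_name = function_name
        # Keep the full relative path, normalized to forward slashes.
        self.file_path = PureWindowsPath(file_path).as_posix()

    @property
    def identity(self):
        # Function name plus the full path, not just the file name.
        return f"{self.function_name} {self.file_path}"

    def __eq__(self, other):
        return isinstance(other, Call) and self.identity == other.identity

    def __hash__(self):
        return hash(self.identity)


paths = [
    r"libavutil\bswap.h",
    r"libavutil\arm\bswap.h",
    r"libavutil\avr32\bswap.h",
    r"libavutil\bfin\bswap.h",
    r"libavutil\sh4\bswap.h",
    r"libavutil\x86\bswap.h",
]
# Each bswap_32 definition becomes its own node in the set.
nodes = {Call("bswap_32", path) for path in paths}
```

Because the identity is the plain string, two nodes compare equal only when both the name and the full path match, which avoids the collision concern raised about hashes.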