for-just-we / CppCodeAnalyzer

A Python-based tool for parsing C/C++ code into a code property graph
About calleeinfos #5

Status: Closed (closed by jeffyjeff2893 1 year ago)

jeffyjeff2893 commented 1 year ago

I'm looking to parse a dataset of real-world functions into their CPG representations. This parser is a perfect fit, but I'm stumped on how to handle calleeinfos with real-world code. My dataset consists of standalone functions that call APIs, but I have no way of getting the definitions of the called functions, so I can only speculate about what they do from their names. For calleeinfos, I need to know whether each pointer parameter is argdef or arguse, but without the definitions I cannot tell. Is there any way to mitigate this, or do you have any experience using this tool with real-world code? Any tips would be greatly appreciated. Thank you.

for-just-we commented 1 year ago

Are the functions without definitions from third-party libraries? If so, I don't have a good solution.

CalleeInfo currently covers some system library functions, but rarely used system functions are not included and must be added manually if needed. For third-party libraries, there is currently no good way to infer which pointer parameters a function uses or modifies. Perhaps machine learning could alleviate the problem by predicting this parameter information from function and parameter names.
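As a rough illustration of the name-based idea, here is a minimal sketch that swaps the machine-learning model for a plain keyword lookup. The keyword sets and the helper names `split_name` and `guess_role` are hypothetical, chosen only for this example; they are not part of CppCodeAnalyzer, and the labels it returns would still need to be mapped by hand into whatever callee information the tool expects.

```python
import re

# Hypothetical keyword sets (assumptions for this sketch, not tool data).
# Tokens that suggest the callee writes through a pointer argument.
DEF_TOKENS = {"init", "set", "read", "recv", "copy", "fill", "clear", "reset"}
# Tokens that suggest the callee only reads through a pointer argument.
USE_TOKENS = {"print", "log", "len", "cmp", "find", "hash", "send", "write"}


def split_name(name):
    """Split a snake_case or camelCase identifier into lowercase tokens."""
    # Insert "_" at lower/digit-to-upper boundaries, then split on "_".
    snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()
    return [token for token in snake.split("_") if token]


def guess_role(callee_name):
    """Guess "argdef", "arguse", or "unknown" from the callee name alone."""
    tokens = set(split_name(callee_name))
    if tokens & DEF_TOKENS:
        return "argdef"
    if tokens & USE_TOKENS:
        return "arguse"
    # No hint in the name: the caller should decide how to treat this case.
    return "unknown"
```

With these keyword sets, `guess_role("buffer_init")` gives `"argdef"` and `guess_role("logMessage")` gives `"arguse"`. For `"unknown"` results, the conservative choice is probably to treat pointer arguments as both defined and used, at the cost of extra data-dependence edges.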

Alternatively, if you can obtain the library's source code, you could run a pre-analysis to determine whether each API redefines (writes through) its pointer parameters. This could be expensive, though, and is only possible when the source code is available.
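A very crude version of such a pre-analysis can be sketched with regular expressions instead of a real parser. The helper names below (`pointer_name`, `writes_through`, `classify_definition`) are hypothetical and not part of CppCodeAnalyzer. The sketch assumes one simple C function definition per input string, and it only detects direct writes such as `*p = ...`, `p[i] = ...`, and `p->f = ...`. It misses writes made by nested calls (for example passing the pointer to `memcpy`), nested subscripts, and aliases, so a pointer labeled as use-only here may still be modified.

```python
import re

# Optional post-increment/decrement, optional compound operator, then "="
# that is not part of "==".
ASSIGN = r"(?:\+\+|--)?\s*(?:<<|>>|[+\-*/%|&^])?=(?!=)"


def pointer_name(param):
    """Return the parameter name if `param` declares a pointer, else None."""
    m = re.search(r"\*\s*(?:const\s+)?(\w+)\s*(?:\[\s*\])?\s*$", param.strip())
    return m.group(1) if m else None


def writes_through(body, name):
    """True if the body contains a direct textual write through `name`."""
    n = re.escape(name)
    patterns = [
        r"\*\s*" + n + r"\b\s*" + ASSIGN,                        # *p = ...
        r"\b" + n + r"\s*\[[^\]]*\]\s*" + ASSIGN,                # p[i] = ...
        r"\b" + n + r"\b(?:\s*->\s*\w+)+\s*" + ASSIGN,           # p->f = ...
    ]
    return any(re.search(p, body) for p in patterns)


def classify_definition(src):
    """Return (function name, {"argdef": [...], "arguse": [...]}) by index."""
    head, _, body = src.partition("{")
    m = re.search(r"(\w+)\s*\(([^()]*)\)\s*$", head)
    if m is None:
        raise ValueError("could not find a simple function header")
    name, params = m.group(1), m.group(2)
    result = {"argdef": [], "arguse": []}
    for index, param in enumerate(params.split(",")):
        pname = pointer_name(param)
        if pname is None:
            continue  # not a pointer parameter
        # No direct write found: label the pointer as use-only.
        key = "argdef" if writes_through(body, pname) else "arguse"
        result[key].append(index)
    return name, result
```

For a definition like `void copy_bytes(char *dst, const char *src, size_t n) { ... dst[i] = src[i]; ... }`, this reports parameter 0 as argdef and parameter 1 as arguse. A production pre-analysis would more likely reuse a real C parser and follow calls between functions, which is where the cost mentioned above comes from.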

jeffyjeff2893 commented 1 year ago

I see, thank you.