Open zz-fz-john opened 2 months ago
Would you be able to dig a bit and locate where the bug is?
Would you be able to dig a bit and locate where the bug is?
I think it’s because in the process of analyzing class members, all class members that are structures are regarded as their parent classes.
For example the above code:
%class.AP_GPS_Backend = type <{ ptr, ptr, ptr, ptr, i64, %class.JitterCorrection, i32, [4 x i8], i64, i32, i32, i32, i32, i16, [6 x i8] }>
In fact, JitterCorrection is a member variable in AP_GPS_Backend, but svf mistakenly recognized it as the parent class of AP_GPS_Backend.
I read some code, and I think maybe it's because of the readInheritanceMetadataFromModule(const Module &M) function in CHGBuilder.cpp, which iterates through all the operands and treats the operands as his parent class. here is the code:
void CHGBuilder::readInheritanceMetadataFromModule(const Module &M)
{
for (Module::const_named_metadata_iterator mdit = M.named_metadata_begin(),
mdeit = M.named_metadata_end(); mdit != mdeit; ++mdit)
{
const NamedMDNode *md = &*mdit;
string mdname = md->getName().str();
if (mdname.compare(0, 15, "__cxx_bases_of_") != 0)
continue;
string className = mdname.substr(15);
for (NamedMDNode::const_op_iterator opit = md->op_begin(),
opeit = md->op_end(); opit != opeit; ++opit)
{
const MDNode *N = *opit;
const MDString* mdstr = SVFUtil::cast<MDString>(N->getOperand(0).get());
string baseName = mdstr->getString().str();
chg->addEdge(className, baseName, CHEdge::INHERITANCE);
}
}
}
But I have a little doubt: I queried the entire IR and did not find "__cxx_basesof".
But I have one question: I queried the entire IR and did not find "__cxx_basesof". Is the "__cxx_basesof" prefix generated by dynamic analysis during processing? Or does this prefix always exist in the IR file? If it is the former, then I think there is a problem with the way readInheritanceMetadataFromModule is processed. The processing logic of this function should be in the second for loop. When it encounters the first parameter that is not a structure, it will stop connecting the class inheritance graph. If so The latter, then I can't determine the problem.
CHG should not rely on metadata. We will need to identify the pattern from llvm bc alone.
I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function F)", and then the function "void CHGBuilder::buildCHGNodes(const Function F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:
void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
if (getCallee(cs) == nullptr)
return;
const Function* callee = getCallee(cs);
struct DemangledName dname = demangle(caller->getName().str());
if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
{
if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
return;
const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
//const Argument* consThisPtr = getConstructorThisPtr(caller);
//bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
bool samePtrTrue = true;
if (csThisPtr != nullptr && samePtrTrue)
{
struct DemangledName basename = demangle(callee->getName().str());
if (!LLVMUtil::isCallSite(csThisPtr) &&
basename.className.size() > 0)
{
chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
}
}
}
}
This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.
I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function F)", and then the function "void CHGBuilder::buildCHGNodes(const Function F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:
void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs) { if (getCallee(cs) == nullptr) return; const Function* callee = getCallee(cs); struct DemangledName dname = demangle(caller->getName().str()); if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee))) { if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet))) return; const Value* csThisPtr = cppUtil::getVCallThisPtr(cs); //const Argument* consThisPtr = getConstructorThisPtr(caller); //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr); bool samePtrTrue = true; if (csThisPtr != nullptr && samePtrTrue) { struct DemangledName basename = demangle(callee->getName().str()); if (!LLVMUtil::isCallSite(csThisPtr) && basename.className.size() > 0) { chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE); } } } }
This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.
In fact, the situation where the constructor of one class calls the constructor of another class may be caused by two reasons: 1. The class is a subclass of another class. 2.An instance of another class is a member variable of that class.
I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function F)", and then the function "void CHGBuilder::buildCHGNodes(const Function F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:
void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs) { if (getCallee(cs) == nullptr) return; const Function* callee = getCallee(cs); struct DemangledName dname = demangle(caller->getName().str()); if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee))) { if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet))) return; const Value* csThisPtr = cppUtil::getVCallThisPtr(cs); //const Argument* consThisPtr = getConstructorThisPtr(caller); //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr); bool samePtrTrue = true; if (csThisPtr != nullptr && samePtrTrue) { struct DemangledName basename = demangle(callee->getName().str()); if (!LLVMUtil::isCallSite(csThisPtr) && basename.className.size() > 0) { chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE); } } } }
This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.
In fact, the situation where the constructor of one class calls the constructor of another class may be caused by two reasons: 1. The class is a subclass of another class. 2.An instance of another class is a member variable of that class.
That is a good finding. Could we distinguish these two cases by changing the above code to fix it?
Unfortunately, in the above code, it should be difficult to distinguish between the two cases. there is a case to show that the two case is same in IR.
class AP_GPS_MAV : public AP_GPS_Backend {
public:
using AP_GPS_Backend::AP_GPS_Backend;
bool read() override;
static bool _detect(struct MAV_detect_state &state, uint8_t data);
void handle_msg(const mavlink_message_t &msg) override;
const char *name() const override { return "MAV"; }
private:
bool _new_data;
uint32_t first_week;
JitterCorrection jitter{2000};
};
#endif
the correspond IR is:
; Function Attrs: noinline nounwind optnone
define internal fastcc noundef ptr @_ZN10AP_GPS_MAVCI214AP_GPS_BackendER6AP_GPSRNS0_9GPS_StateEPN6AP_HAL10UARTDriverE(ptr noundef nonnull returned align 8 dereferenceable(128) %0, ptr noundef nonnull align 8 dereferenceable(928) %1, ptr noundef nonnull align 8 dereferenceable(176) %2, ptr noundef %3) unnamed_addr #5 comdat align 2 !dbg !210192 {
%5 = alloca ptr, align 4
%6 = alloca ptr, align 4
%7 = alloca ptr, align 4
%8 = alloca ptr, align 4
store ptr %0, ptr %5, align 4
call void @llvm.dbg.declare(metadata ptr %5, metadata !210197, metadata !DIExpression()), !dbg !210199
store ptr %1, ptr %6, align 4
call void @llvm.dbg.declare(metadata ptr %6, metadata !210200, metadata !DIExpression()), !dbg !210199
store ptr %2, ptr %7, align 4
call void @llvm.dbg.declare(metadata ptr %7, metadata !210201, metadata !DIExpression()), !dbg !210199
store ptr %3, ptr %8, align 4
call void @llvm.dbg.declare(metadata ptr %8, metadata !210202, metadata !DIExpression()), !dbg !210199
%9 = load ptr, ptr %5, align 4
%10 = load ptr, ptr %6, align 4, !dbg !210203
%11 = load ptr, ptr %7, align 4, !dbg !210203
%12 = load ptr, ptr %8, align 4, !dbg !210203
%13 = call fastcc noundef ptr @_ZN14AP_GPS_BackendC2ER6AP_GPSRNS0_9GPS_StateEPN6AP_HAL10UARTDriverE(ptr noundef nonnull align 8 dereferenceable(90) %9, ptr noundef nonnull align 8 dereferenceable(928) %10, ptr noundef nonnull align 8 dereferenceable(176) %11, ptr noundef %12), !dbg !210203
store ptr getelementptr inbounds ({ [24 x ptr] }, ptr @_ZTV10AP_GPS_MAV, i32 0, inrange i32 0, i32 2), ptr %9, align 8, !dbg !210203
%14 = getelementptr inbounds %class.AP_GPS_MAV, ptr %9, i32 0, i32 3, !dbg !210204
%15 = call fastcc noundef ptr @_ZN16JitterCorrectionC2Ett(ptr noundef nonnull align 8 dereferenceable(28) %14, i16 noundef zeroext 2000, i16 noundef zeroext 100), !dbg !210205
ret ptr %9, !dbg !210203
}
you could see that the call of two case is same,so I can not distinguish the two case by modifying above code
Any other way to distinguish them?
The class definition is like this
The definition of class in IR is:
In the generated class inheritance graph, the edges related to this class are:
The analysis result of SVF is that the parent class of AP_GPS_Backend is JitterCorrection, but in fact JitterCorrection is a member of AP_GPS_Backend. The version of SVF that I using is svf-3.0,and I use the command "wpa -sfrander -dump-cha "to generate the CHA.