SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/
Other
1.43k stars 435 forks source link

When analyzing the class inheritance relationship, treat the members of the class as the parent class #1539

Open zz-fz-john opened 2 months ago

zz-fz-john commented 2 months ago

The class definition is like this

class AP_GPS_Backend
{
public:
 ...
protected:
    AP_HAL::UARTDriver *port;           ///< UART we are attached to
    AP_GPS &gps;                        ///< access to frontend (for parameters)
    AP_GPS::GPS_State &state;           ///< public state for this instance

    uint64_t _last_pps_time_us;
    JitterCorrection jitter_correction;
    uint32_t _last_itow_ms;
...

The definition of class in IR is:

%class.AP_GPS_Backend = type <{ ptr, ptr, ptr, ptr, i64, %class.JitterCorrection, i32, [4 x i8], i64, i32, i32, i32, i32, i16, [6 x i8] }>

In the generated class inheritance graph, the edges related to this class are:

    Node0x5596521120f0 [shape=record,shape=box,label="{AP_GPS_Backend}"];
    Node0x5596521120f0 -> Node0x55965218ba90[style=solid];
    Node0x55965218ba90 [shape=record,shape=box,label="{JitterCorrection}"];

The analysis result of SVF is that the parent class of AP_GPS_Backend is JitterCorrection, but in fact JitterCorrection is a member of AP_GPS_Backend. The version of SVF that I using is svf-3.0,and I use the command "wpa -sfrander -dump-cha "to generate the CHA.

yuleisui commented 2 months ago

Would you be able to dig a bit and locate where the bug is?

zz-fz-john commented 2 months ago

Would you be able to dig a bit and locate where the bug is?

I think it’s because in the process of analyzing class members, all class members that are structures are regarded as their parent classes.

For example the above code:

%class.AP_GPS_Backend = type <{ ptr, ptr, ptr, ptr, i64, %class.JitterCorrection, i32, [4 x i8], i64, i32, i32, i32, i32, i16, [6 x i8] }>

In fact, JitterCorrection is a member variable in AP_GPS_Backend, but svf mistakenly recognized it as the parent class of AP_GPS_Backend.

I read some code, and I think maybe it's because of the readInheritanceMetadataFromModule(const Module &M) function in CHGBuilder.cpp, which iterates through all the operands and treats the operands as his parent class. here is the code:

void CHGBuilder::readInheritanceMetadataFromModule(const Module &M)
{
    for (Module::const_named_metadata_iterator mdit = M.named_metadata_begin(),
            mdeit = M.named_metadata_end(); mdit != mdeit; ++mdit)
    {
        const NamedMDNode *md = &*mdit;
        string mdname = md->getName().str();
        if (mdname.compare(0, 15, "__cxx_bases_of_") != 0)
            continue;
        string className = mdname.substr(15);
        for (NamedMDNode::const_op_iterator opit = md->op_begin(),
                opeit = md->op_end(); opit != opeit; ++opit)
        {
            const MDNode *N = *opit;
            const MDString* mdstr = SVFUtil::cast<MDString>(N->getOperand(0).get());
            string baseName = mdstr->getString().str();
            chg->addEdge(className, baseName, CHEdge::INHERITANCE);
        }
    }
}

But I have a little doubt: I queried the entire IR and did not find "__cxx_basesof".

But I have one question: I queried the entire IR and did not find "__cxx_basesof". Is the "__cxx_basesof" prefix generated by dynamic analysis during processing? Or does this prefix always exist in the IR file? If it is the former, then I think there is a problem with the way readInheritanceMetadataFromModule is processed. The processing logic of this function should be in the second for loop. When it encounters the first parameter that is not a structure, it will stop connecting the class inheritance graph. If so The latter, then I can't determine the problem.

yuleisui commented 2 months ago

CHG should not rely on metadata. We will need to identify the pattern from llvm bc alone.

zz-fz-john commented 2 months ago

I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function F)", and then the function "void CHGBuilder::buildCHGNodes(const Function F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:

void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
    if (getCallee(cs) == nullptr)
        return;

    const Function* callee = getCallee(cs);

    struct DemangledName dname = demangle(caller->getName().str());
    if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
    {
        if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
            return;
        const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
        //const Argument* consThisPtr = getConstructorThisPtr(caller);
        //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
        bool samePtrTrue = true;
        if (csThisPtr != nullptr && samePtrTrue)
        {
            struct DemangledName basename = demangle(callee->getName().str());
            if (!LLVMUtil::isCallSite(csThisPtr)  &&
                    basename.className.size() > 0)
            {
                chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
            }
        }
    }
}

This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.

zz-fz-john commented 2 months ago

I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function F)", and then the function "void CHGBuilder::buildCHGNodes(const Function F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:

void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
    if (getCallee(cs) == nullptr)
        return;

    const Function* callee = getCallee(cs);

    struct DemangledName dname = demangle(caller->getName().str());
    if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
    {
        if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
            return;
        const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
        //const Argument* consThisPtr = getConstructorThisPtr(caller);
        //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
        bool samePtrTrue = true;
        if (csThisPtr != nullptr && samePtrTrue)
        {
            struct DemangledName basename = demangle(callee->getName().str());
            if (!LLVMUtil::isCallSite(csThisPtr)  &&
                    basename.className.size() > 0)
            {
                chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
            }
        }
    }
}

This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.

In fact, the situation where the constructor of one class calls the constructor of another class may be caused by two reasons: 1. The class is a subclass of another class. 2.An instance of another class is a member variable of that class.

yuleisui commented 2 months ago

I think I found the problem, the function "buildCHG()" will call the function "void CHGBuilder::buildCHGNodes(const Function F)", and then the function "void CHGBuilder::buildCHGNodes(const Function F)" will call " connectInheritEdgeViaCall" the function "connectInheritEdgeViaCall" 's code is here:

void CHGBuilder::connectInheritEdgeViaCall(const Function* caller, const CallBase* cs)
{
    if (getCallee(cs) == nullptr)
        return;

    const Function* callee = getCallee(cs);

    struct DemangledName dname = demangle(caller->getName().str());
    if ((isConstructor(caller) && isConstructor(callee)) || (isDestructor(caller) && isDestructor(callee)))
    {
        if (cs->arg_size() < 1 || (cs->arg_size() < 2 && cs->paramHasAttr(0, llvm::Attribute::StructRet)))
            return;
        const Value* csThisPtr = cppUtil::getVCallThisPtr(cs);
        //const Argument* consThisPtr = getConstructorThisPtr(caller);
        //bool samePtr = isSameThisPtrInConstructor(consThisPtr, csThisPtr);
        bool samePtrTrue = true;
        if (csThisPtr != nullptr && samePtrTrue)
        {
            struct DemangledName basename = demangle(callee->getName().str());
            if (!LLVMUtil::isCallSite(csThisPtr)  &&
                    basename.className.size() > 0)
            {
                chg->addEdge(dname.className, basename.className, CHEdge::INHERITANCE);
            }
        }
    }
}

This function will find the constructor or destructor of a class. If the constructor (destructor) of the class calls the constructor (destructor) of another class, the class will be treated as a subclass of the other class.

In fact, the situation where the constructor of one class calls the constructor of another class may be caused by two reasons: 1. The class is a subclass of another class. 2.An instance of another class is a member variable of that class.

That is a good finding. Could we distinguish these two cases by changing the above code to fix it?

zz-fz-john commented 2 months ago

Unfortunately, in the above code, it should be difficult to distinguish between the two cases. there is a case to show that the two case is same in IR.

class AP_GPS_MAV : public AP_GPS_Backend {
public:

    using AP_GPS_Backend::AP_GPS_Backend;

    bool read() override;

    static bool _detect(struct MAV_detect_state &state, uint8_t data);

    void handle_msg(const mavlink_message_t &msg) override;

    const char *name() const override { return "MAV"; }

private:
    bool _new_data;
    uint32_t first_week;
    JitterCorrection jitter{2000};
};
#endif

the correspond IR is:

; Function Attrs: noinline nounwind optnone
define internal fastcc noundef ptr @_ZN10AP_GPS_MAVCI214AP_GPS_BackendER6AP_GPSRNS0_9GPS_StateEPN6AP_HAL10UARTDriverE(ptr noundef nonnull returned align 8 dereferenceable(128) %0, ptr noundef nonnull align 8 dereferenceable(928) %1, ptr noundef nonnull align 8 dereferenceable(176) %2, ptr noundef %3) unnamed_addr #5 comdat align 2 !dbg !210192 {
  %5 = alloca ptr, align 4
  %6 = alloca ptr, align 4
  %7 = alloca ptr, align 4
  %8 = alloca ptr, align 4
  store ptr %0, ptr %5, align 4
  call void @llvm.dbg.declare(metadata ptr %5, metadata !210197, metadata !DIExpression()), !dbg !210199
  store ptr %1, ptr %6, align 4
  call void @llvm.dbg.declare(metadata ptr %6, metadata !210200, metadata !DIExpression()), !dbg !210199
  store ptr %2, ptr %7, align 4
  call void @llvm.dbg.declare(metadata ptr %7, metadata !210201, metadata !DIExpression()), !dbg !210199
  store ptr %3, ptr %8, align 4
  call void @llvm.dbg.declare(metadata ptr %8, metadata !210202, metadata !DIExpression()), !dbg !210199
  %9 = load ptr, ptr %5, align 4
  %10 = load ptr, ptr %6, align 4, !dbg !210203
  %11 = load ptr, ptr %7, align 4, !dbg !210203
  %12 = load ptr, ptr %8, align 4, !dbg !210203
  %13 = call fastcc noundef ptr @_ZN14AP_GPS_BackendC2ER6AP_GPSRNS0_9GPS_StateEPN6AP_HAL10UARTDriverE(ptr noundef nonnull align 8 dereferenceable(90) %9, ptr noundef nonnull align 8 dereferenceable(928) %10, ptr noundef nonnull align 8 dereferenceable(176) %11, ptr noundef %12), !dbg !210203
  store ptr getelementptr inbounds ({ [24 x ptr] }, ptr @_ZTV10AP_GPS_MAV, i32 0, inrange i32 0, i32 2), ptr %9, align 8, !dbg !210203
  %14 = getelementptr inbounds %class.AP_GPS_MAV, ptr %9, i32 0, i32 3, !dbg !210204
  %15 = call fastcc noundef ptr @_ZN16JitterCorrectionC2Ett(ptr noundef nonnull align 8 dereferenceable(28) %14, i16 noundef zeroext 2000, i16 noundef zeroext 100), !dbg !210205
  ret ptr %9, !dbg !210203
}

you could see that the call of two case is same,so I can not distinguish the two case by modifying above code

yuleisui commented 2 months ago

Any other way to distinguish them?