SVF-tools / SVF

Static Value-Flow Analysis Framework for Source Code
http://svf-tools.github.io/SVF/

False Negative in Andersen Points-to Analysis #1265

Open HiragiChi opened 11 months ago

HiragiChi commented 11 months ago

Hi, thanks for providing such a useful tool. I tried to use `wpa -ander` to resolve all indirect function calls via points-to analysis; the target program is PHP (https://github.com/php/php-src). However, when I query the points-to result of an indirect call (ext/hash/hash.c:385), `ops->hash_init(context, args);`, and try to get its potential callees, SVF does not report any of the actual callees observed in dynamic analysis (I ran many test cases to collect the real callees). The false negative rate is 100%.
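For readers unfamiliar with this call site, the pattern can be reduced to a table of function pointers that is selected at run time. The following is a minimal sketch, not PHP's actual code: `HashOps`, `sha1_init`, `fnv_init`, `find_ops` and `run_init` are hypothetical stand-ins for the ops structure and its init functions. A sound points-to analysis would be expected to report both init functions as possible callees of the indirect call in `run_init`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the ops structure: a struct holding a function pointer.
struct HashOps {
    const char* name;
    void (*hash_init)(void* context, void* args);
};

// Hypothetical init functions; each writes a distinct tag so the call is observable.
static void sha1_init(void* context, void*) { *static_cast<int*>(context) = 1; }
static void fnv_init(void* context, void*)  { *static_cast<int*>(context) = 2; }

// Table of available algorithms (assumed layout, for illustration only).
static const HashOps kOps[] = {
    {"sha1", sha1_init},
    {"fnv132", fnv_init},
};

// Looks up an ops entry by name; returns nullptr if the name is unknown.
const HashOps* find_ops(const char* name) {
    for (const HashOps& ops : kOps)
        if (std::strcmp(ops.name, name) == 0) return &ops;
    return nullptr;
}

// The indirect call under discussion: the callee depends on which entry was found.
int run_init(const char* algo) {
    const HashOps* ops = find_ops(algo);
    assert(ops != nullptr);
    int context = 0;
    ops->hash_init(&context, nullptr);  // indirect call site
    return context;
}
```

Compiling a reduced case like this to bitcode and running the same query on it may help narrow down whether the missing callees come from the table lookup or from the call site itself.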

Here are the command and the code I am using: `wpa -ander -print-pts php.bc`

I use three methods to query the callees of the call site. I added the following code to the dumpTopLevelPtsTo() function in Andersen.cpp.

1) Traversing the indirect call map

```cpp
PTACallGraph* callgraph = this->getPTACallGraph();
// Map from each indirect call site to the callees resolved by the analysis.
const CallEdgeMap& icalls = callgraph->getIndCallMap();

for (CallEdgeMap::const_iterator it = icalls.begin(), eit = icalls.end(); it != eit; ++it)
{
    const CallICFGNode* callInst = it->first;
    outs() << "\nCallInst " << callInst->getId() << " ";
    callInst->dump();
    const FunctionSet& fset = it->second;
    outs() << "\nCallee: ";
    for (FunctionSet::const_iterator cit = fset.begin(), ecit = fset.end(); cit != ecit; ++cit)
    {
        const SVFFunction* callee = *cit;
        // Skip the verifier stub so that only real callees are printed.
        if (callee->getName() != "__VERIFIER_error")
        {
            outs() << callee->getName() << " ";
        }
    }
    outs() << "\n";
}
```

2) Directly using the points-to result of the load instruction. The original code `ops->hash_init(context, args);` is translated to a sequence of instructions that includes `%63 = load void (i8*, %struct._zend_array*)*, void (i8*, %struct._zend_array*)** %62, align 8, !dbg !42221, !tbaa !42222 { "ln": 385, "cl": 7, "fl": "ext/hash/hash.c" }`. I use the points-to result of this instruction to find the potential callees.

3) Using the function getIndirectCallsites()

```cpp
const CallSiteToFunPtrMap& callsites = getIndirectCallsites();
for (CallSiteToFunPtrMap::const_iterator iter = callsites.begin(), eiter = callsites.end(); iter != eiter; ++iter)
{
    const SVF::CallICFGNode* callInst = iter->first;
    // Only print call sites for which the analysis resolved at least one callee.
    if (this->hasIndCSCallees(callInst))
    {
        outs() << "\nCallInst " << callInst->getId() << " ";
        callInst->dump();
        const FunctionSet& fset = this->getIndCSCallees(callInst);
        outs() << "\nCallee: ";
        for (FunctionSet::const_iterator cit = fset.begin(), ecit = fset.end(); cit != ecit; ++cit)
        {
            const SVFFunction* callee = *cit;
            // Skip the verifier stub so that only real callees are printed.
            if (callee->getName() != "__VERIFIER_error")
            {
                outs() << callee->getName() << " ";
            }
        }
        outs() << "\n";
    }
}
```

All three query methods return similar results, and none of them includes the real callees. Below are the callees reported by SVF, followed by the actual callees I found using dynamic analysis:

SVF Anderson Analysis accel_is_readable accel_is_file accel_file_exists zif_accel_chdir phar_is_link phar_is_file phar_lstat phar_is_dir phar_is_executable phar_filetype phar_filectime phar_filemtime dom_document_preserve_whitespace_write dom_node_owner_document_read dom_document_preserve_whitespace_read dom_entity_system_id_read dom_document_standalone_write dom_document_standalone_read dom_document_encoding_read dom_element_id_write spl_fixedarray_it_get_current_key dom_document_strict_error_checking_write phar_fopen phar_is_writable dom_parent_node_last_element_child_read dom_parent_node_first_element_child_read dom_element_class_name_read spl_fixedarray_object_count_elements dom_node_text_content_write date_object_get_debug_info_timezone dom_node_node_name_read dom_document_format_output_write phar_is_readable dom_node_node_value_write dom_documenttype_system_id_read dom_document_substitue_entities_read dom_node_previous_sibling_read zif_pass dom_document_recover_write dom_attr_specified_read date_interval_compare_objects dom_document_recover_read dom_node_namespace_uri_read dom_document_validate_on_parse_write dom_entity_notation_name_read dom_document_format_output_read dom_node_child_nodes_read dom_document_document_element_read dom_node_node_value_read dom_node_first_child_read dom_document_validate_on_parse_read dom_document_encoding_write date_object_get_properties_for_timezone date_object_compare_timezone dom_document_version_read dom_document_implementation_read spl_filesystem_tree_it_current_key date_object_get_properties_for dom_node_previous_element_sibling_read dom_document_version_write implement_date_interface_handler phar_file_exists dom_document_substitue_entities_write phar_file_get_contents dom_document_document_uri_write dom_node_node_type_read dom_documenttype_public_id_read dom_document_strict_error_checking_read dom_nodelist_length_read phar_stat spl_filesystem_dir_it_current_key spl_object_storage_unset_dimension date_object_compare_date 
dom_entity_public_id_read dom_node_parent_element_read sxe_get_debug_info phar_opendir dom_node_text_content_read dom_attr_schema_type_info_read phar_filegroup dom_node_parent_node_read dom_node_attributes_read dom_document_document_uri_read dom_notation_system_id_read dom_node_prefix_write spl_dllist_object_count_elements date_period_it_current_key dom_characterdata_data_write dom_node_local_name_read dom_node_base_uri_read dom_namednodemap_length_read dom_characterdata_data_read dom_node_is_connected_read dom_characterdata_length_read dom_node_next_element_sibling_read spl_dllist_it_get_current_key dom_attr_value_write dom_attr_owner_element_read dom_element_tag_name_read dom_parent_node_child_element_count dom_element_class_name_write spl_fixedarray_object_get_properties_for dom_document_doctype_read dom_element_id_read dom_get_debug_info sxe_objects_compare dom_element_schema_type_info_read dom_documenttype_notations_read spl_heap_it_get_current_key dom_text_whole_text_read dom_documenttype_name_read phar_readfile dom_processinginstruction_data_write sxe_count_elements spl_array_unset_dimension phar_fileinode dom_documenttype_entities_read dom_documenttype_internal_subset_read dom_notation_public_id_read dom_document_resolve_externals_write dom_entity_encoding_read dom_node_next_sibling_read dom_entity_version_read phar_fileowner dom_processinginstruction_target_read dom_attr_value_read spl_array_get_properties_for dom_xpath_document_read dom_node_prefix_read dom_xpath_register_node_ns_read dom_xpath_register_node_ns_write phar_fileperms sxe_dimension_delete php_sxe_iterator_current_key dom_processinginstruction_data_read spl_array_object_count_elements dom_attr_name_read spl_array_compare_objects dom_document_resolve_externals_read dom_entity_actual_encoding_read dom_node_last_child_read spl_fixedarray_object_unset_dimension spl_array_it_get_current_key spl_heap_object_count_elements spl_object_storage_compare_objects dom_document_config_read phar_filesize 
phar_fileatime

Real callees: PHP_FNV1_32_INIT, PHP_FNV164Init, PHP_SHA1Init, PHP_SHA256Init, etc.

Could you tell me why there are false negatives in the pointer analysis for indirect calls? Thanks in advance.
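To make the "100%" figure concrete, here is a small sketch of how the false negative rate can be computed from the two sets. The function name `false_negative_rate` and the choice to return 0.0 for an empty observed set are assumptions made for this illustration, not part of SVF.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Fraction of dynamically observed callees that the static analysis did not report.
// Returns 0.0 when nothing was observed, to avoid dividing by zero.
double false_negative_rate(const std::set<std::string>& reported,
                           const std::set<std::string>& observed) {
    if (observed.empty()) return 0.0;
    std::size_t missed = 0;
    for (const std::string& callee : observed)
        if (reported.count(callee) == 0) ++missed;
    return static_cast<double>(missed) / static_cast<double>(observed.size());
}
```

With the sets above, none of the observed callees appears in the reported set, so every observed callee counts as missed and the rate is 1.0.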

yuleisui commented 11 months ago

Thanks for reporting this. False negatives or unsoundness for C++ programs could be due to incomplete modelling of C++ libraries. This is really an area where SVF could be improved.

HiragiChi commented 11 months ago

Hi, thanks for the timely reply! I have some follow-up questions.

By "modeling of C++ libraries", do you mean modeling built-in libraries such as stdlib, or user-defined libraries (such as .a archives built as part of the program)?

Also, is there any way to improve the modeling, especially for indirect call analysis? Currently, the only approach I can think of is manually adding models for the related functions. Is there a better way to do this?

Any suggestions and answers are much appreciated! Thanks in advance.

yuleisui commented 11 months ago

You may wish to take a look at this file and how it is being used in SVF: https://github.com/SVF-tools/SVF/blob/master/svf-llvm/lib/extapi.c
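As a rough illustration of what modelling an external function means in general (this is an assumption-laden sketch, not the contents of extapi.c and not SVF's actual annotation scheme): a function whose body the analysis cannot see is given a small stand-in body that reproduces only its pointer-relevant effects. `my_memcpy_model` below is a hypothetical name.

```cpp
#include <cstddef>

// Hypothetical model of a copy routine. The pointer-relevant facts are that
// bytes (possibly holding pointers) flow from src to dst, and that dst is returned.
void* my_memcpy_model(void* dst, const void* src, std::size_t n) {
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) d[i] = s[i];
    return dst;
}
```

If a function pointer is stored or copied by a library routine that has no such model, the analysis may lose track of it, which is one way callees can go missing at an indirect call.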

HiragiChi commented 11 months ago

Thanks! I will take a look.