NSSL-SJTU / SaTC

A prototype of Shared-keywords aware Taint Checking, a novel static analysis approach that tracks the data flow of the user input between front-end and back-end to precisely detect security vulnerabilities.

Error when analyzing TOTOLINK firmware #9

Open huahai111 opened 2 years ago

huahai111 commented 2 years ago

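Hello, I am using the SaTC Docker container and dataset you provide to analyze the TOTOLINK A950 and T10 firmware mentioned in the paper. I used -b to specify the vulnerable CGI file, but the two txt files that should contain the function call paths for command injection and buffer overflow are never written. The later analysis stages then cannot find these two files and the program aborts. Could you take a look when you have time?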

huahai111 commented 2 years ago

┌──(kali㉿kali)-[~/桌面]
└─$ sudo docker run -it -v /home/kali/桌面/SaTC_data/extracted_res/path:/home/satc/SaTC/SaTC_data smile0304/satc /bin/bash
[sudo] password for kali:
nohup: appending output to 'nohup.out'
(SaTC) satc@33558df7dc43:~$ cd SaTC/
(SaTC) satc@33558df7dc43:~/SaTC$ ls
SaTC_data config.py front_analysise ghidra headless init.sh jsparse requirements.txt satc.py taint_check
(SaTC) satc@33558df7dc43:~/SaTC$ cd SaTC_data/
(SaTC) satc@33558df7dc43:~/SaTC/SaTC_data$ ls
satc111.py squashfs-root_t10
(SaTC) satc@33558df7dc43:~/SaTC/SaTC_data$ cd ..
(SaTC) satc@33558df7dc43:~/SaTC$ python satc.py -d /home/satc/SaTC/SaTC_data/squashfs-root_t10/ -o /home/satc/SaTC/SaTC_data/res --ghidra_script=ref2sink_cmdi --ghidra_script=ref2sink_bof -b downloadFlile.cgi --taint_check

The directory mapped to ~/SaTC/SaTC_data looks like this: [image]

Error output:
(AutoAnalysisManager)
INFO REPORT: Analysis succeeded for file: /home/satc/SaTC/SaTC_data/squashfs-root_t10/web_cste/cgi-bin/downloadFlile.cgi (HeadlessAnalyzer)
INFO SCRIPT: /home/satc/SaTC/headless/ref2sink_bof.py (HeadlessAnalyzer)
Traceback (most recent call last):
  File "/home/satc/SaTC/headless/ref2sink_bof.py", line 332, in <module>
    paramTargets = set(open(args[0]).read().strip().split())
IOError: [Errno 2] No such file or directory: u'/home/satc/SaTC/SaTC_data/res/keyword_extract_result/simple/.data/downloadFlile.cgi.result'
INFO ANALYZING changes made by post scripts: /home/satc/SaTC/SaTC_data/squashfs-root_t10/web_cste/cgi-bin/downloadFlile.cgi (HeadlessAnalyzer)
INFO REPORT: Post-analysis succeeded for file: /home/satc/SaTC/SaTC_data/squashfs-root_t10/web_cste/cgi-bin/downloadFlile.cgi (HeadlessAnalyzer)
INFO REPORT: Save succeeded for file: /downloadFlile.cgi (HeadlessAnalyzer)
No handlers could be found for logger "root"
/home/satc/.virtualenvs/SaTC/local/lib/python2.7/site-packages/cffi/cparser.py:165: UserWarning: Global variable 'r' in cdef(): for consistency with C it should have a storage class specifier (usually 'extern')
  "(usually 'extern')" % (decl.name,))
Traceback (most recent call last):
  File "satc.py", line 310, in <module>
    main()
  File "satc.py", line 301, in main
    taint_stain_analysis(bin_path, ghidra_result, args.output)
  File "/home/satc/SaTC/taint_check/main.py", line 129, in taint_stain_analysis
    conv_Ghidra_output.main(ghidra_analysis_result)
  File "/home/satc/SaTC/taint_check/conv_Ghidra_output.py", line 11, in main
    with open(filename,'r') as f:
IOError: [Errno 2] No such file or directory: '/home/satc/SaTC/SaTC_data/res/ghidra_extract_result/downloadFlile.cgi/downloadFlile.cgi_ref2sink_cmdi.result'
(SaTC) satc@33558df7dc43:~/SaTC$
[image]
Opening the results in res: [image] The txt files were indeed not generated.

There is also this in the middle of the output: [image]
INFO REPORT: Analysis succeeded for file: /home/satc/SaTC/SaTC_data/squashfs-root_t10/web_cste/cgi-bin/downloadFlile.cgi (HeadlessAnalyzer)
INFO SCRIPT: /home/satc/SaTC/headless/ref2sink_cmdi.py (HeadlessAnalyzer)
Traceback (most recent call last):
  File "/home/satc/SaTC/headless/ref2sink_cmdi.py", line 345, in <module>
    paramTargets = set(open(args[0]).read().strip().split())
IOError: [Errno 2] No such file or directory: u'/home/satc/SaTC/SaTC_data/res/keyword_extract_result/simple/.data/downloadFlile.cgi.result'

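This result file was not generated either. I suspect its absence is what prevents the later txt files from being generated, but I don't understand what causes it. I would appreciate your guidance. Thank you very much.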

smile0304 commented 2 years ago

TOTOLINK is one of the cases described in the paper where several binaries share keywords. The call order should be: front-end keywords -> lighttpd -> xxx.cgi

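In other words, the data passes through lighttpd first. You can use the share2sink workflow: extract from lighttpd the keywords that are set via nvram or env, and then trace those keywords as they flow into the CGI.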
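The two-stage run can be sketched as a pair of command builders. This is only an illustration: the flags (`-d`, `-o`, `-b`, `--ghidra_script`, `--ref2share_result`, `--taint_check`) and the paths are the ones quoted in the commands in this thread, while the helper names `ref2share_cmd` and `share2sink_cmd` are hypothetical and not part of SaTC.

```python
import shlex

# Paths as they appear in the commands quoted in this thread.
FIRMWARE = "/home/satc/SaTC/SaTC_data/squashfs-root_t10/"
OUT_SHARE = "/home/satc/SaTC/SaTC_data/res-t10"
OUT_SINK = "/home/satc/SaTC/SaTC_data/res-t10-share"
SHARE_RESULT = OUT_SHARE + "/ghidra_extract_result/lighttpd/lighttpd_ref2share.result"


def ref2share_cmd(firmware, out_dir, border_bin):
    """Stage 1 (hypothetical helper): find keywords the border binary shares."""
    return ["python", "satc.py", "-d", firmware, "-o", out_dir,
            "--ghidra_script=ref2share", "-b", border_bin]


def share2sink_cmd(firmware, out_dir, share_result, target_bin):
    """Stage 2 (hypothetical helper): trace shared keywords to sinks in the target."""
    return ["python", "satc.py", "-d", firmware, "-o", out_dir,
            "--ghidra_script=share2sink",
            "--ref2share_result=" + share_result,
            "-b", target_bin, "--taint_check"]


if __name__ == "__main__":
    stages = [ref2share_cmd(FIRMWARE, OUT_SHARE, "lighttpd"),
              share2sink_cmd(FIRMWARE, OUT_SINK, SHARE_RESULT, "downloadFlile.cgi")]
    for cmd in stages:
        # Print a copy-pasteable shell line instead of running SaTC directly.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands rather than executing them keeps the sketch usable outside the SaTC container; inside it, each list could be passed to `subprocess.call`.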

As for why u'/home/satc/SaTC/SaTC_data/res/keyword_extract_result/simple/.data/downloadFlile.cgi.result' was not generated: SaTC did not find any keyword that flows directly from the front end to the CGI, so this file is not created.

info.txt records the number of keywords remaining after each filtering step; Clustering_result.txt records the keywords extracted from the bin.
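Before launching the later stages, it may help to check which intermediate files actually exist. The sketch below rebuilds the two paths that appear in the error messages above; the directory layout is inferred from those messages only, and the helper names are hypothetical.

```python
import os


def keyword_result_path(out_dir, binary):
    # Layout inferred from the IOError raised by ref2sink_*.py in this thread.
    return os.path.join(out_dir, "keyword_extract_result", "simple", ".data",
                        binary + ".result")


def ghidra_result_path(out_dir, binary, script):
    # Layout inferred from the IOError raised by conv_Ghidra_output.py.
    return os.path.join(out_dir, "ghidra_extract_result", binary,
                        "{0}_{1}.result".format(binary, script))


def missing_results(out_dir, binary, scripts):
    """Return the expected result files that are not present on disk."""
    expected = [keyword_result_path(out_dir, binary)]
    expected += [ghidra_result_path(out_dir, binary, s) for s in scripts]
    return [p for p in expected if not os.path.isfile(p)]
```

For example, `missing_results("/home/satc/SaTC/SaTC_data/res", "downloadFlile.cgi", ["ref2sink_cmdi", "ref2sink_bof"])` would list all three files for the run reported above. A missing keyword file points at the front-end stage rather than at Ghidra.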

[image]
huahai111 commented 2 years ago

Thank you very much for your reply!

huahai111 commented 2 years ago

Hello, I tried the following commands.
Enter the Docker container:
sudo docker run -it -v /home/kali/桌面/SaTC_data/extracted_res/path:/home/satc/SaTC/SaTC_data smile0304/satc /bin/bash
Run SaTC:
(SaTC) satc@b38a9b9d2f0c:~/SaTC$ python satc.py -d /home/satc/SaTC/SaTC_data/squashfs-root_t10/ -o /home/satc/SaTC/SaTC_data/res-t10 --ghidra_script=ref2share -b lighttpd
[image]
"Shares" is empty in lighttpd_ref2share.result. [image] According to info.txt, some keywords were indeed kept after filtering.
(SaTC) satc@b38a9b9d2f0c:~/SaTC$ python satc.py -d /home/satc/SaTC/SaTC_data/squashfs-root_t10/ -o /home/satc/SaTC/SaTC_data/res-t10-share --ghidra_script=share2sink --ref2share_result=/home/satc/SaTC/SaTC_data/res-t10/ghidra_extract_result/lighttpd/lighttpd_ref2share.result -b downloadFlile.cgi --taint_check
I then ran the command above. It seems that because the previous step produced no "Shares", this step found nothing either: [image]

Asking for your help once more.

huahai111 commented 2 years ago

[image] Results of searching the firmware manually.

smile0304 commented 2 years ago

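Please manually confirm the data flow into downloadFlile.cgi: what invokes this CGI, and how the keyword is passed in. Alternatively, use grep to search the firmware for the parameter that triggers the vulnerability in downloadFlile.cgi, and check whether that keyword is used in an asp file.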
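As a rough stand-in for that grep, the sketch below walks an unpacked firmware tree and reports every regular file whose bytes contain a keyword. It is not part of SaTC; `files_containing` is a hypothetical helper, and symlinks are skipped on the assumption that they only duplicate other entries in the tree.

```python
import os


def files_containing(root, keyword):
    """Return sorted paths (relative to root) of files containing keyword."""
    needle = keyword.encode("utf-8")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as handle:
                    if needle in handle.read():
                        hits.append(os.path.relpath(path, root))
            except (IOError, OSError):
                continue  # unreadable file: ignore and keep scanning
    return sorted(hits)
```

Grouping the hits by extension (for instance `.asp`, `.js`, `.so`) then shows whether the keyword occurs only in back-end binaries or also in front-end files.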

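If the keyword does appear in an asp file, then this is a limitation of SaTC: it currently has no rules for parsing ASP into a syntax tree. For asp files, it can only extract the embedded HTML and then apply the same keyword-extraction method used for HTML pages.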
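To illustrate what "extract the HTML, then extract keywords" can look like, here is a minimal Python 3 sketch. It is not SaTC's implementation: it assumes server-side code is delimited by `<% ... %>` and treats the `name` and `id` attributes of the remaining HTML tags as candidate keywords.

```python
import re
from html.parser import HTMLParser

# Assumption: ASP server-side blocks are written as <% ... %>.
ASP_BLOCK = re.compile(r"<%.*?%>", re.DOTALL)


class _NameCollector(HTMLParser):
    """Collect the values of name/id attributes from HTML start tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.keywords = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("name", "id") and value:
                self.keywords.add(value)


def asp_keywords(source):
    """Strip ASP blocks, parse what is left as HTML, return candidate keywords."""
    html = ASP_BLOCK.sub("", source)
    collector = _NameCollector()
    collector.feed(html)
    collector.close()
    return collector.keywords
```

Any keyword that exists only inside a `<% ... %>` block is discarded by this approach, which is the kind of front-end keyword a pure HTML extraction cannot see.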

The vulnerability we handled previously went from lighttpd -> system.so

huahai111 commented 2 years ago

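OK, thank you for the reply. I have one more question. For the lighttpd -> system.so call, do I also need to run ref2share first to find the keywords that lighttpd passes on via nvram/set? If so, since running ref2share on lighttpd yields an empty "Shares", the subsequent analysis of system.so cannot proceed either.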

How did you handle this situation at the time?

smile0304 commented 2 years ago

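You can try the TOTOLINK firmware from the firmware set we provide. When we tested it, we did not run into the problem of nvram_set and similar function calls not being found.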

These are the results of our search for command injection points at the time:

[image]
huahai111 commented 2 years ago

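Thanks for the reply. The firmware I tested was unpacked from the test set you provide. Your image shows the cmdi results. Did you find the vulnerability mentioned in the paper directly with ref2cmdi? Was "Shares" also empty in your ref2share output? If my results differ from yours, perhaps I should pull the Docker image again.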

huahai111 commented 2 years ago

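This is what the firmware from your dataset looks like after unpacking: [image] [image]
Manually grepping for the parameter "hostTime" from your screenshot: [image]
Then running SaTC with ref2sink_cmdi: [image]
The result is still the situation shown in the image.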

huahai111 commented 2 years ago

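[image] Is the totolink_t10 vulnerability mentioned in your paper caused by the "QUERY_STRING" parameter shown above? [image] This parameter is only matched in the back-end bin files.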

huahai111 commented 2 years ago

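Hello, I found something. It does not seem to be caused by your scripts but by the Ghidra API, and I don't know how to fix it.
I analyzed totolink_t10 with the following command:
python satc.py -d /home/satc/SaTC/SaTC_data/squashfs-root_t10/ -o /home/satc/SaTC/SaTC_data/res --ghidra_script=ref2sink_cmdi -b system.so
To print some debugging information, I made a few changes to ref2sink_cmdi.py that do not affect its behavior: [image]
getReferencesTo does not seem to work on system.so; it returns no cross-reference addresses.
Checking manually in IDA: [image] [image]
The hostTime parameter is indeed referenced.
When analyzing lighttpd, however, getReferencesTo does seem to work: [image]
I am not very familiar with the Ghidra API. Could you take a look at this when you have time?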

huahai111 commented 2 years ago

idautils.CodeRefsTo(ea, flow)
idautils.DataRefsTo(ea)
IDAPython uses two separate functions for code references and data references; I'm not sure whether Ghidra makes the same distinction.
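On the Ghidra side, `getReferencesTo(addr)` returns code and data references together, and each `Reference` carries a `RefType` that can tell them apart. The sketch below splits such a list in the spirit of IDA's two functions. `split_references` is a hypothetical helper that relies only on `Reference.getReferenceType()`, `Reference.getFromAddress()`, `RefType.isFlow()` and `RefType.isData()`.

```python
def split_references(refs):
    """Partition Ghidra references into (code_refs, data_refs) source addresses.

    refs is any iterable of objects exposing getReferenceType() and
    getFromAddress(), e.g. the array returned by getReferencesTo(addr).
    """
    code_refs = []
    data_refs = []
    for ref in refs:
        ref_type = ref.getReferenceType()
        if ref_type.isFlow():
            # Calls, jumps and fall-through: comparable to IDA code xrefs.
            code_refs.append(ref.getFromAddress())
        elif ref_type.isData():
            # Reads, writes and plain data references: comparable to IDA data xrefs.
            data_refs.append(ref.getFromAddress())
    return code_refs, data_refs
```

Inside a Ghidra script this would be called as `code_refs, data_refs = split_references(getReferencesTo(addr))`. If both lists come back empty for a string that IDA shows as referenced, the references were never created during analysis, which is a different problem from filtering them by type.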

smile0304 commented 2 years ago

Try running the following two scripts on the .so file; they are meant to fix the problem you described.

fixso.py

from ghidra.program.model.mem import MemoryAccessException
from ghidra.program.model.symbol import RefType

# Run the data fixup script first (defines strings, pointers and DWORDs).
runScript('CodatifyFixupData.py')

# Rebase the image to 0 while scanning; the original base is restored at the end.
oldBase = currentProgram.imageBase
currentProgram.setImageBase(toAddr(0), False)

def getStr(addr):
    """Read a NUL-terminated string at addr; return None if memory is unreadable."""
    ad = addr
    ret = ''
    try:
        while not ret.endswith('\0'):
            ret += chr(getByte(ad) % 256)
            ad = ad.add(1)
    except MemoryAccessException:
        return None
    return ret[:-1]

# MIPS argument registers a0-a3.
regs = [currentProgram.getRegister(regname) for regname in 'a0 a1 a2 a3'.split()]
inst = getFirstInstruction()
while inst is not None:
    # Match "addiu aN, aN, imm" and treat imm as the address of a string.
    if inst.mnemonicString.endswith('addiu'):
        if inst.getOpObjects(0)[0] == inst.getOpObjects(1)[0] and inst.getOpObjects(0)[0] in regs:
            # print(inst)
            addr = toAddr(inst.getOpObjects(2)[0].value)
            data = getStr(addr)
            if data is not None:
                print '%s  %s reference to %s' % (inst.address, inst, data)
                # Add the missing data cross-reference on operand 2.
                createMemoryReference(inst, 2, addr, RefType.DATA)
            else:
                print '%s %s ref data is None' % (inst.address, inst)
    inst = inst.next

# Restore the original image base.
currentProgram.setImageBase(oldBase, False)

CodatifyFixupData.py

# Fixup .data and .rodata sections by defining strings and forcing remaining undefined data to be a DWORD.
#@author fuzzywalls
#@category TNS
#@menupath TNS.Codatify.Fixup Data

from utils import functiontable

from ghidra.program.model.data import PointerDataType

def find_data_sections():
    """
    Search for non-executable sections in the memory map.
    """
    data_sections = []

    # Find all memory sections and remove the executable sections.
    addr_factory = currentProgram.getAddressFactory()
    memory_manager = currentProgram.getMemory()
    address_ranges = memory_manager.getLoadedAndInitializedAddressSet()
    executable_set = memory_manager.getExecuteSet()

    addr_view = address_ranges.xor(executable_set)

    for section in addr_view:
        new_view = addr_factory.getAddressSet(section.getMinAddress(),
                                              section.getMaxAddress())
        data_sections.append(new_view)

    return data_sections

def define_strings(section):
    """
    Convert undefined strings in the section provided to ascii.

    :param section: Section to search for undefined strings.
    :type section: ghidra.program.model.listing.ProgramFragment
    """
    if section is None:
        return

    strings = findStrings(section, 1, 1, True, True)

    string_count = 0
    for string in strings:
        if getUndefinedDataAt(string.getAddress()):
            try:
                createAsciiString(string.getAddress())
                string_count += 1
            except:
                continue

    print 'Strings - {}'.format(string_count)

def get_pointer_type():
    """
    Get the correct pointer size for the current architecture.
    """
    return PointerDataType(None, currentProgram.getDefaultPointerSize())

def define_pointers(section):
    """
    Convert undefined data to valid pointers. 

    :param section: The section to convert pointers in.
    :type section: ghidra.program.model.listing.ProgramFragment
    """
    if section is None:
        return

    start_addr = section.getMinAddress()
    end_addr = section.getMaxAddress()

    undefined_data = getUndefinedDataAt(start_addr)
    if undefined_data is None:
        undefined_data = getUndefinedDataAfter(start_addr)

    pointer_count = 0
    pointer_type = get_pointer_type()
    memory_manager = currentProgram.getMemory()

    while undefined_data is not None and undefined_data.getAddress() < end_addr:
        undefined_addr = undefined_data.getAddress()
        try:
            # At each undefined byte, convert it to a pointer and see if it
            # has any valid references. If it does validate the reference goes
            # to a valid memory address using the memory manager.
            createData(undefined_addr, pointer_type)
            references = getReferencesFrom(undefined_addr)
            if len(references):
                if memory_manager.contains(references[0].getToAddress()):
                    pointer_count += 1
                else:
                    removeDataAt(undefined_addr)
            else:
                removeDataAt(undefined_addr)
        except:
            pass
        finally:
            undefined_data = getUndefinedDataAfter(undefined_addr)

    print 'Pointers - {}'.format(pointer_count)

def define_data(section):
    """
    Convert undefined data to a DWORD.

    :param section: Section to search for undefined data in.
    :type section: ghidra.program.model.listing.ProgramFragment
    """
    if section is None:
        return

    start_addr = section.getMinAddress()
    end_addr = section.getMaxAddress()

    undefined_data = getUndefinedDataAt(start_addr)
    if undefined_data is None:
        undefined_data = getUndefinedDataAfter(start_addr)

    data_count = 0
    while undefined_data is not None and undefined_data.getAddress() < end_addr:
        undefined_addr = undefined_data.getAddress()
        undefined_data = getUndefinedDataAfter(undefined_addr)
        try:
            createDWord(undefined_addr)
            data_count += 1
        except:
            continue

    print 'DWORDS - {}'.format(data_count)

def fixup_section(section):
    """
    Fixup the section by defining strings and converting undefined data to 
    DWORDs.

    :param section: Section to fixup.
    :type section: str
    """
    print 'Section {} - {}'.format(section.getMinAddress(),
                                   section.getMaxAddress())
    print '-' * 30

    define_pointers(section)
    define_strings(section)
    define_data(section)

    ft_finder = functiontable.Finder(currentProgram, section)
    ft_finder.find_function_table()
    ft_finder.rename_functions()

    print '\n'

# Base address of 0 can really mess things up when fixing up pointers.
# Make sure the user really wants this.
base_addr = currentProgram.getMinAddress()
if base_addr.toString() == u'00000000' and \
    not askYesNo('Base Address Zero', 'The base address is set to 0 which can '
                 'introduce a large amount of false positives when fixing up '
                 'the data section. \nDo you want to continue?'):
    exit(0)

data_sections = find_data_sections()

print 'Fixing up data...\n'
for section in data_sections:
    fixup_section(section)

We have not integrated these two scripts into the project.

huahai111 commented 2 years ago

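Hello, I tried the fix. With runScript('CodatifyFixupData.py') commented out, fixso.py on its own does retrieve the information: [image]
But CodatifyFixupData.py keeps reporting a missing package: [image]
Error output:
CodatifyFixupData.py> Running...
CodatifyFixupData.py> Running...
Traceback (most recent call last):
  File "/home/kali/ghidra_scripts/CodatifyFixupData.py", line 9, in <module>
    from utils import functiontable
ImportError: No module named utils
I printed the path of the Python interpreter, but I don't know how to install utils:
/usr/share/ghidra/Ghidra/Features/Python/data/jython-2.7.2/bin/jython
This is the Ghidra version installed on Kali: [image] Could this be version-related?

I also tried running the scripts inside the Docker image you provide: [image] I copied the scripts into the corresponding directory: [image] But it reports that the script cannot be found: [image] Sorry to trouble you again; any guidance would be appreciated.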

smile0304 commented 2 years ago

The cause is that IDA loads system.so at base address 0, whereas Ghidra loads it at base address 0x10000. Ghidra's base address needs to be changed to 0.

[image]

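However, some of the later parts were not fixed up successfully and their cross-references are still not recognized. You can give it a try, though. [image]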

This should be what causes the problem.
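The base-address mismatch is plain offset arithmetic. The sketch below converts an address between two load bases so that locations seen in IDA (base 0) can be matched against Ghidra (base 0x10000). `rebase` is a hypothetical helper and the sample address is made up for illustration.

```python
IDA_BASE = 0x0          # load base IDA chose for system.so, per this thread
GHIDRA_BASE = 0x10000   # load base Ghidra chose, per this thread


def rebase(addr, from_base, to_base):
    """Translate addr from an image loaded at from_base to one at to_base."""
    offset = addr - from_base
    if offset < 0:
        raise ValueError("address lies below the source image base")
    return to_base + offset


if __name__ == "__main__":
    ida_addr = 0x4A2C  # hypothetical address of a string as shown in IDA
    print(hex(rebase(ida_addr, IDA_BASE, GHIDRA_BASE)))
```

Within Ghidra itself, fixso.py above changes the base with `currentProgram.setImageBase(toAddr(0), False)` instead of translating addresses one at a time.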

huahai111 commented 2 years ago

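Hello, after running ref2share_gui.py in Ghidra [image] the cross-references can now be found. But since I commented out CodatifyFixupData.py in fixso.py, I suppose only the base address was changed.
CodatifyFixupData.py:
from utils import functiontable
ImportError: No module named utils
I still don't know how to install utils. [image]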

RobinWang825 commented 2 years ago

Hello huahai, I ran into an error similar to yours while reproducing dlink_878. The message is:
No handlers could be found for logger "root"
/home/satc/.virtualenvs/SaTC/local/lib/python2.7/site-packages/cffi/cparser.py:165: UserWarning: Global variable 'r' in cdef(): for consistency with C it should have a storage class specifier (usually 'extern')
  "(usually 'extern')" % (decl.name,))
Do you know how to resolve this? Looking forward to your reply.

huahai111 commented 2 years ago

It seems to be answered in this issue: https://github.com/NSSL-SJTU/SaTC/issues/2

smile0304 commented 2 years ago

The dependencies of CodatifyFixupData.py can be found here: https://github.com/fuzzywalls/ghidra_scripts
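Since the failing line is `from utils import functiontable`, a quick check is whether a `utils` package sits next to the script. The sketch below assumes that Ghidra resolves such imports relative to its script directories and that `utils` is an ordinary Python 2 package with an `__init__.py`; both are assumptions, and `missing_dependencies` is a hypothetical helper.

```python
import os


def missing_dependencies(script_dir, package="utils", modules=("functiontable",)):
    """List package files expected under script_dir that are not present."""
    missing = []
    package_dir = os.path.join(script_dir, package)
    wanted = ["__init__.py"] + [module + ".py" for module in modules]
    for filename in wanted:
        if not os.path.isfile(os.path.join(package_dir, filename)):
            missing.append(os.path.join(package, filename))
    return missing
```

For the setup reported above, `missing_dependencies("/home/kali/ghidra_scripts")` returning a non-empty list would suggest copying the `utils` directory from the linked repository into that script directory.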

huahai111 commented 2 years ago

OK, thank you very much.

---Original message--- From: @.> Sent: Saturday, May 14, 2022, 10:43 AM To: @.>; Cc: @.**@.>; Subject: Re: [NSSL-SJTU/SaTC] totolink固件分析报错 (Issue #9)

The dependencies of CodatifyFixupData.py can be found here: https://github.com/fuzzywalls/ghidra_scripts

qijiale commented 2 months ago

Hello @smile0304, could you share the exact commands for reproducing the TOTOLINK experiment?