JusticeRage / Gepetto

IDA plugin which queries language models to speed up reverse-engineering
GNU General Public License v3.0

The JSON document returned is invalid. Asking the model to fix it... #15

Closed by lyustra 1 year ago

lyustra commented 1 year ago

This seems to happen on larger functions, and asking the model to fix the JSON never resolves the issue.

Maybe it should stop querying after some time?

JusticeRage commented 1 year ago

Thanks for this report, I agree that the plugin should not try indefinitely. Since this is a little tricky to reproduce, would you be able to test a fix I wrote?

Please replace the whole rename_callback function in the script with this one, and let me know if this works. Normally, the plugin should give up after 3 tries. Alternatively, you can send me your IDB so I can try on my own.

def rename_callback(address, view, response, retries=0):
    """
    Callback that extracts a JSON array of old names and new names from the
    response and sets them in the pseudocode.
    :param address: The address of the function to work on
    :param view: A handle to the decompiler window
    :param response: The response from davinci-003
    :param retries: The number of times that we received invalid JSON
    """
    j = re.search(r"\{[^}]*?\}", response)
    if not j:
        if retries >= 3:  # Give up obtaining the JSON after 3 times.
            print("Could not obtain valid data from ChatGPT, giving up. Dumping the response for manual import:")
            print(response)
            return
        print("Cannot extract valid JSON from the response. Asking the model to fix it...")
        query_model_async("The JSON document provided in this response is invalid. Can you fix it?\n" + response,
                          functools.partial(rename_callback,
                                            address=address,  # reuse the original address; the cursor may have moved
                                            view=view,
                                            retries=retries + 1))
        return
    try:
        names = json.loads(j.group(0))
    except json.decoder.JSONDecodeError:
        if retries >= 3:  # Give up fixing the JSON after 3 times.
            print("Could not obtain valid data from ChatGPT, giving up. Dumping the response for manual import:")
            print(response)
            return
        print("The JSON document returned is invalid. Asking the model to fix it...")
        query_model_async("Please fix the following JSON document:\n" + j.group(0),
                          functools.partial(rename_callback,
                                            address=address,  # reuse the original address; the cursor may have moved
                                            view=view,
                                            retries=retries + 1))
        return

    # The rename function needs the start address of the function
    function_addr = idaapi.get_func(address).start_ea

    replaced = []
    for n in names:
        if ida_hexrays.rename_lvar(function_addr, n, names[n]):
            replaced.append(n)

    # Update possible names left in the function comment
    comment = idc.get_func_cmt(address, 0)
    if comment and len(replaced) > 0:
        for n in replaced:
            comment = re.sub(r'\b%s\b' % re.escape(n), names[n], comment)
        idc.set_func_cmt(address, comment, 0)

    # Refresh the window to show the new names
    if view:
        view.refresh_view(True)
    print(f"davinci-003 query finished! {len(replaced)} variable(s) renamed.")
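
To check the retry limit without IDA, the same give-up logic can be sketched as a plain loop. This is only an illustration under stated assumptions, not the plugin's code: extract_names and ask_model are hypothetical names, and ask_model is a synchronous stand-in for query_model_async that takes a prompt and returns the next response.

```python
import json
import re

MAX_RETRIES = 3  # assumption: mirrors the "3 tries" limit described above


def extract_names(response, ask_model, max_retries=MAX_RETRIES):
    """Return the rename mapping found in `response`, or None after giving up.

    `ask_model` is a hypothetical synchronous stand-in for query_model_async:
    it takes a prompt string and returns the model's next response.
    """
    for attempt in range(max_retries + 1):
        match = re.search(r"\{[^}]*?\}", response)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                # A JSON-looking fragment was found but does not parse.
                prompt = "Please fix the following JSON document:\n" + match.group(0)
        else:
            # Nothing resembling a JSON object was found at all.
            prompt = ("The JSON document provided in this response is invalid. "
                      "Can you fix it?\n" + response)
        if attempt == max_retries:
            break  # retry budget exhausted: stop querying the model
        response = ask_model(prompt)
    return None
```

With a model stub that always returns broken JSON, this queries the stub exactly three times and then returns None, which is the behaviour the callback above is meant to have.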

lyustra commented 1 year ago

Could you post the whole file including this change? I seem to have messed it up while replacing that function.

gepetto.py: invalid syntax (gepetto.py, line 188)
Traceback (most recent call last):
  File "ida_idaapi.py", line 615, in IDAPython_ExecScript
    code = compile(raw.decode(encoding), path, 'exec')
  File "gepetto.py", line 188
    print(f"davinci-003 query finished! {len(replaced)} variable(s) renamed.")
    """
    Callback that extracts a JSON array of old names and new names from the
    response and sets them in the pseudocode.
    :param address: The address of the function to work on
    :param view: A handle to the decompiler window
    :param response: The response from davinci-003
    """