[feature] use llama and langchain to rewrite GPTtrace

eunomia-bpf / GPTtrace

Generate eBPF programs and tracing with ChatGPT

MIT License


Following https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb, I have rewritten GPTtrace so that the program can hold a continuous dialogue: when a command generated by GPT fails to execute, the program feeds the error message back to GPT and asks it to rewrite the command. (In practice, GPT sometimes does not return just the command as the prompt requests, but appends extra text to it.)
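A minimal sketch of that feedback loop, assuming a hypothetical `ask` callable that sends one prompt within an ongoing conversation (so earlier turns are remembered) and a hypothetical `run` callable that executes a command and returns its exit code and stderr. The helper `extract_command` is also hypothetical; it tolerates replies that wrap the command in extra prose or a code fence. This is an illustration, not GPTtrace's actual code.

```python
from typing import Callable, Optional, Tuple


# Hypothetical helper: find the bpftrace command in a reply that may
# surround it with prose or a markdown code fence.
def extract_command(reply: str) -> Optional[str]:
    for line in reply.splitlines():
        candidate = line.strip().strip("`").strip()
        if candidate.startswith("bpftrace "):
            return candidate
    return None


def generate_with_feedback(
    ask: Callable[[str], str],              # hypothetical: one turn of an ongoing chat
    run: Callable[[str], Tuple[int, str]],  # hypothetical: returns (exit code, stderr)
    request: str,
    max_rounds: int = 3,
) -> Optional[str]:
    prompt = request
    for _ in range(max_rounds):
        command = extract_command(ask(prompt))
        if command is None:
            # The reply contained no recognizable command; ask again.
            prompt = "Please reply with only the bpftrace command."
            continue
        code, stderr = run(command)
        if code == 0:
            return command
        # Feed the error message back and ask for a corrected command.
        prompt = (f"Running `{command}` failed with:\n{stderr}\n"
                  "Please rewrite the command.")
    return None  # give up after max_rounds attempts
```

A real `run` would also need to stop the command after a short time, since a bpftrace one-liner attached to a tracepoint keeps running until it is interrupted; that detail is left out of the sketch.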

llama processes the documentation into a vector database, which is saved in the train_data.json file. When GPTtrace is run with "-t" or "--train", GPT may or may not query the vector database: llama is exposed to langchain as a tool, and if GPT decides it does not need that tool, it will not query the vector database.
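To make the retrieval step concrete, here is a toy stand-in for the vector database. It swaps the real embedding model for bag-of-words counts plus cosine similarity and does not model the train_data.json format; the names `embed`, `cosine`, and `ToyVectorIndex` are hypothetical. Only the general idea carries over: embed each document, embed the query, and return the k most similar documents.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts, standing in for a real model.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ToyVectorIndex:
    def __init__(self, docs: List[str]):
        # Precompute one vector per document.
        self.entries = [(doc, embed(doc)) for doc in docs]

    def query(self, question: str, k: int = 3) -> List[str]:
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]
```

Whether this lookup happens at all is up to the agent in the setup described above; the index only answers when GPT chooses to call the tool.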

Sometimes the information in the documentation interferes with GPT's answer. For example, when I asked for an eBPF program for "Syscall count by program", the most relevant information retrieved from the documentation was:

There are some related information about this query:
Info 0:
Syscall count by program
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Info 1:
Read size distribution by process:
bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'
Info 2: 
Show per-second syscall rates:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } interval:s:1 { print(@); clear(@); }'
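The retrieved entries are injected into the prompt in the layout shown above. A small sketch that reproduces that layout; the function name `format_context` is hypothetical, and the header sentence is copied verbatim from the output above.

```python
from typing import List


def format_context(snippets: List[str]) -> str:
    # Header copied verbatim from the retrieved-context output above.
    lines = ["There are some related information about this query:"]
    for i, snippet in enumerate(snippets):
        # Number each retrieved snippet as "Info 0:", "Info 1:", ...
        lines.append(f"Info {i}:")
        lines.append(snippet)
    return "\n".join(lines)
```

Because the snippets are pasted in verbatim, a near-exact match can simply be copied into the answer.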

The first entry (Info 0) is the most relevant to "Syscall count by program", so GPT returns bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }', which is a bpftrace command rather than an eBPF program.


[feature] use llama and langchain to rewrite GPTtrace #5