eunomia-bpf / GPTtrace

Generate eBPF programs and tracing with ChatGPT
https://eunomia.dev/GPTtrace/
MIT License
217 stars 21 forks

[feature] use llama and langchain to rewrite GPTtrace #5

Closed try-agaaain closed 1 year ago

try-agaaain commented 1 year ago

Following the example at https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb, I have rewritten GPTtrace so that it can hold a continuous dialogue: when a GPT-generated command fails to execute, the program feeds the error message back to GPT and asks it to rewrite the command. In practice, though, GPT does not always return a bare command as the prompt requests; it sometimes adds extra text around the command.
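The feedback loop described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `extract_command`, `run_command`, `run_with_feedback`, and the `ask_model` callable are hypothetical names, and the model call is left abstract so that no particular langchain or OpenAI API is assumed. Treating a tracer that is still running at the timeout as a success is also an assumption.

```python
import shlex
import subprocess
from typing import Callable, List, Optional, Tuple


def extract_command(reply: str) -> Optional[str]:
    """Return the first bpftrace command found in a reply that may contain extra prose."""
    for line in reply.splitlines():
        candidate = line.strip().strip("`").strip()
        if candidate.startswith(("bpftrace ", "sudo bpftrace ")):
            return candidate
    return None


def run_command(command: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run the command and return (succeeded, error output)."""
    try:
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Assumption: a tracer still running at the timeout attached successfully.
        return True, ""
    except FileNotFoundError as exc:
        return False, str(exc)
    return proc.returncode == 0, proc.stderr


def run_with_feedback(
    task: str,
    ask_model: Callable[[List[str]], str],  # hypothetical: takes the dialogue, returns a reply
    run: Callable[[str], Tuple[bool, str]] = run_command,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a command, run it, and feed any error back until it works or attempts run out."""
    history = [task]
    for _ in range(max_attempts):
        reply = ask_model(history)
        command = extract_command(reply)
        if command is None:
            history.append("Reply with a single bpftrace command and nothing else.")
            continue
        ok, error = run(command)
        if ok:
            return command
        history.append(
            f"The command `{command}` failed with:\n{error}\nPlease rewrite it."
        )
    return None
```

Scanning the reply line by line is one simple way to cope with replies that wrap the command in extra explanation; it will miss commands that span several lines.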

llama processes the documentation into a vector database, which is saved in the train_data.json file. When GPTtrace is run with "-t" or "--train", GPT may or may not query the vector database: llama acts as a tool for langchain here, so if GPT decides it does not need the tool, it will not query the database.

try-agaaain commented 1 year ago

Sometimes the information in the documentation can interfere with GPT's answer. For example, when I ask for an eBPF program for "Syscall count by program", the most relevant information retrieved from the documentation is:

There are some related information about this query:
Info 0:
Syscall count by program
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
Info 1:
Read size distribution by process:
bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'
Info 2: 
Show per-second syscall rates:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } interval:s:1 { print(@); clear(@); }'

The first entry (Info 0) is the most relevant to "Syscall count by program", so GPT returns: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }', which is a bpftrace command and not an eBPF program.
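To illustrate why an exact title match dominates the retrieved context, here is a toy ranking sketch. It uses bag-of-words cosine similarity as a stand-in for the real embedding index, which is an assumption: llama builds its vector database from learned embeddings, so the real scores and ordering will differ. `DOCS`, `vectorize`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Tuple

# Titles and commands taken from the retrieved snippets quoted above.
DOCS: List[Tuple[str, str]] = [
    ("Syscall count by program",
     "bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'"),
    ("Read size distribution by process",
     "bpftrace -e 'tracepoint:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'"),
    ("Show per-second syscall rates",
     "bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } "
     "interval:s:1 { print(@); clear(@); }'"),
]


def vectorize(text: str) -> Counter:
    """Lower-cased word counts; a crude substitute for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, docs: List[Tuple[str, str]], k: int = 3):
    """Return the k (score, title, command) entries most similar to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(title)), title, cmd) for title, cmd in docs]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


best_score, best_title, best_command = top_k("Syscall count by program", DOCS)[0]
print(best_title)
print(best_command)
```

Because the query matches one title word for word, that entry ranks first and its ready-made one-liner is what the model sees first, which is consistent with the model echoing the bpftrace command rather than writing an eBPF program.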