LLM for template writing

aronmolnar commented 1 year ago

Writing finding templates is sometimes a cumbersome task. It might be simplified by integrating an LLM that adds auto-completion for finding templates.

Due to privacy requirements, this feature should preferably be implemented for finding templates only (or at least be disabled in projects by default).

It could also be integrated into the report designer.

However, the costs might be difficult to calculate, and there might also be certain risks (like becoming non-compliant in certain countries, or copyright issues with AI-generated texts). Therefore, I'd provide the integration, but users must bring their own LLM tokens (probably Azure/ChatGPT).

aronmolnar commented 1 year ago

Hi @TheZ3ro, you downvoted this issue. Can you explain what speaks against it?

TheZ3ro commented 1 year ago

Hi @aronmolnar, thanks for asking my opinion. I will leave my humble two cents here 😄:

I pretty much agree with the issues you already described (privacy issues, compliance issues, cost issues).

I don't like LLMs used in this context because they are not 100% reliable and are prone to AI hallucination (https://www.forbes.com/sites/mollybohannon/2023/06/08/lawyer-used-chatgpt-in-court-and-cited-fake-cases-a-judge-is-considering-sanctions/). Imagine creating a finding template for SQL injection and the AI filling it with bogus CVE/CWE information that the pentester/reporter does not check and that ends up in the final deliverable.

Moreover, reports are the only product of a pentester's work that is visible to the final customer. I think that writing reports is still part of the pentester's work. It is nice to speed up the process by re-using text instead of writing whole paragraphs, but using AI for it is a "No" on my side.

aronmolnar commented 1 year ago

Thanks for your explanation. Yeah, there are issues with reliability and hallucinations, absolutely.

The same issues exist, for example, when coding with AI support (like GitHub Copilot). We would see its application primarily in finding templates, which need to be reviewed and reworked by pentesters in any case.

Lednerb commented 1 year ago

While writing our latest whitepaper on an IT security study we performed (to be released soon), we ran some experiments using ChatGPT in the latest pro-plan version, with the following conclusion:

LLMs are good tools for getting inspired/started or for writing the boring stuff like introductions or entry-level information about some concepts. However, as soon as it gets to data analytics and more complex topics (and that is always the case with pentest reports, at least on our side), checking and correcting the output seems to take more work and time than writing the needed text about the identified security risks or vulnerabilities yourself.

On the other hand, a possibly valid scenario is translating a self-written finding from your native language into foreign languages with higher language quality.

aronmolnar commented 1 year ago

Thank you @Lednerb for sharing your experiences.

We also made a PoC and can confirm exactly what you describe. Thus, we believe that the available LLMs have not yet reached the maturity level we would expect.

Also, we found that GPT-4 is too slow for autocompletion (which we would like to provide).

Thus we are closing this issue for the moment.

Syslifters / sysreptor

LLM for template writing #121