falcosecurity / evolution

Evolution process of The Falco Project
Apache License 2.0

Transfer of ownership for falco-gpt (sandbox) #311

Closed Dentrax closed 5 months ago

Dentrax commented 10 months ago

As you might have seen from the announcements ^1, and as I also introduced in the community meeting ^3, I have created a new PoC tool called falco-gpt.

falco-gpt is an OpenAI-powered tool to generate remediation actions for Falco audit events. It is a simple HTTP server that listens for Falco audit events and pushes them to an internal NATS server acting as a queue. The queue is then processed by a goroutine that sends the audit events to the OpenAI API, applying rate limiting and retries. The generated remediation actions are then sent to Slack via a bot, in a thread. ^4

Repository: https://github.com/Dentrax/falco-gpt
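
For readers who want a concrete picture of the flow described above, here is a minimal sketch in Go. It is illustrative only: the in-memory channel stands in for the internal NATS queue, and `askOpenAI` / `postToSlack` are hypothetical placeholders for the real OpenAI and Slack integrations in the repository.

```go
// Illustrative sketch of the falco-gpt flow described above; names and
// details are assumptions, not the project's actual implementation.
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
	"time"
)

// FalcoEvent holds only the fields used in this sketch.
type FalcoEvent struct {
	Rule     string `json:"rule"`
	Priority string `json:"priority"`
	Output   string `json:"output"`
}

func main() {
	queue := make(chan FalcoEvent, 1024) // stand-in for the internal NATS queue

	// Consumer: rate-limited calls to OpenAI, results posted to Slack.
	go func() {
		limiter := time.Tick(2 * time.Second) // crude rate limiting
		for ev := range queue {
			<-limiter
			advice, err := askOpenAI(ev) // hypothetical helper calling the OpenAI API
			if err != nil {
				log.Printf("openai: %v", err)
				continue
			}
			postToSlack(ev, advice) // hypothetical helper posting in a Slack thread
		}
	}()

	// HTTP endpoint that Falco's http_output can point at.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		var ev FalcoEvent
		if err := json.Unmarshal(body, &ev); err != nil {
			http.Error(w, "bad event", http.StatusBadRequest)
			return
		}
		queue <- ev
		w.WriteHeader(http.StatusAccepted)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}

func askOpenAI(ev FalcoEvent) (string, error)  { return "suggested remediation…", nil }
func postToSlack(ev FalcoEvent, advice string) { log.Printf("[%s] %s", ev.Rule, advice) }
```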

Motivation

Transferring ownership of the project to the Falco ecosystem would allow it to grow faster and in a more efficient, organized way, since I don't find much free time to maintain it. By taking advantage of the great community and maintainers, I believe we'd do much better.

Next steps would be: (some of my ideas)

Waiting for your feedback.

leogr commented 10 months ago

Hey @Dentrax

I would love to see your project under the Falcosecurity organization :star_struck: And thank you for considering donating your project to this organization!

So big +1 from me!

Just one question: I see that falco-gpt is MIT-licensed. Would it be ok for you to switch the license to Apache2? As per the CNCF IP Policy (section 11), all contributions must be made under Apache2, whereas third-party dependencies can match a more comprehensive set of allowed licenses.

cc @falcosecurity/core-maintainers

Issif commented 10 months ago

+1 for me too.

@Dentrax Would you agree to me becoming an owner with you? I think I'm the one who knows the code base best, after you.

jasondellaluce commented 10 months ago

+1

FedeDP commented 10 months ago

:+1: from me! Bleeding edge technologies melt together :rocket:

Andreagit97 commented 10 months ago

amazing work, thanks +1!

incertum commented 10 months ago

Hi @Dentrax thank you for all the hard work!

While privacy concerns are clearly stated in the current repo's README https://github.com/Dentrax/falco-gpt#disclaimer, I have concerns about sponsoring a tool under The Falco Project that suggests sending sensitive real-life production data to OpenAI. This is likely to go against the privacy policies of most adopters.

Therefore, my vote is a conditional +1 if we were to make significant adjustments to the project.

Instead of recommending making calls against the OpenAI API with real data, why don't we explore how far we can get by feeding synthetic data from our existing e2e tests? Do OpenAI's recommendations actually depend on the data fields, or do they only depend on the rule names or descriptions, given that it is a generic LLM?

The project could benefit from a clearer motivation and justification for the methodology chosen, as well as an expansion of its use cases and examples.

I would require at least a best effort attempt to perform quality control and model validation. For example, each existing upstream rule should be tested multiple times, and the incident response actions suggested by OpenAI should be deemed at least somewhat valid for real-life incident response actions. This is crucial because by promoting this project, we are indirectly approving its validity, even though OpenAI clearly states that data can be wrong.

Lastly, the term "AI + Falco" seems too far-fetched at the moment, as it could be misunderstood to mean that the Falco runtime tool now uses AI to generate detections. I would hold off on using this messaging until we actually do something like that.

Dentrax commented 10 months ago

Hey, thanks for the interest everyone!

Would it be ok for you to switch the license to Apache2? - @leogr

Sure, I just updated the license.

Do you agree I become owner with you? - @Issif

Definitely! It'd be great to collaborate since you are already familiar with the code base.

While privacy concerns are clearly stated in the current repo's README Dentrax/falco-gpt#disclaimer, I have concerns about sponsoring a tool under The Falco Project that suggests sending sensitive real-life production data to OpenAI. This is likely to go against the privacy policies of most adopters. - @incertum

Thanks for the reviews, Melissa. I hadn't thought of it from that point of view; this is a really good point. I'd like to address your concerns as much as I can.

Instead of recommending making calls against the OpenAI API with real data, why don't we explore how far we can get by feeding synthetic data from our existing e2e tests?

It does make sense. We could create a big example-audit-log list to feed OpenAI in order to prevent sending real audit data. This can be enabled with a flag. But we should think carefully about how this dummy data will fit in with the real scenario.
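
A rough sketch of what such a flag could look like, assuming a hypothetical `-synthetic-events` flag and a simple JSON file of canned events (neither exists in falco-gpt today):

```go
// Hypothetical flag that replays canned audit events instead of live ones;
// the flag name and file layout are assumptions for illustration only.
package main

import (
	"encoding/json"
	"flag"
	"log"
	"os"
)

type FalcoEvent struct {
	Rule   string `json:"rule"`
	Output string `json:"output"`
}

func main() {
	syntheticFile := flag.String("synthetic-events", "",
		"path to a JSON file of example Falco events; when set, no real audit data is sent to OpenAI")
	flag.Parse()

	if *syntheticFile == "" {
		log.Println("no synthetic file given; would listen for real Falco events here")
		return
	}

	data, err := os.ReadFile(*syntheticFile)
	if err != nil {
		log.Fatalf("reading %s: %v", *syntheticFile, err)
	}
	var events []FalcoEvent
	if err := json.Unmarshal(data, &events); err != nil {
		log.Fatalf("parsing synthetic events: %v", err)
	}
	for _, ev := range events {
		// Each canned event would be enqueued exactly like a live one.
		log.Printf("would send to OpenAI: rule=%q output=%q", ev.Rule, ev.Output)
	}
}
```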

Do OpenAI's recommendations actually depend on the data fields, or do they only depend on the rule names or descriptions, given that it is a generic LLM?

I'm not really sure. This would require technical knowledge about how ChatGPT works under the hood. Basically, ChatGPT uses those fields in the final output message to enrich the recommendation. Do you mean we should redact them?

I would require at least a best effort attempt to perform quality control and model validation.

Ah, yes. OpenAI can be wrong sometimes, which means this project is only as accurate as OpenAI is.

For example, each existing upstream rule should be tested multiple times, and the incident response actions suggested by OpenAI should be deemed at least somewhat valid for real-life incident response actions. This is crucial because by promoting this project, we are indirectly approving its validity, even though OpenAI clearly states that data can be wrong.

This would be challenging. Covering it with unit tests could also be misleading, since ChatGPT's responses are affected by temperature and vary between runs (even if you set it to 0). Maybe we should write a "Risks & Mitigations" section in the README to state that OpenAI can sometimes be wrong and should not be blindly trusted. TBH, I have no idea how we should tackle that.
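
For illustration, pinning the temperature in a raw request to the OpenAI chat-completions endpoint might look like the sketch below; the model name and prompt are placeholders, and even at temperature 0 the answers are not guaranteed to be identical across runs, which is exactly the testing problem mentioned above.

```go
// Minimal sketch of asking OpenAI for a remediation suggestion with the
// temperature pinned to 0. Model name and prompt wording are placeholders.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	payload := map[string]any{
		"model":       "gpt-3.5-turbo",
		"temperature": 0, // reduces, but does not eliminate, run-to-run variance
		"messages": []map[string]string{
			{"role": "system", "content": "You suggest remediation steps for Falco alerts."},
			{"role": "user", "content": "Falco rule 'Terminal shell in container' fired. Suggest remediation."},
		},
	}
	body, _ := json.Marshal(payload)

	req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // response parsing omitted in this sketch
}
```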

incertum commented 10 months ago

could create a big example-audit-log list to feed OpenAI

Yes, this would be great to get started. Before making changes and deciding on flags and other details, let's first experiment and see what we can find.

I'm not really sure. This would require technical knowledge about how ChatGPT works under the hood. Basically, ChatGPT uses those fields in the final output message to enrich the recommendation. Do you mean we should redact them?

We absolutely need to perform black-box testing, similar to how you find exploits and such. This means feeding in all sorts of example logs, from synthetic or complete logs to redacted ones. Afterwards, we need to manually inspect the answers and assess how useful the suggestions are, especially because it is a generic LLM and not particularly trained for IR and Falco purposes.
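
As one example of the "redacted" variant, a sketch of stripping potentially sensitive fields from a Falco event before it leaves the cluster could look like the following; which fields count as sensitive is an assumption made purely for illustration.

```go
// Illustrative redaction pass over a Falco event's output fields before the
// event is sent to OpenAI; the set of fields treated as sensitive is assumed.
package main

import (
	"encoding/json"
	"fmt"
)

var sensitiveFields = map[string]bool{
	"proc.cmdline":   true,
	"fd.name":        true,
	"user.name":      true,
	"container.name": true,
}

func redact(outputFields map[string]any) map[string]any {
	clean := make(map[string]any, len(outputFields))
	for k, v := range outputFields {
		if sensitiveFields[k] {
			clean[k] = "[REDACTED]"
			continue
		}
		clean[k] = v
	}
	return clean
}

func main() {
	raw := []byte(`{"proc.cmdline":"bash -c 'cat /etc/shadow'","user.name":"root","evt.type":"openat"}`)
	var fields map[string]any
	_ = json.Unmarshal(raw, &fields)
	redacted, _ := json.Marshal(redact(fields))
	fmt.Println(string(redacted))
}
```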

Ah, yes. OpenAI can be wrong sometimes, which means this project is only as accurate as OpenAI is.

Related to the comment above, as project maintainers, we at least need to provide recommendations on how useful the model OpenAI outputs are at the moment. We also need to keep this guidance up-to-date.

Follow-up question: Have you been exposed to working on large incidents? I'd be happy to help with the assessment, as I've been around the block a bit in this regard.

Covering it with unit tests could also be misleading, since ChatGPT's responses are affected by temperature and vary between runs.

Happy to clarify. This comment was referring to manual assessment by experts after checking the results of our existing Falco rules e2e and unit tests.

In summary, my proposed next steps are:

Test some Falco inputs, share IR suggestions by OpenAI, then define next steps.

maxgio92 commented 10 months ago

Love it @Dentrax! Thank you.

Thanks also @incertum for all the proposed detailed points to focus on. I agree with this:

Instead of recommending making calls against the OpenAI API with real data, why don't we explore how far we can get by feeding synthetic data from our existing e2e tests? Do OpenAI's recommendations actually depend on the data fields, or do they only depend on the rule names or descriptions, given that it is a generic LLM?

About this @incertum:

I would require at least a best effort attempt to perform quality control and model validation. For example, each existing upstream rule should be tested multiple times, and the incident response actions suggested by OpenAI should be deemed at least somewhat valid for real-life incident response actions. This is crucial because by promoting this project, we are indirectly approving its validity, even though OpenAI clearly states that data can be wrong.

I think it's important, but it might not be required at the sandbox maturity level. To make the lack of testing against all the official core rules explicit, we could highlight that it's an experimental project and is not supposed to be used in production right now. In the meantime, the validation could go on. What do you think?

I agree with the proposed next steps:

Test some Falco inputs, share IR suggestions by OpenAI, then define next steps.

leogr commented 10 months ago

I think it's important, but it might not be required at the sandbox maturity level. To make the lack of testing against all the official core rules explicit, we could highlight that it's an experimental project and is not supposed to be used in production right now. In the meantime, the validation could go on. What do you think?

I agree :+1:

leogr commented 10 months ago

@incertum, do you still have any concerns regarding accepting this sandbox request?

incertum commented 10 months ago

@incertum, do you still have any concerns regarding accepting this sandbox request?

@leogr yes, my previous guidance and conditional +1 remain valid: https://github.com/falcosecurity/evolution/issues/311#issuecomment-1707390167.

May I kindly ask what the challenges are regarding testing at least, let's say, 20-30 rules? Is no one else interested in at least verifying whether the IR suggestions are even remotely useful?

poiana commented 7 months ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

poiana commented 6 months ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

poiana commented 5 months ago

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.

/close

poiana commented 5 months ago

@poiana: Closing this issue.

In response to [this](https://github.com/falcosecurity/evolution/issues/311#issuecomment-1951367232):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue with `/reopen`.
>
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Provide feedback via https://github.com/falcosecurity/community.
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.