SigmaHQ / pySigma-backend-elasticsearch

pySigma Elasticsearch backend
GNU Lesser General Public License v3.0

DSL query support #49

Open balintnadasi opened 8 months ago

balintnadasi commented 8 months ago

Hi guys!

Is there any chance that this backend will support pure DSL query generation in the near future?

Mat0vu commented 5 months ago

Hi, I would also be glad to see a DSL backend. In the long term we are considering switching to EQL or ES|QL, but at the moment we are still using the sigmac converter with some customizations and the dsl output format. A DSL backend for pySigma would let us keep using aggregation/correlation queries while making it relatively easy to compare the output of sigmac with the new output of pySigma, to ensure that the searches still work as expected. Additionally, as long as ES|QL is still in technical preview, we will probably avoid using the ES|QL backend in production.

If you also still think that a DSL backend is a useful feature, I could offer to start working on one, since I don't see a dsl branch in this repo or in the forks indicating that somebody is already working on this. If you have already started, I can also help with testing :)

thomaspatzke commented 5 months ago

I haven't started a DSL backend and I don't know of anyone who has. So feel free 😉

balintnadasi commented 4 months ago

Hello @Mat0vu !

I saw that you forked the project and I would be happy to help with my unit tests. Where can I get the JSONQueryBackend?

Mat0vu commented 4 months ago

Hi @balintnadasi ,

sorry for the late response. I've just updated my fork, where I've been working on the implementation of a DSL backend for Elasticsearch.

Since the Query DSL uses JSON queries, in contrast to EQL or ES|QL, and because it was difficult to get the desired output using the variables provided by the TextQueryBackend, which passes the data to Python's str.format() function, I decided to create a new class, JsonQueryBackend, which could have been included in base.py. However, in the end the code of my new JsonQueryBackend was almost identical to the TextQueryBackend, with only a few adjustments. That's why I thought it was probably better to switch back to TextQueryBackend. The DSLBackend is now based on the TextQueryBackend again and overrides some functions completely (especially to get correlation rules working), since I could not get these to work with JSON and the various str.format() calls within the default superclass.
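To illustrate the str.format() difficulty described above, here is a minimal sketch. It is not code from the fork: TERM_TEMPLATE, term_via_format, term_via_dict, and bool_and are hypothetical names chosen for this example. It shows that every literal JSON brace has to be doubled in a format template, whereas building plain dicts and serializing once avoids the escaping entirely.

```python
import json

# Hypothetical template (not from the fork). Literal JSON braces must be
# doubled so that str.format() does not treat them as placeholders.
TERM_TEMPLATE = '{{"term": {{{field}: {value}}}}}'


def term_via_format(field, value):
    """Render a term query as a JSON string through str.format()."""
    # json.dumps() quotes and escapes both the field name and the value.
    return TERM_TEMPLATE.format(field=json.dumps(field), value=json.dumps(value))


def term_via_dict(field, value):
    """Build the same term query as a plain dict, with no brace escaping."""
    return {"term": {field: value}}


def bool_and(*clauses):
    """Combine clauses with a bool/must query, also as a plain dict."""
    return {"bool": {"must": list(clauses)}}


if __name__ == "__main__":
    query = bool_and(
        term_via_dict("process.name", "cmd.exe"),
        term_via_dict("user.name", "admin"),
    )
    print(term_via_format("process.name", "cmd.exe"))
    print(json.dumps({"query": query}, indent=2))
```

Both helpers describe the same query; the dict version simply postpones serialization until the whole query tree is assembled, which is one possible way to sidestep nested templates.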

So far I've managed to implement (hopefully) most of the basic use cases:

Currently not supported (only the stuff I know of, so probably not complete):

I'm not a specialist in Elasticsearch mappings, and all of the fields we search are mapped as keyword fields, for which regexp and term queries work well. However, searching in text fields might require changing the query type (e.g. to a match query)...

I would say the backend is far from finished, but the current state seemed to work fine when translating some of our existing rules and comparing the hits with those of the rules translated with sigmac. Aggregations also seemed to work fine.

I will not be able to continue working on this topic for the next few weeks, and because ES|QL is going to be fully supported by Elastic >=8.14, we are currently considering switching to the new language. Anyway, you are very welcome to add unit tests and improve the code :) If I find time, I will also try to continue working on this, but that won't be possible in the next few weeks...
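As a rough illustration of the keyword versus text distinction, the sketch below picks a leaf query type from the field mapping. The function leaf_query and its parameters are hypothetical and not part of the fork; the term, regexp, and match query shapes follow the Elasticsearch Query DSL.

```python
def leaf_query(field, value, mapping_type="keyword", is_regex=False):
    """Pick a Query DSL leaf query based on the field mapping (hypothetical helper)."""
    if mapping_type == "text":
        # Text fields are analyzed, so a full-text match query is the usual choice.
        return {"match": {field: {"query": value}}}
    if is_regex:
        # regexp queries run against the unanalyzed terms of a keyword field.
        return {"regexp": {field: {"value": value}}}
    # Exact, unanalyzed comparison on a keyword field.
    return {"term": {field: {"value": value}}}


if __name__ == "__main__":
    print(leaf_query("process.name", "cmd.exe"))
    print(leaf_query("process.command_line", ".*whoami.*", is_regex=True))
    print(leaf_query("message", "failed login", mapping_type="text"))
```

A real backend would need the mapping information from somewhere, for example a processing pipeline or a user-supplied field list; that part is left open here.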

andurin commented 1 week ago

@balintnadasi / @Mat0vu Is this still an issue, and would you like to prepare a pull request for ES DSL in the future, or has EQL/ES|QL successfully superseded the need?

Mat0vu commented 1 week ago

Hi @andurin, because my team is currently switching to ES|QL, we do not need DSL support anymore. If @balintnadasi or anyone else still wants DSL, they can use the code from here as a starting point.

balintnadasi commented 1 week ago

Hello @andurin !

I revised @Mat0vu's code to produce queries that approximate the old sigmac output (unfortunately, regex filters seemed slower in some cases). For now, I'm facing some escaping issues and am working on resolving them. If all goes well (and I have some time before Christmas), I hope to open the merge request in December.