els0r / goProbe

High-performance IP packet metadata aggregation and efficient storage and querying of flows
GNU General Public License v2.0

global-query #43

Closed els0r closed 1 year ago

els0r commented 1 year ago

Distributed querying for goQuery, aggregating results.Result structures. This makes it possible to run queries and flow aggregations against a global fleet that has goProbe/goQuery deployed.

els0r commented 1 year ago

I will overwrite the branch from #43. Too much changed under the hood. I've saved the relevant bits in global-query.

Way to go about this:

els0r commented 1 year ago

Will start working on this next week

els0r commented 1 year ago

Saw https://github.com/els0r/goProbe/actions/runs/4613869362/jobs/8156300674, which seems unrelated to this issue, but still needs to be investigated.

The problem is a race condition in line 273 of the DBWorkManager

                    w.nWorkloadsProcessed++

That number is also accessed in line 256 with:

                logger.Infof("Query cancelled (workload %d / %d)...", w.nWorkloadsProcessed, w.nWorkloads)
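
One possible way to remove a race of this shape is to make the counter atomic, so that the increment in the worker and the read in the log statement no longer conflict. The following is a minimal sketch under assumptions: `workTracker`, `markProcessed`, `progress` and `processAll` are hypothetical stand-ins and not the actual DBWorkManager code, and `atomic.Uint64` requires Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workTracker is a hypothetical stand-in for the counters kept by the
// DBWorkManager; it is not the actual goProbe type.
type workTracker struct {
	nWorkloads          uint64
	nWorkloadsProcessed atomic.Uint64 // written by workers, read when logging
}

// markProcessed replaces the unsynchronized w.nWorkloadsProcessed++.
func (w *workTracker) markProcessed() {
	w.nWorkloadsProcessed.Add(1)
}

// progress replaces the plain field read used in the log statement.
func (w *workTracker) progress() string {
	return fmt.Sprintf("workload %d / %d", w.nWorkloadsProcessed.Load(), w.nWorkloads)
}

// processAll simulates nWorkers goroutines working through n workloads.
func processAll(n uint64, nWorkers int) *workTracker {
	w := &workTracker{nWorkloads: n}
	jobs := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < nWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				w.markProcessed()
			}
		}()
	}
	for i := uint64(0); i < n; i++ {
		jobs <- struct{}{}
	}
	close(jobs)
	wg.Wait()
	return w
}

func main() {
	w := processAll(1000, 8)
	fmt.Println(w.progress()) // workload 1000 / 1000
}
```

Running a test of this kind with `go test -race` is the usual way to confirm that the data race is gone. A mutex around both accesses would work as well; the atomic counter merely keeps the change small.
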
els0r commented 1 year ago

@fako1024 : need help with testing the very first version of global-query. Would be glad if you could give this a spin on your sensors with:

Then, run queries with the tool with

go run cmd/global-query/main.go \
  --config cmd/global-query/local-config.yaml \
  --hosts.querier.config cmd/global-query/api-client-querier.yaml \
  -q <comma-separated-host-list> <goquery args>
fako1024 commented 1 year ago

Raised #94 and will take care of the race condition. As for the global query I'll gladly give it a shot today. :heart:

fako1024 commented 1 year ago

@els0r OK, first attempt: Deployment went fine, nodes are reachable. However, there seems to be an issue with the encoding of the query (probably with the attributes). Whatever I do, I get an HTTP 400 from both sensors:

On the caller (my laptop):

└─ $ ▶ ./global-query --config ./local-config.yaml --hosts.querier.config api-client-querier.yaml -q fw-1,fw-2 -i eth0 -n 10 talk_conv 
ts=2023-04-05T09:27:08Z level=info caller=cmd/root.go:219 msg="setting up queriers" app_name=global-query app_version=872abf81 hosts=fw-1,fw-2 query="{\"ifaces\":[\"eth0\"],\"query_type\":\"talk_conv\",\"attributes\":[{},{}],\"direction\":\"bi-directional\",\"from\":1678012028,\"to\":1680686828,\"format\":\"txt\",\"limit\":10,\"sort_by\":\"bytes\",\"dns_resolution\":{\"dns_timeout\":1000000000,\"max_rows\":25},\"db\":\"\"}"
ts=2023-04-05T09:27:08Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=872abf81 hostname=fw-1
ts=2023-04-05T09:27:08Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=872abf81 hostname=fw-1 method=POST url=http://10.1.10.2:18081/api/v1/_query
ts=2023-04-05T09:27:08Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=872abf81 hostname=fw-2
ts=2023-04-05T09:27:08Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=872abf81 hostname=fw-2 method=POST url=http://10.1.20.2:18081/api/v1/_query
ts=2023-04-05T09:27:08Z level=error caller=hosts/query.go:184 msg="failed to run query: 400 Bad Request" app_name=global-query app_version=872abf81 hostname=fw-1
ts=2023-04-05T09:27:08Z level=error caller=hosts/query.go:184 msg="failed to run query: 400 Bad Request" app_name=global-query app_version=872abf81 hostname=fw-2
Status "empty": query returned no results
Hosts with errors: 2

        #    host    status                                 message

        1    fw-1     error    failed to run query: 400 Bad Request
        2    fw-2     error    failed to run query: 400 Bad Request

On both sensors (same error each time, no matter what kind of query type I use):

2023-04-05T11:26:03.312+0200    error   errors/errors.go:34     failed to decode query statement: query.Statement.Attributes: []types.Attribute: decode non empty interface: can not unmarshal into nil, error found in #10 byte of ...|ributes":[{},{}],"di|..., bigger context ...|":["eth0"],"query_type":"talk_conv","attributes":[{},{}],"direction":"bi-directional","from":1678011|...{"app_name": "goProbe_alpine", "app_version": "872abf81"}

This seems fishy: attributes":[{},{}] ...

fako1024 commented 1 year ago

Sidenote: This error message probably needs some "love":

└─ $ ▶ ./global-query 
Failed to read in config: Config File ".cmd" Not Found in "[/home/fako]"

As in: If neither config nor command line parameters are specified complain about that specifically (and I assume the .cmd without prefix is also unintentional)...

els0r commented 1 year ago

@fako1024 : this thing is going places. You can test with latest changes and should be able to get a result.

Will still need polishing, as well as the introduction of the host attribute during printing. But it's a start. Have fun!

fako1024 commented 1 year ago

Alright, feedback from the next testing round, we're getting a lot closer:

goroutine 1 [running]:
main.main()
        /tmp/goProbe/cmd/goProbe/goProbe.go:64 +0x16a8


- Performing the queries seems to work (as in, an output table is generated :partying_face: ), but according to the output it says that the query returned no results:

ts=2023-04-07T12:01:51Z level=info caller=cmd/root.go:218 msg="setting up queriers" app_name=global-query app_version=devel hosts=fw-1,fw-2 query="{\"query\":\"talk_conv\",\"ifaces\":\"eth0\",\"first\":\"Tue Mar 7 14:01:51 2023\",\"last\":\"Fri Apr 7 14:01:51 2023\",\"format\":\"txt\",\"sort_by\":\"bytes\",\"num_results\":10,\"dns_resolution\":{\"enabled\":false,\"timeout\":1000000000,\"max_rows\":25},\"max_mem_pct\":60}"
ts=2023-04-07T12:01:51Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=devel hostname=fw-1
ts=2023-04-07T12:01:51Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=devel hostname=fw-1 method=POST url=http://10.1.10.2:18081/api/v1/_query
ts=2023-04-07T12:01:51Z level=debug caller=hosts/query.go:124 msg="running query" app_name=global-query app_version=devel hostname=fw-2
ts=2023-04-07T12:01:51Z level=info caller=client/client.go:132 msg="creating new request" app_name=global-query app_version=devel hostname=fw-2 method=POST url=http://10.1.20.2:18081/api/v1/_query

                                                 packets   packets             bytes      bytes
                   sip                     dip        in       out      %         in        out      %
    2a00:12e8:1:a:2::7  2a04:4540:8200:99::de2    1.63 M    2.64 M  37.36    1.72 GB    2.22 GB  49.83
2a01:4f8:191:31c1:3::3  2a04:4540:8200:99::de2  152.34 k    1.51 M  14.54   26.13 MB    1.79 GB  22.90
          85.158.5.153          144.76.128.146  337.83 k  234.84 k   5.01  316.82 MB   20.75 MB   4.16
          85.158.5.153          144.76.128.146  234.44 k  337.64 k   5.00   20.71 MB  316.16 MB   4.15
2a04:4540:8200:99::d08  2a01:4f8:191:31c1:3::3   28.65 k  204.15 k   2.03    4.72 MB  234.95 MB   2.95
2a04:4540:8200:99::d08      2a00:12e8:1:a:2::7   35.61 k  231.00 k   2.33    5.59 MB  214.39 MB   2.71
          85.158.5.153           144.76.51.230  163.42 k  103.54 k   2.33  186.18 MB    8.42 MB   2.40
          85.158.5.153           144.76.51.230  103.34 k  163.31 k   2.33    8.40 MB  185.79 MB   2.39
           46.22.10.26            85.158.5.154  137.90 k  134.52 k   2.38   42.64 MB   66.16 MB   1.34
          85.158.5.153           144.76.51.228  180.52 k  158.20 k   2.96   70.04 MB   17.77 MB   1.08
                   ...                     ...                 ...                          ...

                                                  4.35 M    7.09 M           2.60 GB    5.32 GB

               Totals:                                     11.44 M                      7.92 GB

Timespan / Interface : [2023-04-05 16:04:41, 2023-04-06 08:45:13] /
Sorted by : accumulated data volume (sent and received)
Query stats : displayed top 10 hits out of 2.00 k in 94ms

Hosts with errors: 2

    #    host    status                      message

    1    fw-1     empty    query returned no results
    2    fw-2     empty    query returned no results

If I limit the query to one of the two hosts, I get the respective subset of the results, so it _clearly_ fetches data and the errors are a lie.

- I'd expect the table to show the hostname (or ID) to indicate where the flow / row came from.
- Last version showed a log message on the receiving goProbe sensor whenever an API call came in. Did that behavior change in this version? Having an info level message for each API call is probably a good idea (to know who queries my data and when)...
- Can you maybe rebase / merge current develop into the branch (it's still crashing from #99 , which makes testing a bit more complicated :stuck_out_tongue: )?
els0r commented 1 year ago

@fako1024 : next iteration ready. Have fun 😎

Will revise the request logging when switching API to gin. Agree with your observation.

fako1024 commented 1 year ago

Next test looks awesome already:

Only found that the -resolve flag doesn't do anything (no error, no nothing), but I guess that's unrelated, right? Do we need an issue to fix that for the release?

Kudos, this looks like an awesome first shot. Can't wait to take a look at the API for integration into other frameworks... :heart_eyes: !!!

Only remark: As for the HTTP client interaction here I can recommend a cool & well maintained package from some dude (nudge) that simplifies handling of the HTTP client and makes the code more readable (and might also simplify extending the client functionality later on because it already comes with a lot of features, even production-level tested client certificate handling and the likes). Just saying :wink:

This is it: https://github.com/fako1024/httpc

els0r commented 1 year ago

Thanks for the feedback. Glad to see the results looking consistent. As for the resolve queries, those need a separate investigation.

Some other news:

It doesn't end here. I'll move the single-call part of global-query to goQuery. Why? Because that's the CLI tool we use to run queries, and there's already a client for global-query. That means you don't have to duplicate much or learn to call a new tool. Only add one or two config parameters to goQuery and you're done. Will keep you posted.

fako1024 commented 1 year ago

Nice! Moving the query stuff to goQuery sounds like a good plan. Basically being able to perform local and global queries (which use the same syntax / logic anyway) with a single tool is probably for the best.

On a different note though: With the recent changes of yesterday / today the query tool stopped working (independent of the server mode). Doing the same queries as yesterday I now get nothing (literally, no error, no feedback, just a return code of 1, so something probably didn't work the way it should):

└─ $ ▶ ./global-query --config ./local-config.yaml --hosts.querier.config api-client-querier.yaml -q fw-1,fw-2 -i eth0 -n 10 talk_conv
fako @ fako-x1 /tmp/goProbe/cmd/global-query (43-global-query *)
└─ $ ▶ echo $?
1

Diving a little deeper I figured as much: In cmd/global-query/cmd/root.go error handling is incomplete (the error is never shown and we just exit):

func Execute() {
        err := rootCmd.Execute()
        if err != nil {
                os.Exit(3)
        }
}

Throwing the error at that place at least tells me what's going on:

unknown command "talk_conv" for "global-query"

Could it be that the cobra command logic has a conflict between the flags / arguments of global-query and goQuery now that you're trying to handle both in one binary?

Just running in server mode seems to work:

└─ $ ▶ ./global-query --config ./local-config.yaml --hosts.querier.config api-client-querier.yaml  server
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /_query                   --> github.com/els0r/goProbe/pkg/global-query/server.(*Server).postQuery-fm (2 handlers)
ts=2023-04-09T10:25:42Z level=info caller=cmd/server.go:72 msg="starting API server" app_name=global-query app_version=devel addr=localhost:8145
fako1024 commented 1 year ago

Also maybe a more fundamental question: What exactly is it that I can POST to the endpoint? I figured it should be something along the lines of query.Args, but no matter what I POST to /_query I get nothing (again, no error, no feedback) so it's a bit hard for me to test this properly. Can you maybe provide me with some pointers?

els0r commented 1 year ago

Interesting. Thanks for the feedback. Then, the latest commit messed up something.

The POST should get the query args and then return the result.
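
As a rough illustration of that flow, the sketch below POSTs JSON-encoded query args to `/_query` and reads back the response. It rests on assumptions: `queryArgs` only mirrors a few field names visible in the logged query earlier in this thread and is not the actual query.Args type, `postQuery` is a hypothetical helper, and the server is a local test double rather than the global-query server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// queryArgs mirrors a subset of the fields seen in the logged query.
// It is an illustrative stand-in, not the actual query.Args type.
type queryArgs struct {
	Query      string `json:"query"`
	Ifaces     string `json:"ifaces"`
	Format     string `json:"format"`
	SortBy     string `json:"sort_by"`
	NumResults int    `json:"num_results"`
}

// postQuery sends the args as JSON to <baseURL>/_query and returns the raw body.
func postQuery(baseURL string, args queryArgs) ([]byte, error) {
	body, err := json.Marshal(args)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/_query", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("query failed: %s: %s", resp.Status, data)
	}
	return data, nil
}

// newEchoServer is a test double that reports which query type it decoded.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/_query" {
			http.NotFound(w, r)
			return
		}
		var args queryArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, `{"received":%q}`, args.Query)
	}))
}

func main() {
	srv := newEchoServer()
	defer srv.Close()

	out, err := postQuery(srv.URL, queryArgs{
		Query: "talk_conv", Ifaces: "eth0", Format: "txt", SortBy: "bytes", NumResults: 10,
	})
	fmt.Println(string(out), err) // {"received":"talk_conv"} <nil>
}
```

Turning a non-200 status into an error, as `postQuery` does, is one way to avoid the silent failures described above.
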

I'll look at it in the context of moving all the code in entrypoint to the appropriate routine in goQuery (which does the call using the global-query client).

Stay tuned

els0r commented 1 year ago

Thanks for all the testing so far!!!

As for proper feedback/error messages: that still needs a bit of love on the client side (I may need your help with regards to how httpc is best used for that).

The recent commit can more or less be called the MVP for the entire query system. It should work on your machine. Invocation via goQuery works as follows:

./goQuery --query.server.addr <host>:<port> -i eth0 -f -5m -n 10 sip,dip -q fw-1,fw-2

or

./goQuery --config goquery-conf.yaml -i eth0 -f -5m -n 10 sip,dip -q a,b

The query server is started with

./global-query --config global-query-conf.yaml server --server.addr localhost:8888

Naturally, this system needs quite some testing in https://github.com/els0r/goProbe/issues/74 to make sure there are no known regressions/edge cases that aren't covered.

@fako1024 : if you can confirm that it's now working in your infrastructure, I'll open the PR.

fako1024 commented 1 year ago

Alrighty, got it to run. Some feedback:

goroutine 1 [running]:
github.com/els0r/goProbe/pkg/global-query/api/client.(*Client).Query.func1(0xe57780?, {0xc000094200?, 0xa1666e?})
        /tmp/goProbe/pkg/global-query/api/client/client.go:84
github.com/fako1024/httpc.(*Request).RunWithContext(0xc000094100, {0xaed440, 0xc0001b7580})
        /home/fako/Develop/go/pkg/mod/github.com/fako1024/httpc@v1.0.13/httpc.go:403 +0xb91
github.com/els0r/goProbe/pkg/global-query/api/client.(*Client).Query(0xc00007c040, {0xaed440, 0xc0001b7580}, 0x1?)
        /tmp/goProbe/pkg/global-query/api/client/client.go:91 +0x285
github.com/els0r/goProbe/pkg/global-query/api/client.(*Client).Run(0xc0001c1440?, {0xaed440?, 0xc0001b7580?}, 0x0?)
        /tmp/goProbe/pkg/global-query/api/client/client.go:70 +0x25
github.com/els0r/goProbe/cmd/goQuery/commands.entrypoint(0xe0d980, {0xc0001be2a0, 0x1, 0xe})
        /tmp/goProbe/cmd/goQuery/commands/root.go:269 +0xb02
github.com/spf13/cobra.(*Command).execute(0xe0d980, {0xc0001a6010, 0xe, 0xe})
        /home/fako/Develop/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0xe0d980)
        /home/fako/Develop/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /home/fako/Develop/go/pkg/mod/github.com/spf13/cobra@v1.5.0/command.go:918
github.com/els0r/goProbe/cmd/goQuery/commands.Execute()
        /tmp/goProbe/cmd/goQuery/commands/root.go:47 +0x25
main.main()
        /tmp/goProbe/cmd/goQuery/main.go:6 +0x17


I think the reason is that you did specify a function, but not the intervals. I'll quickly check if that's something that needs handling in `httpc` (at least an error) or if I'm mistaken...

One more, unrelated thing: Is the Go API for the server abstract enough already so that it can be integrated into other tools (use case: I happen to have a central control server already and don't want to deploy another binary / independent microservice - instead I just want to integrate the global query API into my existing tool)?
els0r commented 1 year ago

Thx! Will have a look at the open points.

As for your question if the Go API is abstract enough: I would say so - as long as you use gin-gonic as the API router 🤓. What I could imagine is to make the server.postQuery registration public.

fako1024 commented 1 year ago

> Thx! Will have a look at the open points.
>
> As for your question if the Go API is abstract enough: I would say so - as long as you use gin-gonic as the API router 🤓. What I could imagine is to make the server.postQuery registration public.

That would of course be an interesting addition, right. But I was more referring to a "layer below". Let's assume the following scenario: I detect an interesting IP somewhere and now want to figure out if that IP appeared somewhere globally. I should be able to programmatically construct the query.Args in my Go code and then perform a query just like the server / goQuery does (after all, that's how you integrated it into goQuery). Question is only: How much work do I need to do?

els0r commented 1 year ago

Answer would be: not much work 😄 . As you said, you can create the query args programmatically and then run it using the global-query client's Run (or Query) method. That's it. Job done.

I'm assuming the whole thing would use JSON under the hood, since you are not planning to display text to a user, but to further process the information retrieved.

If you do want to display it to a user, take note of the following code inside cmd/goQuery/commands/root.go:

        // make sure that the hostname is present in the query type (and therefore output)
        // The assumption being that a human will have better knowledge
        // of hostnames than of their ID counterparts
        if queryArgs.Format == "txt" {
            if !strings.Contains(queryArgs.Query, types.HostnameName) {
                queryArgs.Query += "," + types.HostnameName
            }
        }

That's just adding the label hostname to the output in case a human operator is looking at the results.