Open aleksmaus opened 7 months ago
Assigning this to myself for now. Will dig some more today.
Changing "numeric" strings to "text" in the input config eliminates the issue:
Input:
{
  "inputs": [
    {
      "osquery": {
        "packs": {
          "a1233": {
            "queries": {
              "test": {
                "interval": 60,
                "query": "select * from uptime",
                "removed": false,
                "snapshot": true,
                "timeout": 60
              }
            }
          },
          "test": {
            "queries": {
              "a12312": {
                "interval": 60,
                "query": "select * from uptime",
                "timeout": 60
              }
            },
            "shard": 100
          }
        }
      }
    }
  ]
}
Output:
{
  "inputs": [
    {
      "osquery": {
        "packs": {
          "a1233": {
            "queries": {
              "test": {
                "interval": 60,
                "query": "select * from uptime",
                "removed": false,
                "snapshot": true,
                "timeout": 60
              }
            }
          },
          "test": {
            "queries": {
              "a12312": {
                "interval": 60,
                "query": "select * from uptime",
                "timeout": 60
              }
            },
            "shard": 100
          }
        }
      }
    }
  ]
}
Thank you @aleksmaus 🙇
Abbreviated repro case with go-ucfg:
// Errors are deliberately ignored to keep the repro short.
func TestConfigBug(t *testing.T) {
	var m map[string]interface{}
	_ = json.Unmarshal([]byte(testConfig), &m)
	// Round-trip the map through a ucfg config and back.
	cfg, _ := ucfg.NewFrom(m)
	c := newConfigFrom(cfg)
	m, _ = c.ToMapStr()
	s, _ := json.Marshal(m)
	fmt.Printf("RESULT:%s\n", string(s))
}

// The key "12" is a numeric string.
const testConfig = `{"a":{"12":{}}}`
Output (the length of the array is 13 == 12+1):
RESULT:{"a":[null,null,null,null,null,null,null,null,null,null,null,null,null]}
I'm not super familiar with go-ucfg, but the field name is parsed into a number here:
https://github.com/elastic/go-ucfg/blob/main/path.go#L79
and this is where the array of length idx+1 is created:
https://github.com/elastic/go-ucfg/blob/main/ucfg.go#L295
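To make the mechanism concrete, here is a minimal stdlib-only sketch of the behavior described above. It is not the go-ucfg code; setChild is a hypothetical helper that only mimics "a numeric-looking key becomes an array index, and the array is grown to idx+1 elements".

```go
package main

import (
	"fmt"
	"strconv"
)

// setChild is a hypothetical stand-in, NOT go-ucfg code. It mimics the
// reported behavior: a key that parses as a number is used as an array
// index, and the array is allocated with idx+1 elements.
func setChild(key string, value interface{}) interface{} {
	idx, err := strconv.ParseInt(key, 10, 64)
	if err != nil || idx < 0 {
		// Non-numeric key: keep it as a regular map field.
		return map[string]interface{}{key: value}
	}
	// Numeric key: allocate idx+1 slots and store the value in the last one.
	arr := make([]interface{}, idx+1)
	arr[idx] = value
	return arr
}

func main() {
	arr := setChild("12", nil).([]interface{})
	fmt.Println(len(arr)) // 13, matching the 13 nulls in the output above
	fmt.Println(setChild("b12", nil))
}
```

With the key "12" the sketch yields a 13-element slice, while "b12" stays a map field.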
Another, unrelated caveat:
func TestConfigBug(t *testing.T) {
	var m map[string]interface{}
	_ = json.Unmarshal([]byte(testConfig), &m)
	cfg, _ := ucfg.NewFrom(m)
	c := newConfigFrom(cfg)
	m, _ = c.ToMapStr()
	s, _ := json.Marshal(m)
	fmt.Printf("RESULT:%s\n", string(s))
}

// The key "b12" is not numeric; its value is an empty object.
const testConfig = `{"a":{"b12":{}}}`
Results in the output:
RESULT:{"a":{"b12":null}}
This is not quite correct, since the value was "b12":{} (an empty object) in the input document.
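For reference, the difference between the two outputs can be reproduced with encoding/json alone. This is only a sketch of how an empty map and a nil value marshal differently; it makes no claim about where go-ucfg loses the empty object. The marshal helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshal renders v as JSON text. The error is ignored because the inputs
// in this sketch are plain maps that always marshal cleanly.
func marshal(v interface{}) string {
	b, _ := json.Marshal(v)
	return string(b)
}

func main() {
	// "b12" holds an empty (non-nil) map, as in the input document.
	empty := map[string]interface{}{
		"a": map[string]interface{}{"b12": map[string]interface{}{}},
	}
	// "b12" holds nil, which is what the round trip produced.
	null := map[string]interface{}{
		"a": map[string]interface{}{"b12": nil},
	}
	fmt.Println(marshal(empty)) // {"a":{"b12":{}}}
	fmt.Println(marshal(null))  // {"a":{"b12":null}}
}
```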
The worse cases:
INPUT:
{"a":{"9223372036854775807":{}}}
RESULT:
panic: runtime error: makeslice: len out of range [recovered]
panic: runtime error: makeslice: len out of range
INPUT:
{"a":{"9223372036854":{}}}
RESULT:
runtime: out of memory: cannot allocate 147573956411392-byte block (3833856 in use)
fatal error: out of memory
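The two failures above are consistent with simple arithmetic on idx+1. The sketch below assumes a 64-bit platform where each []interface{} element takes 16 bytes; sliceBytes is a hypothetical helper, not part of go-ucfg.

```go
package main

import (
	"fmt"
	"math"
)

// elemSize is the assumed size of one interface{} value (two 8-byte words
// on a 64-bit platform).
const elemSize = 16

// sliceBytes estimates the bytes needed for a []interface{} with idx+1
// elements. ok is false when idx+1 or the byte count would overflow int64.
func sliceBytes(idx int64) (bytes int64, ok bool) {
	if idx < 0 || idx == math.MaxInt64 {
		return 0, false // idx+1 is not representable
	}
	n := idx + 1
	if n > math.MaxInt64/elemSize {
		return 0, false // n*elemSize would overflow
	}
	return n * elemSize, true
}

func main() {
	fmt.Println(sliceBytes(9223372036854775807)) // 0 false
	fmt.Println(sliceBytes(9223372036854))       // 147573952589680 true
}
```

The second figure is about 147 TB, slightly below the 147573956411392 bytes in the runtime message, presumably because the runtime rounds the request up. A guard of this kind before allocating would turn both cases into an ordinary error instead of a crash.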
Made some changes in the go-ucfg lib to address the issues above; the PR is open for review.
Testing the change further with the agent and beats. Another place where it breaks is on the beats configuration parsing side: after the agent picked up the "workaround" and sends the correct configuration to the beats, the beats configuration parsing code breaks until go-ucfg and elastic-agent-libs changes are picked up.
Beats generates the config here: https://github.com/elastic/beats/blob/main/x-pack/libbeat/management/generate.go#L156
using conf.NewConfigFrom from elastic-agent-libs:
https://github.com/elastic/elastic-agent-libs/blob/473983911d7c78e57bf30af91f66f5bca50d0a59/config/config.go#L81
where the config options are hardcoded:
https://github.com/elastic/elastic-agent-libs/blob/473983911d7c78e57bf30af91f66f5bca50d0a59/config/config.go#L44
So it looks like the final scope of the update, if we start with go-ucfg, would include:
After applying the change to elastic-agent-libs to call the newly added ucfg.EnableNumKeys(true) workaround API on go-ucfg, and recompiling the agent and beats with these changes, I confirmed that osquery works with a numeric query ID.
The incoming configuration is correct:
The results from the scheduled query are posted as expected:
Another, possibly faster fix for the osquery-specific issue, with a much smaller change surface, is to restrict users from assigning numeric strings to the query IDs. @tomsonpl opened a draft PR here: https://github.com/elastic/kibana/pull/176507
Still, any place that allows numeric keys to be inserted into the policy will introduce this kind of issue. Another possible place, for example, is the osquery advanced configuration section, which allows users to specify a free-form JSON configuration for osquery.
Another possible approach is to rewrite how incoming map[string]interface{} is parsed into Config so we don't have to work around/fix the current ucfg implementation and possibly break some other users who count on the current behavior, or maybe ditch the ucfg Config altogether. Need some feedback from the agent/beats team here on all the possible consequences of making this change.
@cmacknz @blakerouse @andrewkroh
ucfg is still used widely enough in our own code (both beats and agent) that fixing the problem there feels correct. Getting rid of uses of go-ucfg in one place doesn't help the other uses of it.
I can't imagine anyone is depending on this currently completely broken behavior.
PR against go-ucfg is open https://github.com/elastic/go-ucfg/pull/198. I don't have permissions to set reviewers or any labels there though. I noticed @fearful-symmetry also had another PR opened a bit earlier that fixes/"works around" another oddity in that same library. The suggestion was to bring this up during the next agent meeting.
The PR is merged. Need to update the Agent/beats/libs to pick up the version with this change.
Tested with the latest Agent 8.13. This was first uncovered with osquery when users tried to use numbers as query IDs: https://github.com/elastic/kibana/issues/175421
The incoming policy is correct. Narrowed it down to
c, err := config.NewConfigFrom(action.Policy)
in PolicyChangeHandler https://github.com/elastic/elastic-agent/blob/main/internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go#L101
Repro code, for example, with JSON extracted from the policy:
Result (truncated) looks like this:
The full result is attached as result.json, since it doesn't fit into the description.