Interesting, I'll take a look and see what I can do to duplicate it.
One thing: could you try grabbing the binary from the latest release at https://github.com/ctdk/goiardi/releases/tag/v0.10.4 and see if it still happens? I ask because I built those binaries with go 1.6 instead of 1.5.1, and there was a fix that might be relevant that was added in either 1.5.2 or 1.5.3.
Also, is the data bag in question particularly large? It might help with figuring out what's happening.
Thanks!
No difference using 0.10.4.
Seems to crash after a POST to /data with the following content:
{"name":"apt_repositories","json_class":"Chef::DataBag","chef_type":"data_bag"}
The strange thing is that I can't recreate it using knife; could it be some internal state?
The request from tcpdump looks like:
```
POST /data HTTP/1.1
Content-Type: application/json
Accept: application/json
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
X-Ops-Server-Api-Version: 1
X-Ops-Sign: algorithm=sha1;version=1.1;
X-Ops-Userid: admin
X-Ops-Timestamp: 2016-03-24T20:25:01Z
X-Ops-Content-Hash: Si+4GbVP/13L7DLPsr0G2DSZTKw=
X-Ops-Authorization-1: QEKxbJEXkNK1ucRtLfCHj3z6ja3zspJ2QME6eJSMXTx9YhTAcIIPnoBLSgtq
X-Ops-Authorization-2: Ade86aNnUnDw3Mva4ATZuddl6r8tuvihNsvcRxzOtts7B56NUErvmarL5w6B
X-Ops-Authorization-3: o5MfkFJz1vMKhQltGkN26orMkyoTaDXzb+dukAUti2XHeqRO5DCXYETHpvBy
X-Ops-Authorization-4: Bc/P/qY56hhbW8V9gx+FmX94znx88Z1hygkUObXs4b8tnf7DHp/+BWn8Vl8l
X-Ops-Authorization-5: fwTQ54X8hVE9X6bez3Qt/rWBN84FhO1z42m7Za9J4KPDWoCwSD9tYAPtCV7u
X-Ops-Authorization-6: H6iW3qPl2j0ohvaXyM14m8upR7Xvug713iFXowqTww==
Host: 127.0.0.1:4545
X-Remote-Request-Id: 1185bf17-1d48-471f-9313-4db2d27dd431
Content-Length: 79
X-Chef-Version: 12.8.1
User-Agent: Chef Knife/12.8.1 (ruby-2.1.6-p336; ohai-8.11.1; x86_64-linux; +https://chef.io)
Connection: close

{"name":"apt_repositories","json_class":"Chef::DataBag","chef_type":"data_bag"}
```
I do not get a response after this.
(Haven't forgotten about this, I've just been super busy with some work stuff.)
I haven't been able to duplicate this with knife at all, so it looks like running_sushi is necessary to trigger it. That's the next thing I'm going to try.
I finally had a chance to come back to this, such a long time later (a combination of being busy and not feeling well), but I still can't duplicate it. Posting raw data bags (although the one you posted above fails because it doesn't include an "id", no matter how I try it), using knife, and using running_sushi all work just fine.
I do seem to be running an older chefdk still, so I'll try upgrading that and see if it makes a difference. Otherwise, I won't be able to make much progress with this without some more clues I'm afraid.
Tested this again; it does not seem to be related to data bags, but to the number of objects.
The stack trace looks a bit different too.
I'm uploading about 120 cookbooks and 200 roles before it dies.
Great, that'll help a bunch. Thanks for taking a look at that again, I'll get to work on it.
One small question: are the data bags in question encrypted by any chance, or do they have very large values inside of them (specifically like "foo": "barbaz.....", where "barbaz...." is some very long string)?
Not encrypted, and the data bags are very small. I'm also able to crash it without uploading any data bags at all. I'm uploading the stuff pretty fast with running_sushi.
Thanks, just trying to duplicate it again. (I know it wasn't limited to data bags, that was just what I was trying first.)
One more question: how much RAM does the VM or machine you're running goiardi in have available?
At least 10 GB, and it was running inside a systemd-nspawn container as root on an Ubuntu userspace.
Hmm, I got it to crash with an out-of-memory error, but that wasn't the right crash. I think I finally have a pretty good idea what it is, but I still have yet to actually get the stack overflow to happen.
Can you tell me anything about the values inside these data bags and roles? I remember it being private before, but I have a feeling that some particular kind of string is able to trip up the indexer and send it into an infinite loop.
I don't think I've asked this yet - is there any unicode in these data bags or roles, or weird binary data? (The latter I'm not sure would work in json, but I'm just trying to cover some, if not all, bases.)
We have two data bags containing the UTF-8 character "ø", but no roles with it, and it dies even if I comment out the delivery of data bags.
The roles contain loads of different things, so I think it will be hard for me to pin down any exact cause there. I will try removing items until it doesn't crash anymore.
So, I found what crashes it, and it's not related to anything I wrote above!
The following role crashes it when an array contains the same value multiple times; I don't know whether it matters that the repeated value is empty, or just that the same value appears more than once in the array.
```json
{
  "name": "test",
  "description": "test",
  "json_class": "Chef::Role",
  "default_attributes": {
    "test": [
      "this",
      "",
      "is",
      "",
      "a",
      "",
      "test"
    ]
  }
}
```
I'm not sure why we had this, but anyhow.
Interesting! Thank you for that - I had been trying to trip it up with weird unicode chars and unusually long values, but I hadn't thought to try that. I think I know what the problem is (it's close to what I thought it was, as a problem with the trie index, but not quite what I had in mind before).
I believe I have a fix for this problem (finally). Can you build the sushi_crash branch (https://github.com/ctdk/goiardi/tree/sushi_crash) and try uploading a role that normally makes it crash? I was finally able to duplicate it with arrays containing duplicate items, and that branch makes the role above work for me.
Assuming it works well for you, I'll do some cleanup and improvements and make a formal release with it.
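To make the suspected failure mode concrete: based on the stack trace later in this thread (`util.Indexify` flattening an object's values, then `util.RemoveDupStrings` deduplicating them), an attribute array with repeated entries produces repeated flattened strings that then have to be removed. The sketch below is only my reading of that idea, not goiardi's actual code; the "key:value" format and function names are assumptions.

```go
// Rough sketch of how an indexer might flatten a role's attributes into
// searchable strings. NOT goiardi's actual implementation; the "key:value"
// format here is an assumption made for illustration.
package main

import (
	"fmt"
	"sort"
)

func flatten(attrs map[string]interface{}) []string {
	var out []string
	for key, val := range attrs {
		switch v := val.(type) {
		case []interface{}:
			// Each array element becomes its own index entry, so repeated
			// elements (like the repeated "" in the role above) produce
			// duplicate strings that must be removed before indexing.
			for _, item := range v {
				out = append(out, fmt.Sprintf("%s:%v", key, item))
			}
		default:
			out = append(out, fmt.Sprintf("%s:%v", key, v))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	attrs := map[string]interface{}{
		"test": []interface{}{"this", "", "is", "", "a", "", "test"},
	}
	// Prints runs of identical "test:" entries; deduplicating such runs is
	// where the crash appears to live.
	fmt.Printf("%q\n", flatten(attrs))
}
```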
I don't have much of a Go build environment set up here; I'm using your binaries! Is it possible to get a build from you?
Certainly. Github isn't letting me attach a binary directly to this comment, so I added a tag and made a "release" at https://github.com/ctdk/goiardi/releases/tag/sushi_crash_fix_check with a linux/amd64 binary attached. (I assume that's the platform you were looking for.)
Not fixed in that binary.
I can recreate it with just:
knife role from file test.json
Where test.json is the above pasted json.
Interesting. I got it to happen with master with a ruby-defined role named hrm.rb like so:
run_list "recipe[apt]"
default_attributes(
"foo" => { "bar" => "baz" },
"meep" => [ "This", "", "might", "", "break" ]
)```
That was making the master branch crash, but it was fixed in the `sushi_crash` branch. Evidently the crash can still happen, but fixing that case is a good sign.
Confirmed: I recreated that test.json role as test.rb, and it worked with that binary. I just need to find what subtle difference there is between the two.
I think I found the fix for real this time. I have a binary with it at https://github.com/ctdk/goiardi/releases/download/sushi_crash_fix_check_2/goiardi_linux_amd64_sushi_crash_fix_check_2 (if it doesn't let you download that, try getting it from https://github.com/ctdk/goiardi/releases/tag/sushi_crash_fix_check_2).
(Turns out the first fix only worked if there were exactly two duplicates - three or more would still leave two behind. Heavy-duty house repairs intervened over the last few days, sadly, so I couldn't get to it sooner.)
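For illustration, here is a minimal sketch of deduplicating a sorted string slice in a way that handles runs of any length; this is not goiardi's actual `util.RemoveDupStrings`, just the general shape of the behavior described above. Removing one element per matching pair misses values that appear three or more times, whereas keeping a value only when it differs from the last kept value collapses whole runs.

```go
// Minimal sketch, not goiardi's util.RemoveDupStrings: collapse runs of
// duplicates in a sorted []string by appending a value only when it differs
// from the previously kept value. This works for runs of any length.
package main

import (
	"fmt"
	"sort"
)

func removeDupStrings(in []string) []string {
	if len(in) == 0 {
		return in
	}
	out := in[:1]
	for _, s := range in[1:] {
		if s != out[len(out)-1] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	vals := []string{"a", "", "", "", "b", "b", "c"}
	sort.Strings(vals)
	fmt.Printf("%q\n", removeDupStrings(vals)) // ["" "a" "b" "c"]
}
```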
Got a new crash now:
```
./goiardi_linux_amd64_sushi_crash_fix_check_2 -P 80
2017/03/18 14:30:52 Logging at fatal level
panic: runtime error: index out of range

goroutine 3688 [running]:
panic(0xae8500, 0xc4200100d0)
	/usr/local/Cellar/go/1.7.4_2/libexec/src/runtime/panic.go:500 +0x1a1
github.com/ctdk/goiardi/util.RemoveDupStrings(0xc4215a4800, 0x7c, 0x7c, 0xc423c4bd18, 0x1, 0x0)
	/Users/jeremy/go/src/github.com/ctdk/goiardi/util/strings.go:115 +0x2e3
github.com/ctdk/goiardi/util.Indexify(0xc421cad230, 0xc421cad230, 0xc420fbcc30, 0x43b0f0)
	/Users/jeremy/go/src/github.com/ctdk/goiardi/util/util.go:182 +0x5f5
github.com/ctdk/goiardi/indexer.(*IdxDoc).update(0xc420fbcc30, 0xeed220, 0xc421334620)
	/Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/file_index.go:429 +0x89
github.com/ctdk/goiardi/indexer.(*IdxCollection).addDoc(0xc4201c6560, 0xeed220, 0xc421334620)
	/Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/file_index.go:270 +0x131
github.com/ctdk/goiardi/indexer.(*FileIndex).SaveItem(0xc4201c29c0, 0xeed220, 0xc421334620, 0x0, 0x0)
	/Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/file_index.go:124 +0x166
created by github.com/ctdk/goiardi/indexer.IndexObj
	/Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/indexer.go:120 +0x5
```
Hmm, it Works Here™ with both of those roles I was testing. Is it breaking for you with that test.json you provided earlier, or is this with a real role?
(It's especially weird because there's a check literally right before that line that should prevent that.)
Never mind, I fiddled with it again and got it to break once more. Investigating.
OK, new binary at https://github.com/ctdk/goiardi/releases/download/sushi_crash_fix_check_2/goiardi_linux_amd64-sushi_crash3 (on the same release page as before). That's what I get for working on weird bugs when it's 3am and I can't sleep - I put "i+i" instead of "i+1" on one line. This fixed it, at least, for how I got it to break just now.
... and when I wrote a test I managed to trigger that exact error you had (the one I got to happen this morning was a little different). The binary to try is https://github.com/ctdk/goiardi/releases/download/sushi_crash_fix_check_2/goiardi_linux_amd64-sushi_crash4 (again, on the same release page as before).
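As an aside, this is the sort of table-driven test that would exercise both failure modes seen in this thread (duplicates surviving when a value appears three or more times, and runs of empty strings like the role above). The helper below is a stand-in with an assumed signature, not goiardi's actual code or the test that was actually added.

```go
// A sketch of a table-driven test for a string-deduplication helper. The
// helper name and signature are assumptions, not goiardi's actual API.
package util_test

import (
	"reflect"
	"sort"
	"testing"
)

// removeDupStrings is a stand-in for the real helper under test: sort the
// slice, then compact it in place with a write index.
func removeDupStrings(in []string) []string {
	if len(in) == 0 {
		return in
	}
	sort.Strings(in)
	j := 0
	for i := 1; i < len(in); i++ {
		if in[i] != in[j] {
			j++
			in[j] = in[i]
		}
	}
	return in[:j+1]
}

func TestRemoveDupStrings(t *testing.T) {
	cases := []struct {
		name string
		in   []string
		want []string
	}{
		{"pair", []string{"a", "a", "b"}, []string{"a", "b"}},
		{"triple", []string{"a", "a", "a", "b"}, []string{"a", "b"}},
		{"empty strings", []string{"this", "", "is", "", "a", "", "test"},
			[]string{"", "a", "is", "test", "this"}},
		{"all duplicates", []string{"x", "x", "x", "x"}, []string{"x"}},
	}
	for _, c := range cases {
		if got := removeDupStrings(c.in); !reflect.DeepEqual(got, c.want) {
			t.Errorf("%s: got %q, want %q", c.name, got, c.want)
		}
	}
}
```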
No crash, all seems good! :+1:
Wonderful! Thanks for bringing this to my attention originally, and also thank you for your help and incredible patience while I tracked this down.
I'm able to crash the goiardi server when delivering data bags to it using https://github.com/one-com/running_sushi.
I've not been able to find the exact call that crashes it and will continue to look for it. I've attached the stack trace of the crash.
goardi_stack_overflow.txt
Sorry for the terrible bug report; I can't share the data I'm using, but I will try to create a smaller test case.