ctdk / goiardi

A Chef server written in Go, able to run entirely in memory, with optional persistence with saving the in-memory data to disk or using MySQL or Postgres as the data storage backend. Docs: http://goiardi.readthedocs.io/en/latest/index.html
http://goiardi.gl
Apache License 2.0
280 stars 39 forks source link

Stack overflow when uploading/creating data bags. #48

Closed simonvik closed 7 years ago

simonvik commented 8 years ago

I'm able to crash the goardi server when delivering data bags to it using : https://github.com/one-com/running_sushi

I've not been able to find the exact call that crashes it and will continue to look for it. I've attached the stack trace of the crash.

goardi_stack_overflow.txt

Sorry for the terrible bug-report, I cant share the data I'm using but will try to create a smaller test-case.

ctdk commented 8 years ago

Interesting, I'll take a look and see what I can do to duplicate it.

One thing: could you try grabbing the binary from the latest release at https://github.com/ctdk/goiardi/releases/tag/v0.10.4 and see if it still happens? I ask because I built those binaries with go 1.6 instead of 1.5.1, and there was a fix that might be relevant that was added in either 1.5.2 or 1.5.3.

Also, is the data bag in question particularly large? It might help with figuring out what's happening.

Thanks!

simonvik commented 8 years ago

No difference using 0.10.4.

Seems to crash after a POST to /data with the content of :

{"name":"apt_repositories","json_class":"Chef::DataBag","chef_type":"data_bag"}

The strange thing is that i cant recreate it using knife, can it be some internal state?

The request from tcpdump looks like:

POST /data HTTP/1.1
Content-Type: application/json
Accept: application/json
Accept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3
X-Ops-Server-Api-Version: 1
X-Ops-Sign: algorithm=sha1;version=1.1;
X-Ops-Userid: admin
X-Ops-Timestamp: 2016-03-24T20:25:01Z
X-Ops-Content-Hash: Si+4GbVP/13L7DLPsr0G2DSZTKw=
X-Ops-Authorization-1: QEKxbJEXkNK1ucRtLfCHj3z6ja3zspJ2QME6eJSMXTx9YhTAcIIPnoBLSgtq
X-Ops-Authorization-2: Ade86aNnUnDw3Mva4ATZuddl6r8tuvihNsvcRxzOtts7B56NUErvmarL5w6B
X-Ops-Authorization-3: o5MfkFJz1vMKhQltGkN26orMkyoTaDXzb+dukAUti2XHeqRO5DCXYETHpvBy
X-Ops-Authorization-4: Bc/P/qY56hhbW8V9gx+FmX94znx88Z1hygkUObXs4b8tnf7DHp/+BWn8Vl8l
X-Ops-Authorization-5: fwTQ54X8hVE9X6bez3Qt/rWBN84FhO1z42m7Za9J4KPDWoCwSD9tYAPtCV7u
X-Ops-Authorization-6: H6iW3qPl2j0ohvaXyM14m8upR7Xvug713iFXowqTww==
Host: 127.0.0.1:4545
X-Remote-Request-Id: 1185bf17-1d48-471f-9313-4db2d27dd431
Content-Length: 79
X-Chef-Version: 12.8.1
User-Agent: Chef Knife/12.8.1 (ruby-2.1.6-p336; ohai-8.11.1; x86_64-linux; +https://chef.io)
Connection: close

{"name":"apt_repositories","json_class":"Chef::DataBag","chef_type":"data_bag"}

I do not get a response after this.

ctdk commented 8 years ago

(Haven't forgotten about this, I've just been super busy with some work stuff.)

I haven't been able to duplicate this with knife at all, so it looks like running_sushi is necessary to trigger it. That's the next thing I'm going to try.

ctdk commented 7 years ago

I finally had a chance to come back to this, such a long time later (combination of being busy and not feeling well), but I still can't duplicate this. Posting raw data bags (although that one you suggested there fails because it doesn't include an "id" no matter how I try it), knife, and running_sushi all work just fine.

I do seem to be running an older chefdk still, so I'll try upgrading that and see if it makes a difference. Otherwise, I won't be able to make much progress with this without some more clues I'm afraid.

simonvik commented 7 years ago

Tested this again, It does not seem to be related to data-bags, but the amount of objects.

The stack error seems a bit moved too.

log.txt

I'm uploading about 120 cookbooks, 200 roles before it dies

ctdk commented 7 years ago

Great, that'll help a bunch. Thanks for taking a look at that again, I'll get to work on it.

ctdk commented 7 years ago

One small question: are the data bags in question encrypted by any chance, or do they have very large values inside of them (specifically like "foo": "barbaz.....", where "barbaz...." is some very long string)?

simonvik commented 7 years ago

not encrypted and very small databags and i able to crash it without uploading any databags at all. Im uploading the stuff pretty fast with running_sushi.

ctdk commented 7 years ago

Thanks, just trying to duplicate it again. (I know it wasn't limited to data bags, that was just what I was trying first.)

ctdk commented 7 years ago

One more question: how much RAM does the VM or machine you're running goiardi in have available?

simonvik commented 7 years ago

At least 10G and was running inside a systemd-nspawn as root on a ubuntu userspace.

ctdk commented 7 years ago

Hmm, I got it to crash with an out of memory error, but that wasn't right. I think I have a pretty good idea what it is finally, but I still have yet to actually get the stack overflow to happen.

Can you tell me anything about the values inside these data bags and roles? I remember it being private before, but I have a feeling that some particular kind of string is able to trip up the indexer and send it into an infinite loop.

I don't think I've asked this yet - is there any unicode in these data bags or roles, or weird binary data? (The latter I'm not sure would work in json, but I'm just trying to cover some, if not all, bases.)

simonvik commented 7 years ago

We have two databags with utf8-chars "ø" but no roles and it dies even if i comment out the delivery of databags.

The roles contain loads of different things, think it will be hard for me to find any exact thing there. Will try to see if i can remove items until it doesn't crash anymore.

simonvik commented 7 years ago

So, found what crashes it, not related to anything i wrote above!

The following role crashes it, when you have multiple keys, dont know if the emptines matters with the same value in an array.

{
  "name": "test",
  "description": "test",
  "json_class": "Chef::Role",
  "default_attributes": {
    "test" : [
      "this",
      "",
      "is",
      "",
      "a",
      "",
      "test"
    ]
  }
}

Very unsure why we had this but anyhow.

ctdk commented 7 years ago

Interesting! Thank you for that - I had been trying to trip it up with weird unicode chars and unusually long values, but I hadn't thought to try that. I think I know what the problem is (it's close to what I thought it was, as a problem with the trie index, but not quite what I had in mind before).

ctdk commented 7 years ago

I believe I have a fix for this problem (finally). Can you build the sushi_crash branch (https://github.com/ctdk/goiardi/tree/sushi_crash) and try uploading a role that normally makes it crash? I was finally able to duplicate it with arrays with duplicate items, and the above made it work.

Assuming it works well for you, I'll do some cleanup and improvements and make a formal release with it.

simonvik commented 7 years ago

Don't have much of a go-build-env setup here, using your binarys! is it possible to get a build from you?

ctdk commented 7 years ago

Certainly. Github isn't letting me attach a binary directly to this comment, so I added a tag and made a "release" at https://github.com/ctdk/goiardi/releases/tag/sushi_crash_fix_check with a linux/amd64 binary attached. (I assume that's the platform you were looking for.)

simonvik commented 7 years ago

Not fixed in that binary.

I can recreate it with just : knife role from file test.json

Where test.json is the above pasted json.

ctdk commented 7 years ago

Interesting. I got it to happen with master with a ruby-defined role named hrm.rb like so:


run_list "recipe[apt]"
default_attributes(
  "foo" => { "bar" => "baz" },
  "meep" => [ "This", "", "might", "", "break" ]
)```

That was making the master branch crash, but it was fixed in the `sushi_crash` branch. Seems it can still happen, but it's a good sign.
ctdk commented 7 years ago

Confirmed: I recreated that test.json role as test.rb, and it worked with that binary. I just need to find what subtle difference there is between the two.

ctdk commented 7 years ago

Think I found the fix for actual real. I have a binary with it at https://github.com/ctdk/goiardi/releases/download/sushi_crash_fix_check_2/goiardi_linux_amd64_sushi_crash_fix_check_2 (if it doesn't let you download that, try getting it from https://github.com/ctdk/goiardi/releases/tag/sushi_crash_fix_check_2).

(Turns out the first fix only worked if there were exactly two duplicates - three or more would leave two left. Heavy duty house fixing intervened over the last few days, sadly, so I couldn't get over to it sooner.)

simonvik commented 7 years ago

Got a new crash now :

 ./goiardi_linux_amd64_sushi_crash_fix_check_2 -P 80 
2017/03/18 14:30:52 Logging at fatal level
panic: runtime error: index out of range

goroutine 3688 [running]:
panic(0xae8500, 0xc4200100d0)
    /usr/local/Cellar/go/1.7.4_2/libexec/src/runtime/panic.go:500 +0x1a1
github.com/ctdk/goiardi/util.RemoveDupStrings(0xc4215a4800, 0x7c, 0x7c, 0xc423c4bd18, 0x1, 0x0)
    /Users/jeremy/go/src/github.com/ctdk/goiardi/util/strings.go:115 +0x2e3
github.com/ctdk/goiardi/util.Indexify(0xc421cad230, 0xc421cad230, 0xc420fbcc30, 0x43b0f0)
    /Users/jeremy/go/src/github.com/ctdk/goiardi/util/util.go:182 +0x5f5
github.com/ctdk/goiardi/indexer.(*IdxDoc).update(0xc420fbcc30, 0xeed220, 0xc421334620)
    /Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/file_index.go:429 +0x89
github.com/ctdk/goiardi/indexer.(*IdxCollection).addDoc(0xc4201c6560, 0xeed220, 0xc421334620)
    /Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/file_index.go:270 +0x131
github.com/ctdk/goiardi/indexer.(*FileIndex).SaveItem(0xc4201c29c0, 0xeed220, 0xc421334620, 0x0, 0x0)
    /Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/file_index.go:124 +0x166
created by github.com/ctdk/goiardi/indexer.IndexObj
    /Users/jeremy/go/src/github.com/ctdk/goiardi/indexer/indexer.go:120 +0x5
ctdk commented 7 years ago

Hmm, it Works Here™ with both of those roles I was testing. Is it breaking for you with that test.json you provided earlier, or is this with a real role?

ctdk commented 7 years ago

(It's especially weird because there's a check literally right before that line that should prevent that.)

ctdk commented 7 years ago

Never mind, I fiddled with it again and got it to break once more. Investigating.

ctdk commented 7 years ago

OK, new binary at https://github.com/ctdk/goiardi/releases/download/sushi_crash_fix_check_2/goiardi_linux_amd64-sushi_crash3 (on the same release page as before). That's what I get for working on weird bugs when it's 3am and I can't sleep - I put "i+i" instead of "i+1" on one line. This fixed it, at least, for how I got it to break just now.

ctdk commented 7 years ago

... and when I wrote a test I managed to trigger that exact error you had (the one I got to happen this morning was a little different). The binary to try is https://github.com/ctdk/goiardi/releases/download/sushi_crash_fix_check_2/goiardi_linux_amd64-sushi_crash4 (again, on the same release page as before).

simonvik commented 7 years ago

No crash, all seems good! :+1:

ctdk commented 7 years ago

Wonderful! Thanks for bringing this to my attention originally, and also thank you for your help and incredible patience while I tracked this down.