Shippable / support

Shippable SaaS customers can report issues and feature requests in this repository

Shippable build error - 1597881288618128498 [Error] node must have children #5137

Open rakeshlacework opened 3 years ago

rakeshlacework commented 3 years ago

Description of your issue:

We are seeing the log line "1597881288618128498 [Error] node must have children" emitted in the Shippable logs. It is breaking some of our tests, so we are unable to produce builds.

I don't know where the log line is coming from, and I need this resolved ASAP since it is a release-affecting issue.

a-murphy commented 3 years ago

Could you run a smaller subset of your tests to determine which test, if it is a particular test, has the error?

One of the more recent failures has [no test files] logged near the node must have children error. Could there be required files that weren't committed to the repository on GitHub?

rakeshlacework commented 3 years ago

We have tried moving things around, but the log line is not going away and shows up randomly. For example, from a recent build:


goos: linux
goarch: amd64
pkg: github.com/lacework/agent/linux
BenchmarkInit-2             3000            380432 ns/op
BenchmarkLogin-2          100000             19928 ns/op
PASS
ok      github.com/lacework/agent/linux 4.994s
cd datacollector; make test
make[1]: Entering directory `/root/src/github.com/lacework/agent/datacollector'
go test -cover
1597882036977271043 [Error] node must have children
time="2020-08-20T00:07:16Z" level=info msg="{Alloc:3870232 TotalAlloc:6679880 Sys:10153087 Lookups:12 Mallocs:46489 Frees:27257 HeapAlloc:3870232 HeapSys:6848512 HeapIdle:1581056 HeapInuse:5267456 HeapReleased:0 HeapObjects:19232 StackInuse:491520 StackSys:491520 MSpanInuse:66120 MSpanSys:81920 MCacheInuse:3472 MCacheSys:16384 BuckHashSys:1532098 GCSys:442368 OtherSys:740285 NextGC:7741040 LastGC:1597882036999481299 PauseTotalNs:600269 PauseNs:[327679 228025 44565 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] PauseEnd:[1597882036982830379 1597882036990752893 1597882036999481299 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0```

We still don't know the source of the error log line. We have a few questions about the limits of the node type being used:
* Root file system size
* /tmp file system size
* Shippable build log size

a-murphy commented 3 years ago

Space available on the node when the build starts is listed in the "Job node info" section of the logs. The total space will also be updated on the node page, under the node pool, while it is running.

If you click the download option in the table above the logs, you can download your logs to get an idea of how big they are. But, since logs are streamed to display them as the step runs, that's not a very accurate representation of how much is on the node at any given time. There is a limit on the number of log lines beyond which the logs are only available for download and not displayed in the UI, but you haven't reached that.

rakeshlacework commented 3 years ago

So it is not disk space, which I thought could be causing this, although I don't fully understand the usage reported in the Job node info section.

Nevertheless, back to the original issue: the log line is emitted periodically in a build log

1597882036977271043 [Error] node must have children
1597882340679062256 [Error] node must have children
1597882733219791386 [Error] node must have children
1597882805098381691 [Error] node must have children
1597882901984594276 [Error] node must have children
1597882971586959153 [Error] node must have children
1597883037391323710 [Error] node must have children
1597883100368613117 [Error] node must have children
1597883162960468746 [Error] node must have children
1597883226159355521 [Error] node must have children
1597883288860453313 [Error] node must have children
1597883351929062145 [Error] node must have children
1597883415166986074 [Error] node must have children
1597883584751768368 [Error] node must have children
1597883647612693996 [Error] node must have children
1597883767525076333 [Error] node must have children
1597883828755902629 [Error] node must have children
1597883980336957826 [Error] node must have children
1597884026806578884 [Error] node must have children
1597884225870427128 [Error] node must have children
1597884643652532582 [Error] node must have children
1597884711543967102 [Error] node must have children
1597884764966714514 [Error] node must have children
1597884826376162348 [Error] node must have children
1597885150692157513 [Error] node must have children

and I know from our code and build scripts that it is not coming from us. Please help; it is causing release issues for us.

a-murphy commented 3 years ago

I haven't found anything on Shippable that would have created logs like that, but I'll keep looking through all the libraries we use. In the meantime, it looks like there are several network connection failures in your tests. Do you know which of those were expected?

trriplejay commented 3 years ago

@rakeshlacework this error appears to come from a library called "seelog". I see some lines in your console that reference this library, so I think it must be something going on in your tests. I'm not sure whether it's the root cause or a downstream side effect of some other error during the tests.

https://github.com/cihub/seelog/blob/f561c5e57575bb1e0a2167028b7339b3a8d16fb4/cfg_errors.go#L32
https://github.com/cihub/seelog/blob/f561c5e57575bb1e0a2167028b7339b3a8d16fb4/cfg_parser.go#L649

rakeshlacework commented 3 years ago

Thanks @trriplejay that was helpful.

We are not quite out of the woods, though. In the meantime, I am trying to move this node pool to the XL machine type for faster build turnaround:

https://app.shippable.com/subs/github/lacework/nodePools/204470/view

We already bought 2 additional XL licenses, but it is not allowing me to apply them to this node pool. Can you please suggest how to proceed?

a-murphy commented 3 years ago

You appear to have added new Ubuntu 16.04 XL node licenses, but that node pool is using Ubuntu 14.04 nodes. Did you also intend to switch to 16.04? Either way, you should be able to delete (or rename) the existing node pool and create a new node pool with the same name to use the new licenses. Multiple sizes cannot be in the same node pool.

rakeshlacework commented 3 years ago

Oh OK. How do we change the node pool that uses Ubuntu 14.04 nodes to XL?

a-murphy commented 3 years ago

You will have to delete the node pool and create a new one to change the size. You should be able to reuse the same node pool name.

rakeshlacework commented 3 years ago

Thank you. How can we see the resource utilization of the node (CPU, memory, and network)?

a-murphy commented 3 years ago

If you go to the node pools page, click the name of the node pool to go to the node pool's page, and then click the name of a node, you should see graphs with recent statistics. That will contain CPU and memory, but we currently have no way to track network usage for a particular node.