flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.8k stars 660 forks source link

[BUG] Partially configured task resource configuration can cause nil pointers and crash Flyte Admin #5977

Open Sovietaced opened 2 weeks ago

Sovietaced commented 2 weeks ago

Describe the bug

A task resource configuration that is not completely filled out will cause nil pointers in Flyte Admin because the code assumes every fields is present and complete.

runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x5e
github.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice.(*AdminService).interceptPanic(0xc00215e000, {0x330c7a8, 0xc004784e70}, {0x33004b0?, 0xc00453cc40})
    /go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/base.go:55 +0x7a
panic({0x27d3a40?, 0x4b0b9c0?})
    /usr/local/go/src/runtime/panic.go:914 +0x21f
github.com/flyteorg/flyte/flyteadmin/pkg/manager/impl/util.fromAdminProtoTaskResourceSpec({0x330c7a8, 0xc004784e70}, _)
    /go/src/github.com/flyteorg/flyteadmin/pkg/manager/impl/util/resources.go:61 +0x73
github.com/flyteorg/flyte/flyteadmin/pkg/manager/impl/util.GetTaskResources({_, _}, _, {_, _}, {_, _})
    /go/src/github.com/flyteorg/flyteadmin/pkg/manager/impl/util/resources.go:108 +0x3c5
github.com/flyteorg/flyte/flyteadmin/pkg/manager/impl.(*TaskManager).CreateTask(0xc000b727e0, {0x330c7a8, 0xc004784e70}, {{{}, {}, {}, 0xc000273000}, 0x0, {0x0, 0x0, ...}, ...})
    /go/src/github.com/flyteorg/flyteadmin/pkg/manager/impl/task_manager.go:64 +0x7f
github.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice.(*AdminService).CreateTask.func1()
    /go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/task.go:25 +0x9b
github.com/flyteorg/flyte/flytestdlib/promutils.StopWatch.Time({{0x7fd8f7388b40?, 0xc001c91e00?}, 0x280d800?}, 0xc001b28170)
    /go/src/github.com/flyteorg/flytestdlib/promutils/scope.go:58 +0xb6
github.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice/util.(*RequestMetrics).Time(...)
    /go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/util/metrics.go:33
github.com/flyteorg/flyte/flyteadmin/pkg/rpc/adminservice.(*AdminService).CreateTask(0xc00215e000, {0x330c7a8?, 0xc004784e70?}, 0x2c1bea0?)
    /go/src/github.com/flyteorg/flyteadmin/pkg/rpc/adminservice/task.go:24 +0x158

Expected behavior

The expected behavior is that Flyte Admin does not throw a nil pointer if only defaults or limits are specified in a task resource configuration.

One of the questions that needs to be answered is what is a valid task resource configuration? Once that is understood we can add some validation to the API which accepts this configuration.

Additional context to reproduce

project: flyteexamples
domain: development
limits:
    cpu: "2"
    memory: 450Mi

Screenshots

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

eapolinario commented 20 hours ago

@Sovietaced , after https://github.com/flyteorg/flyte/pull/5981 you shouldn't be seeing a panic anymore. This question of whether these incomplete task resource configuration objects should be considered invalid is still pertinent, but can be tackled separately, right?

Sovietaced commented 17 hours ago

@Sovietaced , after #5981 you shouldn't be seeing a panic anymore. This question of whether these incomplete task resource configuration objects should be considered invalid is still pertinent, but can be tackled separately, right?

I took a look at that PR but I don't think it will solve the issue where GetDefaults() or GetLimits() is nil

https://github.com/flyteorg/flyte/pull/5981/files#diff-36a9a91abc6f8873fbb6ca3e82f73a10f0cf99b69812a2715ea237b27bac1652R108-R109

eapolinario commented 7 hours ago

github is not cooperating, https://github.com/flyteorg/flyte/pull/5981/files#diff-36a9a91abc6f8873fbb6ca3e82f73a10f0cf99b69812a2715ea237b27bac1652R108-R109 doesn't render anything for me. Can you link to the file directly?

Sovietaced commented 5 hours ago

github is not cooperating, https://github.com/flyteorg/flyte/pull/5981/files#diff-36a9a91abc6f8873fbb6ca3e82f73a10f0cf99b69812a2715ea237b27bac1652R108-R109 doesn't render anything for me. Can you link to the file directly?

One of these two can still be nil it seems. https://github.com/flyteorg/flyte/blob/master/flyteadmin/pkg/manager/impl/util/resources.go#L108-L109

I can upstream a change I had.