aws-deadline / deadline-cloud-worker-agent

The AWS Deadline Cloud worker agent can be used to run a worker in an AWS Deadline Cloud fleet.
Apache License 2.0

test: include large files in syncInputJobAttachments cancelation tests to allow more time for cancelation #444

Closed: YutongLi291 closed this pull request 1 month ago

YutongLi291 commented 1 month ago

What was the problem/requirement? (What/Why)

test_worker_reports_canceled_sync_input_actions_as_canceled was failing intermittently because the syncInputJobAttachments action sometimes completed too quickly for the test to cancel it.
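To make the race concrete, here is a toy asyncio model. It is not the worker agent's code: `fake_sync_inputs`, `run_and_cancel`, and the timings are hypothetical stand-ins for the action, the cancel request, and their relative timing. When the action finishes before the cancel request arrives, there is nothing left to cancel, so the outcome is a success instead of the cancelation the test expects.

```python
import asyncio


async def fake_sync_inputs(duration_s: float) -> None:
    """Hypothetical stand-in for a syncInputJobAttachments action lasting duration_s seconds."""
    await asyncio.sleep(duration_s)


async def run_and_cancel(duration_s: float, cancel_after_s: float) -> str:
    """Start the fake action, request cancelation after cancel_after_s, and report the outcome."""
    task = asyncio.create_task(fake_sync_inputs(duration_s))
    await asyncio.sleep(cancel_after_s)
    # If the action has already finished, cancel() has nothing to interrupt.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "CANCELED"
    return "SUCCEEDED"


if __name__ == "__main__":
    # Fast action: done before the cancel request, so a "must be canceled" check fails.
    print(asyncio.run(run_and_cancel(duration_s=0.01, cancel_after_s=0.2)))
    # Slow action: still running when the cancel request arrives.
    print(asyncio.run(run_and_cancel(duration_s=2.0, cancel_after_s=0.05)))
```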

What was the solution? (How)

Create larger files in between the small files. This makes syncInputJobAttachments take longer overall, while keeping the average transfer rate high enough that the job attachments code does not time out the transfer, which is what happens when a large number of small files is uploaded on its own.

The small files are still what mainly slows down the syncInputJobAttachments action, so they are needed to leave enough time to cancel it.
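A minimal sketch of the kind of fixture described above, assuming the test writes its input files to a local directory before submitting the job. `create_interleaved_input_files` and all of its counts and sizes are hypothetical; the PR does not say which values the test actually uses.

```python
import os
import tempfile
from pathlib import Path


def create_interleaved_input_files(
    root: Path,
    num_small: int,
    small_size: int,
    large_every: int,
    large_size: int,
) -> list[Path]:
    """Write num_small small files and add one large file after every large_every small ones.

    All files share one sequence counter in their names, so sorting by name keeps
    the small/large interleaving. Contents are random so no two files are identical.
    """
    root.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    seq = 0

    def write(kind: str, size: int) -> None:
        nonlocal seq
        path = root / f"{seq:06d}_{kind}.bin"
        path.write_bytes(os.urandom(size))
        paths.append(path)
        seq += 1

    for i in range(num_small):
        write("small", small_size)
        if (i + 1) % large_every == 0:
            write("large", large_size)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = create_interleaved_input_files(
            Path(tmp), num_small=10, small_size=16, large_every=5, large_size=1024
        )
        print([p.name for p in files])
```

Whether the uploader actually processes files in name order is an assumption; the shared counter only guarantees the interleaving on disk.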

What is the impact of this change?

A more robust test_worker_reports_canceled_sync_input_actions_as_canceled test.

How was this change tested?

# Linux
source .e2e_linux_infra.sh
hatch run e2e-test

# Windows
source .e2e_windows_infra.sh
hatch run e2e-test

Ran the e2e tests 5 times on each platform; they passed every time.

Was this change documented?

No

Is this a breaking change?

No

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sonarcloud[bot] commented 1 month ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

jusiskin commented 1 month ago

> What was the problem/requirement? (What/Why)
>
> test_worker_reports_canceled_sync_input_actions_as_canceled was failing flakily because sometimes the syncInputAttachments goes too fast so we don't have time to cancel it.
>
> What was the solution? (How)
>
> Create bigger files in between the small files, so that the syncInputAttachments is much slower, but it does not time out due to the perceived transfer rate being too slow.

Two things:

  1. nit: it should be `syncInputAttachments` → `syncInputJobAttachments`
  2. These seem to point to different root causes for the test failures. One says the transfer happens so fast that we don't have time to cancel it. The other says that the action fails because the perceived transfer rate is low.

Which is true? Is it both?

YutongLi291 commented 1 month ago


For 1, I will change that.

For 2, the flaky failures happen because the action completes too quickly. We previously tried around 5000 small files, but that made the action fail: the transfer rate was so low that the job attachments code timed out (this is what the second part of the description refers to; sorry for the missing context). The middle ground is to create the small files with large files in between, so the action still takes a long time but the larger files bring the average transfer rate up.
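The trade-off in this reply can be illustrated with a toy cost model, assuming every file costs a fixed per-file overhead plus its size divided by a constant bandwidth. `TransferModel` and every number below are illustrative assumptions, not measurements from the worker agent or the job attachments code.

```python
from dataclasses import dataclass


@dataclass
class TransferModel:
    """Toy model (assumption): each file costs a fixed overhead plus size / bandwidth."""

    per_file_overhead_s: float
    bandwidth_bytes_per_s: float

    def duration_s(self, sizes: list[int]) -> float:
        """Total time to transfer files of the given sizes, one after another."""
        return sum(self.per_file_overhead_s + s / self.bandwidth_bytes_per_s for s in sizes)

    def average_rate(self, sizes: list[int]) -> float:
        """Average bytes per second over the whole transfer."""
        return sum(sizes) / self.duration_s(sizes)


if __name__ == "__main__":
    model = TransferModel(per_file_overhead_s=0.02, bandwidth_bytes_per_s=50_000_000)
    small_only = [1_000] * 5000              # around 5000 small files (1 KB each is an assumption)
    mixed = small_only + [100_000_000] * 50  # the same small files plus 50 large ones
    for name, sizes in (("small only", small_only), ("mixed", mixed)):
        print(f"{name}: {model.duration_s(sizes):.1f} s, {model.average_rate(sizes) / 1e6:.2f} MB/s")
```

With these made-up numbers, adding the large files roughly doubles the total duration while raising the average rate about 500 times. The model only looks at the overall average and ignores ordering, so on its own it does not explain why the large files are placed in between the small ones.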