aws-observability / aws-rum-web

Amazon CloudWatch RUM Web Client
Apache License 2.0
119 stars 65 forks source link

cd: smoke tests for CJS and ES modules #395

Closed ps863 closed 1 year ago

ps863 commented 1 year ago

This change introduces the following:

There are a few known issues:

Sample run of workflow


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

qhanam commented 1 year ago

2 Tests are failing for ES and CJS smoke tests

Have you removed these, or do they cause the release workflow to fail?

Operators should use the applications manually and exercise the behavior until the automation works correctly.

Does running these tests in the ES/CJS smoke tests add significant protection against regressions? If they do, we should automate it now instead of introducing a manual step. If they don't, we can omit them from the ES/CJS smoke tests.

Due to previous point, the timeout for ES and CJS tests has been set to 10 minutes.

Do you mean because of the “inability to access the AwsRum object”? How could we fix this?

qhanam commented 1 year ago

Could you simplify the smoke test apps by removing unused or unnecessary code?

adebayor123 commented 1 year ago

2 Tests are failing for ES and CJS smoke tests. This is due to inability to access the AwsRum object to create custom attributes and create a Dom event. This will be fixed in a future update. Operators should use the applications manually and exercise the behavior until the automation works correctly.

Is it the application unable to access AwsRum object? If it's not accessible, shouldn't it apply to all types of tests? Could you clarify further?

I also agree with Quinn that we should hold off on merging this until we fix the test for full automation - are we tight on deadline?

Tests are lot more flaky and overall release process will take longer. Due to previous point, the timeout for ES and CJS tests has been set to 10 minutes. This will be removed once the application is fixed to access AwsRum. Additionally, more changes will be made to improve parallelization that would reduce overall flakiness.

Is the timeout increased due to the flakiness or is it due to the issue of not being able to access AwsRum object?

Credentials may become invalid, so we retrieve them after every smoke test

This is somewhat expected - should we consider using the Duration option to increase the overall duration of a credential? https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html

ps863 commented 1 year ago

Have you removed these, or do they cause the release workflow to fail?

No the tests are still there and not removed. Reasoning of retaining the test is primarily to be aware that they are failing when running in gamma.

The workflow, however, will still continue running the next set of tests and not cause the entire workflow to stop.

Does running these tests in the ES/CJS smoke tests add significant protection against regressions? If they do, we should automate it now instead of introducing a manual step. If they don't, we can omit them from the ES/CJS smoke tests.

I believe the other smoke tests are already testing that the package loads correctly and ingestion is occurring. Most regressions could happen at the build step or initialization stage. We are testing that web client initializes correctly, by checking for a 200 response from dataplane. We also build the application itself prior to this step.

Additionally, CDN smoke tests all are functioning and would still catch a regression with the actual code in this case.

Do you mean because of the “inability to access the AwsRum object”? How could we fix this?

Yes. The fix here is to identify how the library can be accessed from the web bundle javascript that smoke.html loads so that the custom attributes and DOM event can be recorded (eg - awsRum.registerDomEvents() or awsRum.addSessionAttributes)

ps863 commented 1 year ago

2 Tests are failing for ES and CJS smoke tests. This is due to inability to access the AwsRum object to create custom attributes and create a Dom event. This will be fixed in a future update. Operators should use the applications manually and exercise the behavior until the automation works correctly.

Is it the application unable to access AwsRum object? If it's not accessible, shouldn't it apply to all types of tests? Could you clarify further?

The other tests which are passing are detected via the Plugins and dispatched. The plugins themselves load correctly into the application. The issue is when we try to directly interact with the NPM package and use library functions (eg - registerDomEvents). The two tests that fail are the ones that interact directly with the library (using registerDomEvents and addSessionAttributes). Thus they fail.

I also agree with Quinn that we should hold off on merging this until we fix the test for full automation - are we tight on deadline? Will try to look into a fix for the failing tests.

Tests are lot more flaky and overall release process will take longer. Due to previous point, the timeout for ES and CJS tests has been set to 10 minutes. This will be removed once the application is fixed to access AwsRum. Additionally, more changes will be made to improve parallelization that would reduce overall flakiness.

Is the timeout increased due to the flakiness or is it due to the issue of not being able to access AwsRum object?

The timeout is decreased in this case. This is done primarily due to the issue of not being able to access AwsRum. Without that, the workflow step unnecessarily keeps rerunning the failing tests.

Credentials may become invalid, so we retrieve them after every smoke test

This is somewhat expected - should we consider using the Duration option to increase the overall duration of a credential? https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html

Will take a look at this and see if it avoids needing to fetch the credentials thrice. I think either way, we need to get new credentials or increase how long they are, since we have more smoke tests and steps now.

limhjgrace commented 1 year ago

Are we running the tests sequentially e.g. once for CDN, once for ES, once for CJS? Do we want to/can we parallelize this to reduce the increase in runtime? But this would require additional infra so may not be worth the effort.

ps863 commented 1 year ago

Are we running the tests sequentially e.g. once for CDN, once for ES, once for CJS? Do we want to/can we parallelize this to reduce the increase in runtime? But this would require additional infra so may not be worth the effort.

Agree. Makes sense to do at a later time. Right now, priority is to have