blackboard / protractor-sync

Wrapper around Protractor, providing synchronous test writing and lots of helper functions.
MIT License

Issues running Protractor-Sync Apps on AWS Lambda #51

Closed shermaneric closed 6 years ago

shermaneric commented 6 years ago

Hey there! I'm hoping this bug report is coherent. Let me know if it's not :)

  1. At Blackboard, we are implementing the ability to run E2E tests on AWS Lambda (big thanks to @tsu-denim for starting this up!), where we will eventually use Step Functions to further parallelize and shard out tests. For more context, see here: https://aws.amazon.com/blogs/devops/ui-testing-at-scale-with-aws-lambda/

  2. We are using the same packaging that we use in our traditional, vanilla environment. Specifically, it's leveraging protractor-sync@5.1.14.

  3. However, we are seeing different errors when running on AWS Lambda. The stack trace below is a big one.

  4. We were able to run the protractor-sync tests in AWS Lambda just fine. Disclaimer: that was at 5.1.17, so we can try to bump the version mentioned in #2 above.

  5. We are lining up the same Chrome and packaging dependencies. I believe we were using Chrome version 62.

Stack trace is below. I can also attach an mp4 if it's helpful.
We'll keep poking at it as well, but I'm posting in case there are any obvious ideas.

Much thanks in advance!

Expected 'TypeError: Cannot read property 'map' of null
at /tmp/lambda_protractor/test/node_modules/protractor-sync/dist/selection.js:93:28
at Object.polledWait (/tmp/lambda_protractor/test/node_modules/protractor-sync/dist/polled-wait.js:21:22)
at _getElements (/tmp/lambda_protractor/test/node_modules/protractor-sync/dist/selection.js:64:26)
at Object.findVisible (/tmp/lambda_protractor/test/node_modules/protractor-sync/dist/selection.js:153:21)
at Large.Control (/tmp/lambda_protractor/test/e2e/controls/course/course-page.ts:13:53)
at new Large (/tmp/lambda_protractor/build/test/test/e2e/controls/course/course-page.js:195:42)
at Object.instantiateBreakpointClass (/tmp/lambda_protractor/test/e2e/test_util.ts:624:14)
at new Control (/tmp/lambda_protractor/test/e2e/controls/course/course-page.ts:11:23)
at Large.Control.openUltraCourse (/tmp/lambda_protractor/test/e2e/controls/course/course-card.ts:18:12)
at Large.Control.openCourse (/tmp/lambda_protractor/test/e2e/controls/base/courses/base-courses_page.ts:27:41)
=== Pre-asyncblock stack ===
Error
at module.exports (/tmp/lambda_protractor/test/node_modules/asyncblock/lib/asyncblock.js:29:15)
at UserContext.<anonymous> (/tmp/lambda_protractor/test/e2e/test_util.ts:174:5)
at /tmp/lambda_protractor/test/node_modules/jasminewd2/index.js:108:15
at new ManagedPromise (/tmp/lambda_protractor/test/node_modules/protractor/node_modules/selenium-webdriver/lib/promise.js:1067:7)
at ControlFlow.promise (/tmp/lambda_protractor/test/node_modules/protractor/node_modules/selenium-webdriver/lib/promise.js:2396:12)
at schedulerExecute (/tmp/lambda_protractor/test/node_modules/jasminewd2/index.js:95:18)
at TaskQueue.execute_ (/tmp/lambda_protractor/test/node_modules/protractor/node_modules/selenium-webdriver/lib/promise.js:2970:14)
at TaskQueue.executeNext_ (/tmp/lambda_protractor/test/node_modules/protractor/node_modules/selenium-webdriver/lib/promise.js:2953:27)
at asyncRun (/tmp/lambda_protractor/test/node_modules/protractor/node_modules/selenium-webdriver/lib/promise.js:2860:25)
at /tmp/lambda_protractor/test/node_modules/protractor/node_modules/selenium-webdriver/lib/promise.js:676:7' to equal ''.

Error: Failed expectation
at FiberFlow.errorCallback (/tmp/lambda_protractor/test/e2e/test_util.ts:207:50)
at FiberFlow.Flow._errorHandler (/tmp/lambda_protractor/test/node_modules/asyncblock-generators/lib/flow.js:530:18)
at fiberContents (/tmp/lambda_protractor/test/node_modules/asyncblock/lib/asyncblock.js:98:18)

1-18-01-12-21-13-112328.mp4.zip

mindywhitsitt commented 6 years ago

In your point #4 above did you mean that you can run protractor-sync's own tests successfully in lambda?

Note that Blackboard Ultra will not be able to take protractor-sync version 5.1.17 due to typedef issues within Ultra, so that is probably not a viable solution for you guys. (I suppose you could always check and see if the typedef issues have been straightened out.)

shermaneric commented 6 years ago

> In your point #4 above did you mean that you can run protractor-sync's own tests successfully in lambda?

That is indeed correct! Good point about 5.1.17. At this point, it seems more environment-specific, but I was just trying to get more insight into how this could happen. Much thanks!

scriby commented 6 years ago

I checked the code in the area and was wondering if resolved might be able to be set to null when an exception is thrown. I put this sample together which shows it wouldn't be null in that case:

var ab = require('asyncblock');

ab(function(flow) {
  const resolveElementsCb = flow.add();
  var resolved = [];

  try {
    resolved = flow.sync(setTimeout(function() {
      resolveElementsCb(new Error('err'));
    }));
  } catch (e) {
    console.log('error: ', e.message);
  }

  console.log('resolved:', resolved);
});

This seems to indicate that elements.getWebElements() is resolving to null. I'll check the protractor/selenium code to see under what conditions that can occur.

scriby commented 6 years ago

I checked the protractor/selenium implementation for elements.getWebElements() and I don't see any code paths where it can return null.

The next most likely thing is something going wrong in asyncblock, but it's pretty unclear what about the lambda environment is triggering the issue. I assume the exact same test run outside of lambda doesn't have the issue, and that it's reproducible on lambda?

It may be possible to bypass the issue by adding resolved = resolved || [] on https://github.com/blackboard/protractor-sync/blob/929255b6cf18d4e7595939b1493996ea483a8f3e/app/selection.ts#L104.

That will either "fix" the issue, or cause it to just time out if it's not able to select the element in later polls as well. It's probably worth testing at least to gather more information.
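As a rough illustration of that workaround, here is a minimal standalone sketch. The function name wrapResolved and the wrap callback are hypothetical stand-ins, not the actual code in selection.ts; the only part taken from the suggestion above is the resolved = resolved || [] guard before the .map call.

```javascript
// Hypothetical stand-in for the step that maps over the resolved elements.
function wrapResolved(resolved, wrap) {
  // Workaround: treat a null/undefined result as "no elements found on this poll"
  // so that .map is never called on null.
  resolved = resolved || [];
  return resolved.map(wrap);
}

// A null result no longer throws "Cannot read property 'map' of null".
console.log(wrapResolved(null, (el) => el.id)); // []

// A normal result is mapped as before.
console.log(wrapResolved([{ id: 'a' }], (el) => el.id)); // [ 'a' ]
```

With the guard in place, a poll that gets null back behaves like an empty selection, so the polled wait would either succeed on a later poll or time out.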

I'm not sure in your setup if you're using "direct connect" with chrome driver to run the test or connecting through a selenium hub (I forgot exactly what this thing is called, but it's a java program which the test connects to and proxies the commands). I recall that the selenium hub java app can output a verbose log file. If you're able to get logs from there it could help give some more clues.

Also, check the console output from the test itself to see if it outputs anything dealing with stale element re-selection. I'm curious if it's hitting that code path or not.

If it does turn out there's an issue w/ asyncblock and the workaround code change doesn't resolve it, I'm probably not going to be able to be much help without a repro.

shermaneric commented 6 years ago

Hey @scriby, thanks for the great details! Both @tsu-denim and I were on PTO Thursday and Friday.
The first thing I'll check when I'm fully back is whether directConnect is true. I'm fairly certain it is. I'll play around with that workaround. I'll also see if we can get repro steps from a public-facing repo.

More to come. Again, thanks @scriby!

tsu-denim commented 6 years ago

Thanks @scriby, @shermaneric, @mindywhitsitt! We are using direct connect, but we should be able to use the hub to grab those logs and provide a more complete set of debug info.

shermaneric commented 6 years ago

Hey @scriby, much thanks to you and @tsu-denim (Kurt), we believe we have figured out the issue.

  1. It was a very good idea to set directConnect: false and to start ChromeDriver as a separate process beforehand. Following https://sites.google.com/a/chromium.org/chromedriver/logging, we were able to generate a chromedriver log, which is attached.

  2. In the attachment, we saw a signal 7 BUS error. Kurt was able to track it down here and saw that we were running out of disk space, particularly in the /tmp directory, where we had Chrome and various other programs (xvfb, chromedriver, etc.) running. Chrome also creates cache files, which filled up the filesystem attached to /tmp quite quickly, causing the above error.

  3. As the error from Chrome is quite vague, protractor and protractor-sync keep moving along and protractor-sync correctly reported back null on elements.getWebElements()

  4. The solution is to start moving files out of /tmp and into the lambda function directly, where we have access to more space. We had used /tmp originally because other places in AWS Lambda were read-only. Over time, we'll perhaps figure out how to run Chrome in a more suitable place as well.
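Since the root cause was /tmp filling up, a pre-flight check of free space could surface the problem earlier than a vague Chrome failure. The sketch below is only an illustration under stated assumptions: the helper names and the 100 MB threshold are hypothetical, and fs.statfsSync is only available on newer Node.js releases, so the call is guarded.

```javascript
const fs = require('fs');

// Hypothetical threshold; tune it to what Chrome and its cache actually need.
const MIN_FREE_BYTES = 100 * 1024 * 1024;

// Bytes available to unprivileged processes, from a statfs-style result.
function availableBytes(stats) {
  return stats.bavail * stats.bsize;
}

// True when the filesystem has at least minFreeBytes available.
function hasEnoughSpace(stats, minFreeBytes = MIN_FREE_BYTES) {
  return availableBytes(stats) >= minFreeBytes;
}

// Guarded because fs.statfsSync does not exist on older Node.js versions.
if (typeof fs.statfsSync === 'function') {
  try {
    const stats = fs.statfsSync('/tmp');
    if (!hasEnoughSpace(stats)) {
      console.warn('Low disk space on /tmp: ' + availableBytes(stats) + ' bytes free');
    }
  } catch (e) {
    console.warn('Could not check /tmp: ' + e.message);
  }
}
```

Running a check like this before launching Chrome, and again after a failed test, would make a full /tmp show up in the test output rather than only in the chromedriver log.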

Closing ticket. Thanks again @scriby and @tsu-denim!

1-18-01-22-21-21-168073.chromedriver.log

tsu-denim commented 6 years ago

Thanks for the troubleshooting everyone! Chrome basically just disappears when space runs out and mangles the response from the dev tools api that chrome driver uses. Given what was provided, the logs coming from the test are pretty accurate. Chrome really did disappear lol!