appium / appium-for-mac

[deprecated] Application for automating a mac app with JSON wire protocol
Apache License 2.0

Resolve paradigm changes for "desktop automation" #21

Open stuartbrussell-intuit opened 7 years ago

stuartbrussell-intuit commented 7 years ago

This came about as a result of fixing https://github.com/appium/appium-for-mac/pull/20, and I moved the discussion out into its own issue because the crash is fixed. The underlying problem is a difference in world-view between web browser testing inside a single window and OS testing across the entire UI. This issue is common to AppiumForMac and WinAppDriver, and we need to coordinate to ensure the solution is the same across platforms.

Specifically the two points are:

  1. What bounds the screenshot should cover. The current paradigm is screen-wide, and the code is done, except for what Jonathan pointed out for base64EncodedString. One thing to consider is that the "display" can span multiple monitors, so we may want to restrict the screenshot to the main monitor.

  2. Element locations. By default, PFAssistive.framework returns screen-relative coordinates, and AfM just passes them back to the caller. This matters because some elements, for example the Dock, the menu bar and Finder desktop icons, are not really in a "window" per se. As with the screenshot, let's check with Microsoft about this.

Jonathan Lipps responded to the above points thusly:

Yes, these are the issues. WinAppDriver currently returns partial-desktop screenshots: they have the same dimensions as the app window, but they are of the desktop (not the app), so if another app is on top of the application under test (AUT), you get a weird crop of the other app. They're working on fixing this. I believe their element locations are also desktop-relative, not app-relative. So this is a bad situation.

Whatever you pick (desktop or app), screenshot and element coordinates should be the same. Personally I'd argue that what the user typically expects is a screenshot of the AUT and for element positions to be relative to the AUT & screenshot. If you're pushing for a "desktop automation" modality where you want users to not think of apps but to think of the entire desktop, then OK. But this will be odd for the inspector as well (are you going to return all of the elements of all the apps of the desktop? And if not, why get a screenshot of it?)

You guys can decide what's possible and what's the right approach for this project, but I think this will only be valuable in the inspector under the conditions I described.
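To make the point about keeping screenshot and element coordinates in the same space concrete, here is a minimal sketch of the arithmetic involved. It is not code from AppiumForMac or WinAppDriver; the `Rect` type and the `to_app_relative` and `crop_box` helpers are hypothetical names, and it assumes both rectangles are desktop-relative with a top-left origin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangle type; x/y are the top-left corner."""
    x: float
    y: float
    width: float
    height: float


def to_app_relative(element: Rect, window: Rect) -> Rect:
    """Translate a desktop-relative element rect into the window's space."""
    return Rect(element.x - window.x, element.y - window.y,
                element.width, element.height)


def crop_box(window: Rect) -> tuple:
    """(left, top, right, bottom) of the window within a desktop screenshot."""
    return (window.x, window.y,
            window.x + window.width, window.y + window.height)


# Illustrative values only: an app window and a button inside it,
# both expressed in desktop coordinates.
window = Rect(x=200, y=100, width=800, height=600)
button = Rect(x=250, y=140, width=80, height=24)

relative = to_app_relative(button, window)
print(relative)          # the button's position within the app screenshot
print(crop_box(window))  # the region of the desktop screenshot to keep
```

If the screenshot is cropped with `crop_box(window)` and element positions are reported through `to_app_relative`, both use the window's origin, so an element's rect can be drawn directly onto the returned image. Elements that have no owning window (the Dock or menu bar) would need a separate rule, which this sketch does not address.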

cc @yodurr