SeleniumHQ / selenium

A browser automation framework and ecosystem.
https://selenium.dev
Apache License 2.0
30.15k stars 8.1k forks source link

[πŸš€ Feature]: Implement high level BiDi script commands #13992

Open titusfortner opened 3 months ago

titusfortner commented 3 months ago

Feature and motivation

At the Selenium Dev Summit we agreed on this API to be generally applied across the bindings; we'll keep this labeled beta while we make sure that it works for what is needed

We want the methods to be accessible from a script() method available directly from the Driver class (e.g., driver.script.pin(script), driver.Script().Pin(script)). We can't do everything just like this in all the languages, because, for example, .NET uses events with a += and -= for adding and removing handler events so we don't went "add" and "remove" methods.

Implementations:

Method Java NodeJS Python Ruby .NET
pin() βœ… ~#14250~ βœ… ~#14305~
unpin() βœ… ~#14305~ βœ… ~#14250~
execute() #14330 #14293
addConsoleMessageHandler() βœ… ~#14225~ βœ… ~#14135~ βœ… ~#14107~ βœ… ~#14073~ #14057
removeConsoleMessageHandler() βœ… ~#14225~ βœ… ~#14135~ βœ… ~#14107~ βœ… ~#14073~ #14057
addDomMutationHandler() βœ… ~#14304~ βœ… ~#14238~
removeDomMutationHandler() βœ… ~#14304~ βœ… ~#14238~
addJavaScriptErrorHandler() βœ… ~#14225~ βœ… ~#14135~ βœ… ~#14107~ βœ… ~#14073~ #14057
removeJavaScriptErrorHandler() βœ… ~#14225~ βœ… ~#14135~ βœ… ~#14107~ βœ… ~#14073~ #14057

Considerations

Mr0grog commented 2 months ago

Hi, are you interested in feedback on the design of this? If so, where, and Is there any kind of design document/plan? (I’m not sure what might be implementation-specific or only partially implemented, other than a vague sense based on the table in this issue.)

I’ve been playing a bit with the Ruby implementation as a replacement for some devtools-based code I wrote recently and had some thoughts I hope might be useful.

p0deje commented 2 months ago

@Mr0grog Absolutely, please share your feedback. We don't have the document describing all of this because the API was designed during the in-person DevSummit of Selenium TLC. As we work on implementing these APIs, the feedback is greatly appreciated!

Mr0grog commented 2 months ago

πŸ‘ Great!

First off, the Script API is very pleasant to use. I’m really glad to see it! ❀️ The straightforward unsubscribe methods are also a nice improvement over devtools.

That said, one thing I’d love is for console log and javascript error objects from the high-level Script API to include the URL of the page they occurred on or, even better, the URL of both the frame they occurred in and the top-level/source frame they occurred in (when iframes or popups are involved). Or maybe a BrowsingContextInfo object? (Since that includes links up and down the context tree.)

Barring that, the raw BiDi API’s log.entryAdded event (when a console log or unhandled JS error occurs) has a source property with the ID of the context, but that field gets dropped in the mid-level (LogHandler) and high-level APIs. Having it there would allow me to line up a log message with the context using other BiDi commands and events (namely browsingContext.getTree and browsingContext.fragmentNavigated), instead of working around the new high-level API entirely. (In this case, it would also be nice to have some high-level access to the context tree and various navigation events, but I’m hoping they’re already planned for even if I can’t find the issue. πŸ™‚ 🀞)

Background/use case: I’ve been using the devtools API to keep track of what page JS logs and errors are coming from during browser-based tests. This is just generally useful in testing workflows that cover multiple pages or that involve iframes or popups. In my case, an app I recently started working on has tests that include navigations to other sites (e.g. a signup flow that ends by kicking someone over to a marketing page managed by another system at a different subdomain). I use information about the URL to decide whether to ignore JS logs/errors (since we don’t want our tests to fail and prevent a build if the other site is having some issues).

Obviously pulling multiple info sources together like I’m suggesting in my ideal scenario adds a lot of complexity to the implementation, but I suspect this info might be useful in a lot of ways whenever automations navigate across multiple pages, perform form submissions, etc.

p0deje commented 2 months ago

I think we should preserve source in the log entry and make sure it has the browsingContext that is properly serialized. @titusfortner @pujagani Any objections?

titusfortner commented 2 months ago

@Mr0grog Thanks for the feedback! We're definitely trying to figure this out as we go along. Can you make a Pull Request with what you'd like to see?

pujagani commented 2 months ago

Yes, the source bit was added in the web driver BiDi spec relatively recently. I am guessing for the same reason as requested above. We should make everything that is part of types defined in spec https://w3c.github.io/webdriver-bidi/#types-log-logentry, available to the user.

titusfortner commented 2 months ago

Oh, I see what you're asking; we just need to update the Struct with source:

        ConsoleLogEntry = BiDi::Struct.new(:level, :text, :timestamp, :method, :args, :type)
        JavaScriptLogEntry = BiDi::Struct.new(:level, :text, :timestamp, :stack_trace, :type)

That's weird, I would have sworn I added things based on the spec, not sure how I missed source

Mr0grog commented 2 months ago

Oh, I see what you're asking; we just need to update the Struct with source

Yeah, that is the simplest solution. I guess I thought I’d go ahead and propose something that did more given the description of Script as β€œhigh-level” API, which generally implies to me that the intent isn’t simply to marshall messages over the protocol, but to combine parts of the protocol in more use-case-specific ways when reasonable (up to you if this use case is common/reasonable or too narrow!).

It would also be nice if the things in the source field were more meaningful objects, i.e. if the source.context was a BrowserContextInfo that I could use to navigate up the context tree instead of just an ID string (and same for source.realm, although it looks like there is nothing at all written to wrap realm stuff yet). I can appreciate that might be fairly complex to do in a performant way, though. (In that case, I also need some API to get contexts based on ID, since it doesn’t exist yet and in #14201 I learned I should not actually be using the BiDi API directly, which was not clear to me when I made my earlier comments.)

titusfortner commented 2 months ago

Right, the first step is to just expose what's available. I'm a little resistant to adding a bunch of objects for everything in Ruby, but we definitely want to make sure that the output is in a state that is usable in other parts of the code. Maybe the string is what gets passed in as a parameter to another method that provides the tree parsing information.