HtmlUnit / htmlunit

HtmlUnit is a "GUI-Less browser for Java programs".
https://www.htmlunit.org
Apache License 2.0

Navigating to javascript form using HtmlUnit #401

Open MatroidX opened 3 years ago

MatroidX commented 3 years ago

Note: This information is duplicated on https://stackoverflow.com/questions/69849910/navigating-to-javascript-form-using-htmlunit. Feel free to request any additional information there.

Overview

I have been using HtmlUnit successfully to navigate BoardGameGeek and execute tasks (e.g. send GeekMail). Recently they changed their login from a normal webpage to a JavaScript-generated form, and now I cannot access the login form using HtmlUnit no matter what I try; my attempts are shown in the code under "Existing code".

I would like a general understanding that goes beyond this specific webpage and JavaScript code, but I am giving this concrete example because the other StackOverflow questions I looked at did not help in this particular situation (so maybe there is something unique here?).

Steps to Reproduce

  1. Use your browser to navigate to https://boardgamegeek.com/geekmail/compose?touser=FakeUserName (any name will suffice).
  2. As long as you are not logged in to BoardGameGeek, you will then see a pop-up window titled "Sign in" with text inputs "Username" and "Password".
  3. If you View Page Source, you will see that the form is generated by javascript (https://cf.geekdo-static.com/frontend/main-es2015.aeff0e4f13bcecc7eb55.js or https://cf.geekdo-static.com/frontend/main-es5.aeff0e4f13bcecc7eb55.js). As far as I can tell, the created form has no name / id that I can use to access it. Even if it did, I do not seem to be able to view it within HtmlUnit (i.e. "Sign up" never appears when I print XML, whether for HtmlPage or HtmlForm).
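A form without a name or id can still be matched by what it contains, for example "any form that holds a password input". The sketch below demonstrates that XPath expression using only the JDK's XML classes against a small, hypothetical markup sample (`SAMPLE_MARKUP` and the class name `LoginFormLocator` are made up for illustration and are not BoardGameGeek's real markup). It assumes the form actually reaches the DOM, which is exactly what is in doubt here, so treat it as a way to look for the form rather than a fix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class LoginFormLocator {

    // Matches any form that contains a password field, regardless of name or id.
    static final String LOGIN_FORM_XPATH = "//form[.//input[@type='password']]";

    // Hypothetical, simplified markup standing in for a rendered page:
    // a search form plus a login form inside a modal container.
    static final String SAMPLE_MARKUP =
            "<html><body>"
            + "<form action='/search'><input type='text' name='q'/></form>"
            + "<div class='modal'><form>"
            + "<input type='text' name='username'/>"
            + "<input type='password' name='password'/>"
            + "</form></div>"
            + "</body></html>";

    // Returns the first matching form element, or null if there is none.
    static Element findLoginForm(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Element) xpath.evaluate(LOGIN_FORM_XPATH, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        Element form = findLoginForm(SAMPLE_MARKUP);
        System.out.println(form == null ? "no login form found" : "login form found");
    }
}
```

In HtmlUnit the same expression could be passed to the page's `getFirstByXPath` method instead of the JDK classes used here. If that returns null as well, the form was never added to the DOM that HtmlUnit built.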

Existing code

Here is my current version of the code with multiple attempts made to diagnose the problem / extract some useful information:

import com.gargoylesoftware.htmlunit.WebClient;
...
import java.util.LinkedList;

public class GeekMailSender {
    ...

    // Static variable to track all website windows.
    private static final LinkedList<WebWindow> websiteWindows = new LinkedList<>();

    // Inner class to listen for new (i.e. pop-up) windows.
    static class GeekMailWindowListener implements WebWindowListener {
        @Override
        public void webWindowClosed(WebWindowEvent event) {}

        @Override
        public void webWindowContentChanged(WebWindowEvent event) {}

        @Override
        public void webWindowOpened(WebWindowEvent event) {
            GeekMailSender.websiteWindows.add(event.getWebWindow());
        }
    }

    // Method to actually send GeekMail by navigating the BGG website.
    public static void sendGeekMail(...) {
        ...
        try (final WebClient webClient = new WebClient()) {
            // Track creation of new (i.e. pop-up) windows.
            websiteWindows.clear();
            webClient.addWebWindowListener(new GeekMailWindowListener());

            // Try to access the GeekMail page.
            HtmlPage currentPage = webClient.getPage("https://boardgamegeek.com/geekmail/compose?touser=FakeUserName");
            String pageTitle = currentPage.getTitleText();
            System.out.println(pageTitle);  // BoardGameGeek

            // We may need to login first.
            if (!pageTitle.contains("GeekMail")) {
                // Need to wait for JavaScript to complete, otherwise no forms are available.
                webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE);
                // Using webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE) instead makes no difference.

                // Unfortunately the only form found is the top-right Search form on BoardGameGeek.
                if (currentPage.getForms().isEmpty()) {
                    // This does NOT happen.
                    System.out.println("WARNING! No form found, even after waiting for javascript!");
                    return;
                }

                // We don't find any windows at all... this confuses me.
                if (websiteWindows.isEmpty()) {
                    // This does happen :(
                    System.out.println("WARNING! No windows found even after waiting for javascript!");
                }

                // Additional printing does not reveal where the form is.
                // For instance, searching the XML for "Sign up" yields no results.
                System.out.println(currentPage.asXml());

                // And printing the one form we can access reveals it is just the Search form.
                System.out.println(currentPage.getForms().size());  // 1
                final HtmlForm loginForm = currentPage.getForms().get(0);
                System.out.println(loginForm.asXml());
                ...
            }
            ...
        }
        ...
    }
}
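The code above waits once for a fixed pause and then inspects the page. An alternative is to poll for a specific condition until a timeout expires, which shows whether the expected content ever appears at all. The helper below is a minimal, JDK-only sketch of that idea; `PollUntil` and `waitUntil` are hypothetical names and are not part of HtmlUnit.

```java
import java.util.function.BooleanSupplier;

final class PollUntil {

    // Checks the condition repeatedly until it holds or the timeout elapses.
    // Returns true if the condition was met, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Demo condition: becomes true roughly 200 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("condition met: " + ready);
    }
}
```

In the HtmlUnit code above, the condition might be something like `() -> currentPage.asXml().contains("Sign in")`, with the marker text being an assumption about what the rendered dialog contains. If the helper times out even with a generous limit, the dialog is probably never rendered, which would suggest a script error or an unsupported JavaScript feature rather than a timing problem.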

References

In trying to solve this, I have checked the following references (among many others):

Nevertheless, I seem unable to locate the desired form. Any help would be much appreciated!

rbri commented 3 years ago

Will have a look

MatroidX commented 3 years ago

Thanks! Became a sponsor to help support your work. Much appreciated :)

MatroidX commented 3 years ago

Was given some advice on BoardGameGeek: https://boardgamegeek.com/thread/2751648/login-form-renamed. I still need to investigate.

rbri commented 2 years ago

OK, did a first analysis here and found two problems: