HtmlUnit / htmlunit

HtmlUnit is a "GUI-Less browser for Java programs".
https://www.htmlunit.org
Apache License 2.0

java.util.NoSuchElementException #320

Open meladkadies opened 3 years ago

meladkadies commented 3 years ago

java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:364)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeDeferredScriptsIfNeeded(HtmlPage.java:1470)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:259)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:668)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:470)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:382)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:539)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:520)
    at com.MKdesign.Main.get_news(Main.java:51)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)

rbri commented 3 years ago

????

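I'm not a magician. If you would like some help, please try to provide as much information as possible:

  • which version do you use
  • what page would you like to open
  • do you get the same (or a similar) issue if you open the page with a real browser (check the console)
  • any chance for me to reproduce this?

And of course you can also have a look at these pages.

I'd really like to help, but without any further information I see no chance.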

meladkadies commented 3 years ago

????

I'm not a magician. If you would like some help, please try to provide as much information as possible:

  • which version do you use
  • what page would you like to open
  • do you get the same (or a similar) issue if you open the page with a real browser (check the console)
  • any chance for me to reproduce this?

And of course you can also have a look at these pages.

I'd really like to help, but without any further information I see no chance.

I am trying to crawl more than one page, and each page uses an infinite scroll. Can you help me with an example?

twendelmuth commented 3 years ago

Well it would help if you:

a.) can name at least the website you're trying to crawl
b.) ideally can make a small test case that illustrates the problem you're having and is reproducible

A simple problem description and some code, like in #311, go a long way toward helping us reproduce and understand what you're trying to do.

meladkadies commented 3 years ago

Well it would help if you:

a.) can name at least the website you're trying to crawl
b.) ideally can make a small test case that illustrates the problem you're having and is reproducible

A simple problem description and some code, like in #311, go a long way toward helping us reproduce and understand what you're trying to do.

ok man

code of main class

package com.MKdesign;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.util.ArrayList;

public class Main {

static ArrayList<String> cat = new ArrayList<String>();
static ArrayList<String> news = new ArrayList<String>();
public static void main(String[] args)  {
    String url="https://www.cairo24.com/";
    get_cat(1,url);
    try {
        get_news();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
private static void get_cat(int lvl,String link) {
    //getcat
    Document doc = Crawlerbase.requset_catg(link);
    Elements links = doc.getElementsByClass("nav-link");
    for (Element lnk : links) {
        String href = lnk.attr("href");
        if (href.equals("#")) {
            href = null;
        } else {
            cat.add(href);
        }
    }
    //end getcat

    //get news

    //endgetnews
    //System.out.println(news);
}
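public static void get_news() throws Exception {
    Crawlerbase setup = new Crawlerbase();
    setup.request_map();
    // Skip the first two category links and visit the rest.
    for (int i = 2; i < cat.size(); i++) {
        // get_driver() returns the shared static client, so try-with-resources
        // closes that same client at the end of every iteration.
        try (WebClient cli = Crawlerbase.get_driver()) {
            String urlg = cat.get(i).toString();
            cli.waitForBackgroundJavaScript(25000);
            HtmlPage pag = cli.getPage(urlg);
            // Scroll to the bottom, intended to trigger the infinite-scroll loader.
            pag.executeJavaScript("window.scrollTo(0,document.body.scrollHeight);");
            cli.waitForBackgroundJavaScript(10000);
            // Parse the rendered DOM with Jsoup and collect the news links.
            Document doc1 = Jsoup.parse(pag.asXml());
            Elements divs = doc1.getElementsByClass("news");
            for (Element div : divs) {
                String href1 = div.child(1).attr("href");
                if (href1.equals("#")) {
                    href1 = null;
                } else {
                    news.add(href1);
                }
            }
            cli.close();

        }

        System.out.println(news);

    }
}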

}

code of the helper class

package com.MKdesign;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

public class Crawlerbase {

public static WebClient webClient;

public void request_map() throws Exception {
    webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setRedirectEnabled(true);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.getOptions().setTimeout(500);
}

public static WebClient get_driver() {
    return webClient;
}

public static Document requset_catg(String link){
    try{
        Connection con= Jsoup.connect(link);
        Document doc=con.get();
        if(con.response().statusCode()==200){
            return doc;
        }
        return null;
    }
    catch(IOException e){
        return null;
    }

}

}

rbri commented 3 years ago

To make it simpler for us, can you please strip down your sample to pure HtmlUnit? From my point of view, there is no functionality in Jsoup that is not also available in HtmlUnit. This will make it easier for us to reproduce your case. Thanks.
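
For illustration, the category-link step from the sample could be written without Jsoup roughly as in the sketch below. It assumes the HtmlUnit 2.x `com.gargoylesoftware.htmlunit` package that appears in the stack trace. The class name `PureHtmlUnitCrawler` and the helpers `collectHrefs` and `crawl` are hypothetical names made up for this example, not part of HtmlUnit, and the 10-second JavaScript wait is an arbitrary choice.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; only the HtmlUnit calls are real API.
public class PureHtmlUnitCrawler {

    /**
     * Collects the raw href attribute of every element matching the CSS
     * selector, skipping elements without an href and placeholder "#" links.
     */
    public static List<String> collectHrefs(HtmlPage page, String cssSelector) {
        List<String> hrefs = new ArrayList<>();
        for (DomNode node : page.querySelectorAll(cssSelector)) {
            if (node instanceof DomElement) {
                // getAttribute returns an empty string when the attribute is absent.
                String href = ((DomElement) node).getAttribute("href");
                if (!href.isEmpty() && !href.equals("#")) {
                    hrefs.add(href);
                }
            }
        }
        return hrefs;
    }

    /**
     * Loads one page with a fresh client and returns the matching links.
     * A new WebClient per call avoids reusing a client that was already closed.
     */
    public static List<String> crawl(String url, String cssSelector) throws IOException {
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.getOptions().setCssEnabled(false);
            client.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = client.getPage(url);
            // Give background JavaScript up to 10 seconds to finish (assumed value).
            client.waitForBackgroundJavaScript(10_000);
            return collectHrefs(page, cssSelector);
        }
    }
}
```

With this, `PureHtmlUnitCrawler.crawl("https://www.cairo24.com/", ".nav-link")` would stand in for the Jsoup-based `get_cat` step. Whether it also avoids the reported exception is not something this sketch can confirm.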