abhayjohri23 opened this issue 1 year ago (status: Open)
@rbri Can you please help me find a way out of this?
Thanks for all the details. I will have a deeper look and get back to you in the next few days.
@abhayjohri23 Sorry, this got lost. Are you still interested in this?
I am trying to scrape content (for example, a course's thumbnail picture, its price, etc.) from the educational website Udemy, using a general search URL (given in the code snippet). The site's source code has a div with the class names "ud-app-loader ud-component--search--search", as well as sub-divs for the courses shown on screen, with div class="popper-module--popper--2BpLn".
Code used to get the HTML content from the website:
With the above code, the JavaScript does not load properly, so the additional HTML that is visible in the browser's Inspect panel (but not in the page source) never shows up.
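The original snippet is not reproduced in this thread. The following is therefore not the code from the issue. It is a hedged sketch of one common HtmlUnit pattern for this situation:

1. Enable JavaScript.
2. Keep going when a page script fails.
3. Wait for background JavaScript to finish.
4. Query the DOM as it exists after the scripts ran (roughly what Inspect shows) instead of the raw response (what view-source shows).

It assumes HtmlUnit 3.x (the `org.htmlunit` packages seen in the trace) is on the classpath. The class and method names (`RenderedDomSketch`, `newClient`, `renderedMatches`) and the 10-second wait are illustrative choices, not anything from the issue. The selector simply combines the class names quoted above.

```java
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.NicelyResynchronizingAjaxController;
import org.htmlunit.WebClient;
import org.htmlunit.html.DomNode;
import org.htmlunit.html.HtmlPage;

/**
 * Hypothetical sketch (assumes HtmlUnit 3.x): read the DOM after the page's JavaScript
 * has run, i.e. roughly what the browser's Inspect panel shows, not the raw page source.
 */
public class RenderedDomSketch {

    /** Creates a client that runs page scripts but does not abort when one of them fails. */
    static WebClient newClient() {
        WebClient client = new WebClient(BrowserVersion.CHROME);
        client.getOptions().setJavaScriptEnabled(true);
        client.getOptions().setCssEnabled(false);
        // Report script errors (like the EvaluatorException in this issue) instead of throwing from getPage().
        client.getOptions().setThrowExceptionOnScriptError(false);
        client.getOptions().setThrowExceptionOnFailingStatusCode(false);
        // Run AJAX calls issued from the loading thread synchronously, so their results are in the DOM.
        client.setAjaxController(new NicelyResynchronizingAjaxController());
        return client;
    }

    /** Loads url, waits up to waitMillis for background JavaScript, and returns the XML of every match. */
    static List<String> renderedMatches(WebClient client, String url, String cssSelector, long waitMillis)
            throws Exception {
        HtmlPage page = client.getPage(url);
        client.waitForBackgroundJavaScript(waitMillis);
        List<String> matches = new ArrayList<>();
        for (DomNode node : page.querySelectorAll(cssSelector)) {
            matches.add(node.asXml());
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        // Search URL as it appears in the exception log of this issue.
        String url = "https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python"
                + "&ratings=4.5&sort=relevance&sort=relevance&src=ukw";
        try (WebClient client = newClient()) {
            for (String html : renderedMatches(client, url, "div.ud-app-loader.ud-component--search--search", 10_000)) {
                System.out.println(html);
            }
        }
    }
}
```

`setThrowExceptionOnScriptError(false)` only keeps `getPage` from aborting. A script that fails still fails, so anything it was supposed to render can still be missing from the returned markup.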
I am also getting many EvaluatorException errors at various places. Here is one of them:
======= EXCEPTION START ========
Exception class=[org.htmlunit.corejs.javascript.EvaluatorException]
org.htmlunit.ScriptException: An invalid or illegal selector was specified (selector: '[data-css-toggle-id' error: Invalid selectors: [data-css-toggle-id). (script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10)#577)
	at org.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:989)
	at org.htmlunit.corejs.javascript.Context.call(Context.java:590)
	at org.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:484)
	at org.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:349)
	at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:867)
	at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:843)
	at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:834)
	at org.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:966)
	at org.htmlunit.html.ScriptElementSupport.executeInlineScriptIfNeeded(ScriptElementSupport.java:380)
	at org.htmlunit.html.ScriptElementSupport.executeScriptIfNeeded(ScriptElementSupport.java:230)
	at org.htmlunit.html.ScriptElementSupport$1.execute(ScriptElementSupport.java:120)
	at org.htmlunit.html.ScriptElementSupport.onAllChildrenAddedToPage(ScriptElementSupport.java:143)
	at org.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:191)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:601)
	at org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:412)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:548)
	at org.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1273)
	at org.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1200)
	at org.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:204)
	at org.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:274)
	at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:2969)
	at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1953)
	at org.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:834)
	at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:346)
	at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:297)
	at org.htmlunit.cyberneko.xerces.parsers.XMLParser.parse(XMLParser.java:76)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:838)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:203)
	at org.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:300)
	at org.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:220)
	at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:672)
	at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:574)
	at org.htmlunit.WebClient.getPage(WebClient.java:492)
	at org.htmlunit.WebClient.getPage(WebClient.java:399)
	at org.htmlunit.WebClient.getPage(WebClient.java:537)
	at org.htmlunit.WebClient.getPage(WebClient.java:519)
	at org.example.Scraper.getData(Scraper.java:20)
	at org.example.App.main(App.java:16)
Caused by: org.htmlunit.corejs.javascript.EvaluatorException: An invalid or illegal selector was specified (selector: '[data-css-toggle-id' error: Invalid selectors: [data-css-toggle-id). (script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10)#577)
	at org.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.runtimeError(HtmlUnitContextFactory.java:454)
	at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:986)
	at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:1042)
	at org.htmlunit.javascript.host.dom.Document.querySelectorAll(Document.java:1044)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:222)
	at org.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:423)
	at org.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1874)
	at org.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:1051)
	at org.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:89)
	at org.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:392)
	at org.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:335)
	at org.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3914)
	at org.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:102)
	at org.htmlunit.javascript.JavaScriptEngine$2.doRun(JavaScriptEngine.java:858)
	at org.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:972)
	... 37 more
Enclosed exception: org.htmlunit.corejs.javascript.EvaluatorException: An invalid or illegal selector was specified (selector: '[data-css-toggle-id' error: Invalid selectors: [data-css-toggle-id). (script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10)#577)
	at org.htmlunit.javascript.HtmlUnitContextFactory$HtmlUnitErrorReporter.runtimeError(HtmlUnitContextFactory.java:454)
	at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:986)
	at org.htmlunit.corejs.javascript.Context.reportRuntimeError(Context.java:1042)
	at org.htmlunit.javascript.host.dom.Document.querySelectorAll(Document.java:1044)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:222)
	at org.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:423)
	at org.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1874)
	at script(script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10):577)
	at script(script in https://www.udemy.com/courses/search/?lang=en&price=price-paid&q=python&ratings=4.5&sort=relevance&sort=relevance&src=ukw from (557, 62) to (582, 10):576)
	at org.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:1051)
	at org.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:89)
	at org.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:392)
	at org.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:335)
	at org.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3914)
	at org.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:102)
	at org.htmlunit.javascript.JavaScriptEngine$2.doRun(JavaScriptEngine.java:858)
	at org.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:972)
	at org.htmlunit.corejs.javascript.Context.call(Context.java:590)
	at org.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:484)
	at org.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:349)
	at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:867)
	at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:843)
	at org.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:834)
	at org.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:966)
	at org.htmlunit.html.ScriptElementSupport.executeInlineScriptIfNeeded(ScriptElementSupport.java:380)
	at org.htmlunit.html.ScriptElementSupport.executeScriptIfNeeded(ScriptElementSupport.java:230)
	at org.htmlunit.html.ScriptElementSupport$1.execute(ScriptElementSupport.java:120)
	at org.htmlunit.html.ScriptElementSupport.onAllChildrenAddedToPage(ScriptElementSupport.java:143)
	at org.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:191)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:601)
	at org.htmlunit.cyberneko.xerces.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:412)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:548)
	at org.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1273)
	at org.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1200)
	at org.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:204)
	at org.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:274)
	at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:2969)
	at org.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1953)
	at org.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:834)
	at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:346)
	at org.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:297)
	at org.htmlunit.cyberneko.xerces.parsers.XMLParser.parse(XMLParser.java:76)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:838)
	at org.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:203)
	at org.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:300)
	at org.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:220)
	at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:672)
	at org.htmlunit.WebClient.loadWebResponseInto(WebClient.java:574)
	at org.htmlunit.WebClient.getPage(WebClient.java:492)
	at org.htmlunit.WebClient.getPage(WebClient.java:399)
	at org.htmlunit.WebClient.getPage(WebClient.java:537)
	at org.htmlunit.WebClient.getPage(WebClient.java:519)
	at org.example.Scraper.getData(Scraper.java:20)
	at org.example.App.main(App.java:16)
======= EXCEPTION END ========
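Reading the trace, the error is raised inside HtmlUnit's `Document.querySelectorAll` (Document.java:1044), called from an inline script on the search page. The selector string it logs is `[data-css-toggle-id`, without a closing `]`. A reproduction smaller than the full Udemy page might help narrow this down. The sketch below runs only that call on a tiny local page served through HtmlUnit's `MockWebConnection`, so no network access is needed. It assumes HtmlUnit 3.x. The page markup, the class name `SelectorRepro`, and the trick of reporting the outcome through `document.title` are hypothetical. What the call returns or throws may differ between HtmlUnit versions, so no particular result is claimed here.

```java
import java.net.URL;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

/** Hypothetical minimal repro: run only the logged querySelectorAll call on a tiny local page. */
public class SelectorRepro {

    // Selector string exactly as it appears in the log above (no closing bracket).
    static final String LOGGED_SELECTOR = "[data-css-toggle-id";

    /**
     * Loads a page whose inline script calls querySelectorAll(LOGGED_SELECTOR) and writes the
     * outcome into document.title. Returns that title: "count=N" if the call succeeded,
     * "error: ..." if it threw a catchable JS error, or "pending" if neither assignment ran.
     */
    static String runRepro() throws Exception {
        String html = "<html><head><title>pending</title></head><body>"
                + "<div data-css-toggle-id='demo'></div>"
                + "<script>"
                + "try { document.title = 'count=' + document.querySelectorAll('" + LOGGED_SELECTOR + "').length; }"
                + " catch (e) { document.title = 'error: ' + e.message; }"
                + "</script>"
                + "</body></html>";

        URL url = new URL("http://repro.test/");
        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(url, html); // served as text/html from memory

        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.setWebConnection(connection);
            // Log uncaught script errors instead of throwing them out of getPage().
            client.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = client.getPage(url);
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runRepro());
    }
}
```

Running the same one-line `querySelectorAll` call in a desktop browser's console on any page gives a direct comparison point for whatever HtmlUnit reports.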
Stack Overflow thread for this question (for full context): How to extract the HTML elements inside <div data-module-*> from a website source code using HTMLUnit?