Closed tdrobcsak closed 6 years ago
At first glance, this looks like the default behavior of stripping URL "fragments" (which are normally just anchors within the same page). In your case, if you need to preserve the # sign, have a look at GenericURLNormalizer.
You can override the default behavior by taking removeFragment out of the default list of normalization rules, like this:
<urlNormalizer class="com.norconex.collector.http.url.impl.GenericURLNormalizer">
<normalizations>
lowerCaseSchemeHost, upperCaseEscapeSequence,
decodeUnreservedCharacters, removeDefaultPort,
encodeNonURICharacters, addWWW
</normalizations>
</urlNormalizer>
Note though, that if your page is fully dynamic (javascript-driven), it will not solve all your problems. The HTTP Collector does not execute JavaScript. Luckily for those sites, you can integrate with PhantomJS. Have a look at PhantomJSDocumentFetcher.
Thanks for the normalizer tip. I was able to check each URL; however, the site has dynamically created content that I would like to fetch, so I added the PhantomJSDocumentFetcher to my config with the following tag:
<documentFetcher class="${http}.fetch.impl.PhantomJSDocumentFetcher"
detectContentType="false" detectCharset="false" screenshotEnabled="false">
<exePath>phantomjs-2.1.1-macosx/bin/phantomjs</exePath>
<scriptPath>scripts/phantom.js</scriptPath>
<renderWaitTime>5000</renderWaitTime>
<validStatusCodes>200</validStatusCodes>
<notFoundStatusCodes>404</notFoundStatusCodes>
</documentFetcher>
Yet I am getting the following error message:
ERROR [AbstractCrawler] Expert Page : Could not process document: https://loc.salon-expert.hu/#sal_1 (null)
java.lang.NullPointerException
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.createPhantomJSCommand(PhantomJSDocumentFetcher.java:1030)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:799)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:812)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Can you help me understand what I'm doing wrong here?
Which version are you using? If your log file is not too big, can you attach it? Also, can you try using absolute paths for the exePath and scriptPath tags?
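For illustration, absolute paths in the fetcher configuration might look something like the fragment below. This is only a sketch: the /opt/... locations are hypothetical placeholders, so substitute the directories where PhantomJS and the collector's scripts actually live on your machine.

```xml
<documentFetcher class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher">
  <!-- Hypothetical absolute path to the PhantomJS executable -->
  <exePath>/opt/phantomjs-2.1.1-macosx/bin/phantomjs</exePath>
  <!-- Hypothetical absolute path to the phantom.js script shipped with the collector -->
  <scriptPath>/opt/norconex-collector-http-2.8.0/scripts/phantom.js</scriptPath>
</documentFetcher>
```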
Here is the attached log file.
Version: norconex-collector-http-2.8.0. I added the full paths, but the result is the same.
Please give the snapshot version of HTTP Collector a try. It fixes that NullPointerException.
If you do not want to upgrade for some reason, a workaround is to specify <screenshotDimensions> under the PhantomJSDocumentFetcher with an arbitrary value. That should also get rid of the exception.
Please confirm.
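As a rough sketch of that workaround, the tag would sit alongside the other fetcher settings, as below. The 1024x768 value is arbitrary, and its width-x-height form is an assumption; check the PhantomJSDocumentFetcher documentation for the exact format your version expects. The exePath and scriptPath values are the relative ones from the config posted earlier in this thread.

```xml
<documentFetcher class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher">
  <exePath>phantomjs-2.1.1-macosx/bin/phantomjs</exePath>
  <scriptPath>scripts/phantom.js</scriptPath>
  <renderWaitTime>5000</renderWaitTime>
  <!-- Workaround for the NullPointerException in 2.8.0: set any explicit value.
       The value and its format here are assumptions, not taken from the docs. -->
  <screenshotDimensions>1024x768</screenshotDimensions>
</documentFetcher>
```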
Thanks Pascal
I added the <screenshotDimensions>"2560 x 1600"</screenshotDimensions> tag, and it did get rid of the exception; however, it wasn't able to fetch the dynamically generated data. It seems that the render wait time does not actually kick in.
See my config file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<httpcollector id="SzalonExpert.hu Collector">
#set($http = "com.norconex.collector.http")
#set($core = "com.norconex.collector.core")
#set($urlNormalizer = "${http}.url.impl.GenericURLNormalizer")
#set($filterExtension = "${core}.filter.impl.ExtensionReferenceFilter")
#set($filterRegexRef = "${core}.filter.impl.RegexReferenceFilter")
<progressDir>${workdir}/progress</progressDir>
<logsDir>${workdir}/logs</logsDir>
<crawlerDefaults>
<robotsTxt ignore="true" />
<startURLs stayOnDomain="true">
<urlsFile>examples/Loreal-Fodraszat/webpage-list</urlsFile>
<!--<url>https://loc.salon-expert.hu/#adr_budapest;0,0,0,0,0,0,0,0 </url> -->
</startURLs>
<urlNormalizer class="com.norconex.collector.http.url.impl.GenericURLNormalizer">
<normalizations>
lowerCaseSchemeHost, upperCaseEscapeSequence,
decodeUnreservedCharacters, removeDefaultPort,
encodeNonURICharacters
</normalizations>
</urlNormalizer>
<!--<urlNormalizer class="$urlNormalizer" />-->
<numThreads>1</numThreads>
<maxDepth>4</maxDepth>
<workDir>$workdir</workDir>
<!-- <orphansStrategy>DELETE</orphansStrategy>-->
<!--<sitemapResolverFactory ignore="false" />-->
<referenceFilters>
<filter class="$filterExtension" onMatch="exclude">jpg,gif,png,ico,css,js</filter>
<!--<filter class="$filterRegexRef" onMatch="include">https://loc.salon-expert.hu/#sal_\d+</filter> -->
</referenceFilters>
<!--<documentFetcher detectContentType="true" detectCharset="true"/> -->
<!--<documentFilters>
<filter class="$filterRegexRef" onMatch="include">https://loc.salon-expert.hu/#sal_\d+</filter>
</documentFilters> -->
</crawlerDefaults>
<crawlers>
<crawler id="Expert Page ">
<robotsTxt ignore="true" />
<keepDownloads>true</keepDownloads>
<documentFetcher class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher"
detectContentType="true" screenshotEnabled="true">
<exePath>/Users/teoodordrobcsak/Downloads/norconex-collector-http-2.8.0/phantomjs-2.1.1-macosx/bin/phantomjs</exePath>
<scriptPath>/Users/teoodordrobcsak/Downloads/norconex-collector-http-2.8.0/scripts/phantom.js</scriptPath>
<renderWaitTime>5000</renderWaitTime>
<referencePattern>^https://loc.salon-expert.hu/#sal.*</referencePattern>
<screenshotDimensions>"2560 x 1600"</screenshotDimensions>
<validStatusCodes>200</validStatusCodes>
<notFoundStatusCodes>404</notFoundStatusCodes>
</documentFetcher>
#parse("shared/importer-config.xml")
<committer class="com.norconex.committer.core.impl.MultiCommitter">
<committer class="com.norconex.committer.core.impl.FileSystemCommitter">
<directory>${workdir}/crawledFilesMETA</directory>
</committer>
<committer class="com.norconex.committer.core.impl.XMLFileCommitter">
<directory>${workdir}/crawledFilesXML</directory>
<docsPerFile>1</docsPerFile>
<pretty>true</pretty>
<splitAddDelete>false</splitAddDelete>
</committer>
</committer>
</crawler>
</crawlers>
</httpcollector>
Do you get any errors? There is at least one other fix to the PhantomJSDocumentFetcher in the snapshot release. Please give it a try.
Hi Pascal
Thanks for trying to help. I tried both the snapshot release and the regular release, and I noticed two interesting things:
1) If I kept the
ERROR [PhantomJSDocumentFetcher] PhantomJS: https://stats.g.doubleclick.net/r/collect?v=1&aip=1&t=dc&_r=3&tid=UA-62480304-3&cid=561317613.1524634161&jid=1698485650&_gid=1280410387.1524773794&gjid=109555892&_v=j67&z=260275232: Operation canceled
Caught and handled this exception :
java.lang.NumberFormatException: For input string: ""
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:662)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.github.jaiimageio.impl.common.ImageUtil.processOnRegistration(ImageUtil.java:1401)
at com.github.jaiimageio.impl.plugins.wbmp.WBMPImageReaderSpi.onRegistration(WBMPImageReaderSpi.java:96)
at java.desktop/javax.imageio.spi.SubRegistry.registerServiceProvider(ServiceRegistry.java:788)
at java.desktop/javax.imageio.spi.ServiceRegistry.registerServiceProvider(ServiceRegistry.java:330)
at java.desktop/javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:212)
at java.desktop/javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:136)
at java.desktop/javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:157)
at java.desktop/javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.handleScreenshot(PhantomJSDocumentFetcher.java:889)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:842)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:815)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caught and handled this exception :
java.lang.NumberFormatException: For input string: ""
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:662)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.github.jaiimageio.impl.common.ImageUtil.processOnRegistration(ImageUtil.java:1401)
at com.github.jaiimageio.impl.plugins.bmp.BMPImageReaderSpi.onRegistration(BMPImageReaderSpi.java:97)
at java.desktop/javax.imageio.spi.SubRegistry.registerServiceProvider(ServiceRegistry.java:788)
at java.desktop/javax.imageio.spi.ServiceRegistry.registerServiceProvider(ServiceRegistry.java:330)
at java.desktop/javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:212)
at java.desktop/javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:136)
at java.desktop/javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:157)
at java.desktop/javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.handleScreenshot(PhantomJSDocumentFetcher.java:889)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:842)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:815)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caught and handled this exception :
java.lang.NumberFormatException: For input string: ""
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:662)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.github.jaiimageio.impl.common.ImageUtil.processOnRegistration(ImageUtil.java:1401)
at com.github.jaiimageio.impl.plugins.wbmp.WBMPImageWriterSpi.onRegistration(WBMPImageWriterSpi.java:103)
at java.desktop/javax.imageio.spi.SubRegistry.registerServiceProvider(ServiceRegistry.java:788)
at java.desktop/javax.imageio.spi.ServiceRegistry.registerServiceProvider(ServiceRegistry.java:330)
at java.desktop/javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:212)
at java.desktop/javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:136)
at java.desktop/javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:157)
at java.desktop/javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.handleScreenshot(PhantomJSDocumentFetcher.java:889)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:842)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:815)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caught and handled this exception :
java.lang.NumberFormatException: For input string: ""
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:662)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.github.jaiimageio.impl.common.ImageUtil.processOnRegistration(ImageUtil.java:1401)
at com.github.jaiimageio.impl.plugins.bmp.BMPImageWriterSpi.onRegistration(BMPImageWriterSpi.java:105)
at java.desktop/javax.imageio.spi.SubRegistry.registerServiceProvider(ServiceRegistry.java:788)
at java.desktop/javax.imageio.spi.ServiceRegistry.registerServiceProvider(ServiceRegistry.java:330)
at java.desktop/javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:212)
at java.desktop/javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:136)
at java.desktop/javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:157)
at java.desktop/javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.handleScreenshot(PhantomJSDocumentFetcher.java:889)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:842)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:815)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
Caught and handled this exception :
java.lang.NumberFormatException: For input string: ""
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:662)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.github.jaiimageio.impl.common.ImageUtil.processOnRegistration(ImageUtil.java:1401)
at com.github.jaiimageio.impl.plugins.gif.GIFImageWriterSpi.onRegistration(GIFImageWriterSpi.java:140)
at java.desktop/javax.imageio.spi.SubRegistry.registerServiceProvider(ServiceRegistry.java:788)
at java.desktop/javax.imageio.spi.ServiceRegistry.registerServiceProvider(ServiceRegistry.java:330)
at java.desktop/javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:212)
at java.desktop/javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:136)
at java.desktop/javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:157)
at java.desktop/javax.imageio.ImageIO.<clinit>(ImageIO.java:66)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.handleScreenshot(PhantomJSDocumentFetcher.java:889)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchPhantomJSDocument(PhantomJSDocumentFetcher.java:842)
at com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher.fetchDocument(PhantomJSDocumentFetcher.java:773)
at com.norconex.collector.http.pipeline.importer.DocumentFetcherStage.executeStage(DocumentFetcherStage.java:42)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:31)
at com.norconex.collector.http.pipeline.importer.AbstractImporterStage.execute(AbstractImporterStage.java:24)
at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
at com.norconex.collector.http.crawler.HttpCrawler.executeImporterPipeline(HttpCrawler.java:360)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:538)
at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:815)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
I cannot reproduce. I tried with your config (the part you shared), and you can see from the attached file that the dynamic content was extracted properly for https://loc.salon-expert.hu/#sal_1:
2018-04-26T08-55-43-149_1.zip
Interesting. For me the exception still kicks in. However, I also noticed that on my Mac, the first time I run it, a Java window pops up and shows this message:
/Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home/bin/java ; exit;
Teodors-MacBook-Pro:~ teoodordrobcsak$ /Library/Java/JavaVirtualMachines/jdk-9.0.4.jdk/Contents/Home/bin/java ; exit;
Usage: java [options] <mainclass> [args...]
(to execute a class)
or java [options] -jar <jarfile> [args...]
(to execute a jar file)
or java [options] -m <module>[/<mainclass>] [args...]
java [options] --module <module>[/<mainclass>] [args...]
(to execute the main class in a module)
Arguments following the main class, -jar <jarfile>, -m or --module
<module>/<mainclass> are passed as the arguments to main class.
where options include:
-d32 Deprecated, will be removed in a future release
-d64 Deprecated, will be removed in a future release
-cp <class search path of directories and zip/jar files>
-classpath <class search path of directories and zip/jar files>
--class-path <class search path of directories and zip/jar files>
A : separated list of directories, JAR archives,
and ZIP archives to search for class files.
-p <module path>
--module-path <module path>...
A : separated list of directories, each directory
is a directory of modules.
--upgrade-module-path <module path>...
A : separated list of directories, each directory
is a directory of modules that replace upgradeable
modules in the runtime image
--add-modules <module name>[,<module name>...]
root modules to resolve in addition to the initial module.
<module name> can also be ALL-DEFAULT, ALL-SYSTEM,
ALL-MODULE-PATH.
--list-modules
list observable modules and exit
-d <module name>
--describe-module <module name>
describe a module and exit
--dry-run create VM and load main class but do not execute main method.
The --dry-run option may be useful for validating the
command-line options such as the module system configuration.
--validate-modules
validate all modules and exit
The --validate-modules option may be useful for finding
conflicts and other errors with modules on the module path.
-D<name>=<value>
set a system property
-verbose:[class|module|gc|jni]
enable verbose output
-version print product version to the error stream and exit
--version print product version to the output stream and exit
-showversion print product version to the error stream and continue
--show-version
print product version to the output stream and continue
--show-module-resolution
show module resolution output during startup
-? -h -help
print this help message to the error stream
--help print this help message to the output stream
-X print help on extra options to the error stream
--help-extra print help on extra options to the output stream
-ea[:<packagename>...|:<classname>]
-enableassertions[:<packagename>...|:<classname>]
enable assertions with specified granularity
-da[:<packagename>...|:<classname>]
-disableassertions[:<packagename>...|:<classname>]
disable assertions with specified granularity
-esa | -enablesystemassertions
enable system assertions
-dsa | -disablesystemassertions
disable system assertions
-agentlib:<libname>[=<options>]
load native agent library <libname>, e.g. -agentlib:jdwp
see also -agentlib:jdwp=help
-agentpath:<pathname>[=<options>]
load native agent library by full pathname
-javaagent:<jarpath>[=<options>]
load Java programming language agent, see java.lang.instrument
-splash:<imagepath>
show splash screen with specified image
HiDPI scaled images are automatically supported and used
if available. The unscaled image filename, e.g. image.ext,
should always be passed as the argument to the -splash option.
The most appropriate scaled image provided will be picked up
automatically.
See the SplashScreen API documentation for more information
@argument files
one or more argument files containing options
-disable-@files
prevent further argument file expansion
To specify an argument for a long option, you can use --<name>=<value> or
--<name> <value>.
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
Deleting expired sessions...59 completed.
I was wondering if this is some Java compatibility issue?
BTW, thanks: I was able to capture the dynamically created content with the SNAPSHOT release, despite this exception being thrown.
Given I cannot reproduce but you get the content now, shall we close?
Does the Java error you are getting occur when starting the HTTP Collector with the launch shell script? If so, maybe the script needs to be modified to run on your Mac. Have you tried with Java 8? I wonder if something is different with Java 9.
Closing for lack of feedback. Feel free to reopen if the problem persists and you have more details.
Hi
I would like to extract experts' contact information from a site that dynamically generates a list of available experts.
I saved these dynamically created pages into a webpages-list file containing the following URLs: https://loc.salon-expert.hu/#sal_1 https://loc.salon-expert.hu/#sal_2 https://loc.salon-expert.hu/#sal_3 https://loc.salon-expert.hu/#sal_4 https://loc.salon-expert.hu/#sal_5 https://loc.salon-expert.hu/#sal_6 https://loc.salon-expert.hu/#sal_7 https://loc.salon-expert.hu/#sal_8 https://loc.salon-expert.hu/#sal_9
Can you help me understand what I'm doing wrong? Here is my HTTP Collector's config.xml. In the end, the collector doesn't walk through the list of pages I collected in the above file, so it doesn't fetch any information: it stops after fetching the content of https://loc.salon-expert.hu/.