torhar closed this issue 10 years ago
Also, in Source.java, srcData.get(name) seems to return a Long, so

    protected int getSrcDataInt(String name) {
        if (srcData.containsKey(name))
            return ((Integer)srcData.get(name)).intValue();
        String value = getSrcDataString(name); //.replace(".0", "");
        return Integer.parseInt(value);
    }
must be
    protected int getSrcDataInt(String name) {
        if (srcData.containsKey(name))
            return ((Long)srcData.get(name)).intValue();
        String value = getSrcDataString(name); //.replace(".0", "");
        return Integer.parseInt(value);
    }
What is the issue with the Long to Integer cast? Is there a problem with one of the source setting parameters? I don't think so!
At least one source parameter is a Long, so the cast to Integer fails, maybe only in our environment.
Can you provide the XML export of your source settings (using the export function)?
This issue got the label "bug". Do you still need the XML export of the source to investigate the Long/Integer cast in Source.java?
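A cast to Long would fix this environment but would fail in turn wherever the stored value is an Integer. The following is a minimal sketch, not the project's actual fix, of a conversion that accepts either type by going through java.lang.Number. The class name SrcDataSketch and the static helper toInt are hypothetical and exist only for illustration.

```java
public class SrcDataSketch {

    // Hypothetical helper: converts a value read from a map such as srcData
    // into an int, whatever numeric type the storage layer produced.
    static int toInt(Object value) {
        if (value == null) {
            throw new IllegalArgumentException("value is null");
        }
        if (value instanceof Number) {
            // Integer, Long and Double all extend Number, so no
            // ClassCastException here. Note: intValue() silently truncates
            // a Long that does not fit into an int.
            return ((Number) value).intValue();
        }
        // Fall back to parsing the textual form; String.valueOf avoids the
        // "cannot be cast to java.lang.String" failure for non-String values.
        return Integer.parseInt(String.valueOf(value).trim());
    }

    public static void main(String[] args) {
        System.out.println(toInt(Long.valueOf(4L)));   // stored as Long
        System.out.println(toInt(Integer.valueOf(4))); // stored as Integer
        System.out.println(toInt("4"));                // stored as String
    }
}
```

Under these assumptions, getSrcDataInt could delegate to such a helper instead of casting to one concrete wrapper type.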
Yes please.
(See attached file: 525288c536c04.xml)
Please send the file to contact@crawl-anywhere.com
Hi,
I can't reproduce these issues, even with your export. Can you provide the exact scenario needed to reproduce each of these 2 issues? Which source parameter is a Long? Which version of MongoDB are you using? Is it a 64-bit version?
Regards.
$ mongod --version
db version v2.2.3, pdfile version 4.5
Fri Oct 18 11:38:20 git version: nogitversion
64 bit
Fri Oct 18 13:29:11 CEST 2013 - =================================
Fri Oct 18 13:29:11 CEST 2013 - Crawler starting (version: 4.0.0)
Fri Oct 18 13:29:11 CEST 2013 - Simultaneous sources crawled : 3
Fri Oct 18 13:29:11 CEST 2013 - account : 1
Fri Oct 18 13:29:11 CEST 2013 -
Fri Oct 18 13:29:11 CEST 2013 - =================================
Fri Oct 18 13:29:11 CEST 2013 -
Fri Oct 18 13:29:11 CEST 2013 - Sources to be crawled : 1
Fri Oct 18 13:29:11 CEST 2013 - Pushing source : 4
Fri Oct 18 13:29:11 CEST 2013 - Source data key-name: id_target
Fri Oct 18 13:29:11 CEST 2013 - Source data key-class: class java.lang.Long
Fri Oct 18 13:29:11 CEST 2013 - java.lang.Long cannot be cast to java.lang.Integer
Fri Oct 18 13:29:11 CEST 2013 - >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Fri Oct 18 13:29:11 CEST 2013 - >>>> Error = java.lang.Long cannot be cast to java.lang.String
Fri Oct 18 13:29:11 CEST 2013 - = java.lang.Thread.run(Thread.java:662)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.connectors.Source.getSrcDataString(Source.java:142)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.connectors.Source.getSrcDataInt(Source.java:124)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.connectors.Source.getTargetId(Source.java:205)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.connectors.Connector.initializeInternal(Connector.java:50)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.connectors.web.WebConnector.initialize(WebConnector.java:79)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.ProcessorSource.call(ProcessorSource.java:55)
Fri Oct 18 13:29:11 CEST 2013 - fr.eolya.crawler.ProcessorSource.call(ProcessorSource.java:20)
Fri Oct 18 13:29:11 CEST 2013 - java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
Fri Oct 18 13:29:11 CEST 2013 - java.util.concurrent.FutureTask.run(FutureTask.java:138)
Fri Oct 18 13:29:11 CEST 2013 - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
Fri Oct 18 13:29:11 CEST 2013 - java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
Fri Oct 18 13:29:11 CEST 2013 - java.lang.Thread.run(Thread.java:662)
The log above was produced with the following code snippet:

    try {
        if (srcData.containsKey(name))
            return ((Integer)srcData.get(name)).intValue();
    } catch(Exception e) {
        logger.log("Source data key-name: " +name);
        logger.log("Source data key-class: " +srcData.get(name).getClass());
        logger.log(e.getMessage());
    }
Hi, thank you for this trace. Did you set up anything specific for the target? Did you create a target? Did you change the target for your source? Dominique
Please send me your Source.java file by email.
I tried various things, but it is still impossible to reproduce. Can you provide an export of your MongoDB database (without the pages* collections)?
Hello,
We have the same problem. The installation seems to be fine, and we entered our sources (~140), but crawling never starts, failing with the cast exception mentioned in this issue.
Did you find any solution or workaround?
Thanks
Can you provide me with a MongoDB export?
Thanks for your quick answer. I just sent the export to contact at crawl-anywhere.com.
Fixed
The first crawl date of an item is null when I try to rescan a source. This leads to a NumberFormatException in WebConnector.java.
A quick solution would be to check for
if (firstCrawlDate == null || "".equals(firstCrawlDate) ) {
but I don't know whether it is correct in this situation for firstCrawlDate to be null.
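As a rough illustration of the check proposed above, here is a minimal sketch that guards the parse and falls back to a caller-supplied default. It assumes firstCrawlDate is a numeric string, which the thread does not confirm beyond the NumberFormatException; the class FirstCrawlDateSketch, the method parseFirstCrawlDate and the fallback parameter are hypothetical and not taken from WebConnector.java.

```java
public class FirstCrawlDateSketch {

    // Hypothetical helper: returns the parsed value, or the fallback when
    // the stored first crawl date is missing (null, empty or blank).
    static long parseFirstCrawlDate(String firstCrawlDate, long fallback) {
        if (firstCrawlDate == null || firstCrawlDate.trim().isEmpty()) {
            return fallback;
        }
        // Assumption: the value is a number encoded as text.
        return Long.parseLong(firstCrawlDate.trim());
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // A rescan of an item without a first crawl date falls back to "now".
        System.out.println(parseFirstCrawlDate(null, now) == now);
        System.out.println(parseFirstCrawlDate("1382095751", now));
    }
}
```

Whether a fallback is the right behaviour, or whether a null first crawl date points to a deeper data problem, remains the open question raised above.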