Status: Closed (closed by OkkeKlein 8 years ago)
The unit tests for the project with Mongo are successful, so I wonder if you have a version mismatch.
Which versions of the Collector HTTP and Collector Core are you using? They should be 2.5.0 and 1.5.0 respectively (both snapshots).
To find out, you can check the jar file names in your lib folder, or copy the first few lines of your log, which print the versions (if set).
[non-job]: 2016-04-20 14:50:19 INFO - Version: Norconex HTTP Collector 2.5.0-SNAPSHOT (Norconex Inc.)
[non-job]: 2016-04-20 14:50:19 INFO - Version: Norconex Collector Core 1.5.0-SNAPSHOT (Norconex Inc.)
[non-job]: 2016-04-20 14:50:19 INFO - Version: Norconex Importer 2.5.2-SNAPSHOT (Norconex Inc.)
[non-job]: 2016-04-20 14:50:19 INFO - Version: Norconex JEF 4.0.7 (Norconex Inc.)
[non-job]: 2016-04-20 14:50:19 INFO - Version: Norconex Committer Core 2.0.3 (Norconex Inc.)
Strange. Is it just that content type? I will look at serializing that object a different way and will update you.
Can you try replacing the collector-core jar with the latest snapshot? You can download just the jar here.
NOTE: you will need to perform a clean crawl (delete your crawl store) because this fix changes how the ContentType gets serialized for Mongo and will not be compatible.
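For readers wondering what "serialized a different way" could look like: the thread does not show the actual change in collector-core, so the following is only a minimal sketch of one common approach, which is storing the value as a plain string instead of handing the encoder an object it does not know how to serialize. The nested `ContentType` class is a hypothetical stand-in for `com.norconex.commons.lang.file.ContentType`, a plain `Map` stands in for the Mongo document, and the field names `reference` and `contentType` are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContentTypeSerializationSketch {

    // Hypothetical stand-in for com.norconex.commons.lang.file.ContentType.
    static final class ContentType {
        private final String value;

        ContentType(String value) {
            this.value = value;
        }

        // Rebuilds a ContentType from its stored string form.
        static ContentType valueOf(String value) {
            return value == null ? null : new ContentType(value);
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Builds a document (a plain Map here, standing in for a Mongo document)
    // that contains only strings, so no custom class reaches the encoder.
    static Map<String, Object> toDocument(String reference, ContentType contentType) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("reference", reference);
        doc.put("contentType", contentType == null ? null : contentType.toString());
        return doc;
    }

    // Reads the string back and converts it to a ContentType again.
    static ContentType contentTypeFrom(Map<String, Object> doc) {
        Object raw = doc.get("contentType");
        return raw == null ? null : ContentType.valueOf(raw.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> doc =
                toDocument("http://example.com/", new ContentType("text/html"));
        System.out.println(doc);
        System.out.println(contentTypeFrom(doc));
    }
}
```

If the stored form of a field changes like this, records written with an older layout may no longer be readable, which is consistent with the advice above to start from a clean crawl store.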
Just tested on HTML content. Same problem.
BTW, the download link to the collector-core snapshot returns a 404.
That one works.
You mean the link or the fix? :-) Can I close?
Yes. You can close.
My Crawler Name: 2016-04-20 14:41:50 ERROR - My Crawler Name: Could not mark reference as processed: URL (can't serialize class com.norconex.commons.lang.file.ContentType)
java.lang.IllegalArgumentException: can't serialize class com.norconex.commons.lang.file.ContentType
    at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:299)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:194)
    at org.bson.BasicBSONEncoder._putObjectField(BasicBSONEncoder.java:255)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:194)
    at org.bson.BasicBSONEncoder.putObject(BasicBSONEncoder.java:136)
    at com.mongodb.DefaultDBEncoder.writeObject(DefaultDBEncoder.java:36)
    at com.mongodb.BSONBinaryWriter.encodeDocument(BSONBinaryWriter.java:339)
    at com.mongodb.UpdateCommandMessage.writeTheWrites(UpdateCommandMessage.java:48)
    at com.mongodb.UpdateCommandMessage.writeTheWrites(UpdateCommandMessage.java:23)
    at com.mongodb.BaseWriteCommandMessage.encodeMessageBody(BaseWriteCommandMessage.java:69)
    at com.mongodb.BaseWriteCommandMessage.encodeMessageBody(BaseWriteCommandMessage.java:23)
    at com.mongodb.RequestMessage.encode(RequestMessage.java:66)
    at com.mongodb.BaseWriteCommandMessage.encode(BaseWriteCommandMessage.java:53)
    at com.mongodb.DBCollectionImpl.sendWriteCommandMessage(DBCollectionImpl.java:520)
    at com.mongodb.DBCollectionImpl.access$200(DBCollectionImpl.java:48)
    at com.mongodb.DBCollectionImpl$2.execute(DBCollectionImpl.java:470)
    at com.mongodb.DBCollectionImpl$2.execute(DBCollectionImpl.java:461)
    at com.mongodb.DBPort.doOperation(DBPort.java:187)
    at com.mongodb.DBTCPConnector.doOperation(DBTCPConnector.java:208)
    at com.mongodb.DBCollectionImpl.writeWithCommandProtocol(DBCollectionImpl.java:461)
    at com.mongodb.DBCollectionImpl.updateWithCommandProtocol(DBCollectionImpl.java:456)
    at com.mongodb.DBCollectionImpl.update(DBCollectionImpl.java:270)
    at com.mongodb.DBCollection.update(DBCollection.java:214)
    at com.mongodb.DBCollection.update(DBCollection.java:247)
    at com.norconex.collector.core.data.store.impl.mongo.MongoCrawlDataStore.processed(MongoCrawlDataStore.java:203)
    at com.norconex.collector.core.crawler.AbstractCrawler.finalizeDocumentProcessing(AbstractCrawler.java:636)
    at com.norconex.collector.core.crawler.AbstractCrawler.processImportResponse(AbstractCrawler.java:544)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:491)
    at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:377)
    at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:735)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)