Norconex / collector-filesystem

Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data from sources ranging from local hard drives to network locations, and for storing it in various data repositories such as search engines.
http://www.norconex.com/collectors/collector-filesystem/

Writing to XML from crawler output crashes #29

Closed pkasson closed 6 years ago

pkasson commented 6 years ago

Tried this:

OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(outputPath), Charset.forName("UTF-8").newEncoder());

After crawler stops, tried saving ...

FilesystemCollector collector = new FilesystemCollector(collectorConfig);
collector.start(true);
crawlerConfig.saveToXML(osw);

kaboom -

Exception in thread "MyFilesystemCrawler" java.lang.NullPointerException
    at com.norconex.collector.fs.crawler.FilesystemCrawlerConfig.saveCrawlerConfigToXML(FilesystemCrawlerConfig.java:212)
    at com.norconex.collector.core.crawler.AbstractCrawlerConfig.saveToXML(AbstractCrawlerConfig.java:301)
    at com.kasstek.crawler.Test.startCrawling(Test.java:42)
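
For reference, the writer setup in the snippets above can be sketched with only the standard library, so that the stream is flushed and closed even if the save call throws. This is a minimal sketch, not the collector's own API: `XmlSavable`, `saveTo`, and `SaveConfigSketch` are hypothetical names introduced here, standing in for any object that exposes a `saveToXML(Writer)` method like the one in the report. It does not address the NullPointerException itself.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveConfigSketch {

    /** Hypothetical stand-in for an object that can write itself to a Writer. */
    interface XmlSavable {
        void saveToXML(Writer out) throws IOException;
    }

    /** Opens a UTF-8 writer on outputPath, saves the config, and always closes the writer. */
    static void saveTo(Path outputPath, XmlSavable config) throws IOException {
        try (Writer osw = new OutputStreamWriter(
                Files.newOutputStream(outputPath), StandardCharsets.UTF_8)) {
            config.saveToXML(osw);
        } // try-with-resources flushes and closes osw here, even on exceptions
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("crawler-config", ".xml");
        // Assumption: a lambda replaces the real crawler config for illustration.
        saveTo(out, w -> w.write("<crawler id=\"MyFilesystemCrawler\"/>"));
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

With the real collector, the lambda would presumably be replaced by a method reference such as `crawlerConfig::saveToXML`, assuming that method's declared exceptions are compatible with `IOException`.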

How do you create an output file to capture the results ?

Thanks !

essiembre commented 6 years ago

The NPE will be fixed, but what do you mean by "capture the results"? If you are looking to store the crawled content, you need to use a Committer.

pkasson commented 6 years ago

Yes, store the content ... I will take a look at the Committer. Thanks.

essiembre commented 6 years ago

The latest snapshot release now has a fix for the NullPointerException when calling saveToXML(...). Please confirm.

pkasson commented 6 years ago

What repo do I pull the snapshot from ?

essiembre commented 6 years ago

Maven? It is indicated on the download page: http://www.norconex.com/collectors/collector-filesystem/download

pkasson commented 6 years ago

Confirmed, saved to XML, no crash. Thanks !

essiembre commented 6 years ago

Thanks for confirming.