Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Information for the log file #170

Closed csaezl closed 9 years ago

csaezl commented 9 years ago

At the beginning of a collector run, the console shows the information below about the loaded filters and the module versions:

 INFO  [AbstractCrawlerConfig] Reference filter loaded: com.norconex.collector.core.filter.impl.RegexReferenceFilter@12405818[onMatch=EXCLUDE,caseSensitive=false,pattern=.*year=.*,regex=.*year=.*]
 INFO  [AbstractCrawlerConfig] Reference filter loaded: com.norconex.collector.core.filter.impl.RegexReferenceFilter@453da22c[onMatch=EXCLUDE,caseSensitive=false,pattern=.*fecha=.*,regex=.*fecha=.*]
 INFO  [AbstractCollectorConfig] Configuration loaded: id=url-juventudextremadura.gobex.es_(collector); logsDir=C:/CRAWLER-xxxxxxx/collectors/url-juventudextremadura.gobex.es/work//log; progressDir=C:/CRAWLER-xxxxxxx/collectors/url-juventudextremadura.gobex.es/work//progress
 INFO  [AbstractCollector] Version: Norconex HTTP Collector 2.3.0-SNAPSHOT (Norconex Inc.)
 INFO  [AbstractCollector] Version: Norconex Collector Core 1.3.0-SNAPSHOT (Norconex Inc.)
 INFO  [AbstractCollector] Version: Norconex Importer 2.4.0-SNAPSHOT (Norconex Inc.)
 INFO  [AbstractCollector] Version: Norconex JEF 4.0.7-SNAPSHOT (Norconex Inc.)
 INFO  [AbstractCollector] Version: Norconex Committer Core 2.0.2 (Norconex Inc.)

Is there a way to get this information written to the log file?

essiembre commented 9 years ago

Some of that information cannot be stored in the log file: the log file location is defined in the collector configuration file, and those log entries are printed before (or while) the configuration file is parsed.

The version information is not tied to the configuration file, so I made the change and it is now written to the log file as well (in the latest snapshot).
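
To illustrate the ordering constraint described above, here is a minimal sketch of one generic way to handle messages logged before the log file location is known: hold them in memory and replay them once the path is available. This is not how Norconex implements it. The `DeferredLog` class, its methods, and the sample messages are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Norconex): buffers early log lines until a log file is set. */
final class DeferredLog {
    private final List<String> pending = new ArrayList<>();
    private Path logFile; // null until the configuration has been parsed

    /** Prints to the console; writes to the file if known, otherwise buffers the line. */
    synchronized void info(String message) {
        String line = "INFO  " + message;
        System.out.println(line);
        if (logFile == null) {
            pending.add(line);
        } else {
            append(Collections.singletonList(line));
        }
    }

    /** Called once the log location is known; replays everything buffered so far. */
    synchronized void setLogFile(Path file) {
        logFile = file;
        append(pending);
        pending.clear();
    }

    private void append(List<String> lines) {
        try {
            Files.write(logFile, lines,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small demonstration: one line logged before the file is set, one after. */
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("collector", ".log"); // stand-in for the configured log path
            DeferredLog log = new DeferredLog();
            log.info("Version: Example Collector 1.0 (hypothetical)"); // before config is parsed
            log.setLogFile(file);                                      // config parsed, path known
            log.info("Configuration loaded");                          // written directly
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch both lines end up in the file in their original order, even though the first was logged before the file was known. The trade-off is that anything logged before `setLogFile` is lost from the file if the process fails during configuration parsing, although it still appears in the console.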