Norconex / collector-core

Collector-related code shared between different collector implementations
http://www.norconex.com/collectors/collector-core/
Apache License 2.0

Enhancement request - action to dump xml output from ConfigurationLoader #17

Open danizen opened 6 years ago

danizen commented 6 years ago

Apache Velocity is a much easier template engine than many others, but there is still the issue of whether this works:

```
<crawlDataStoreFactory class="com.norconex.collector.http.data.store.impl.mongo.MongoCrawlDataStoreFactory">
  <host>$mongo_host</host>
  <dbname>$mongo_dbname</dbname>
  <username>$mongo_username</username>
  <password>$mongo_password</password>
  <mechanism>SCRAM-SHA-1</mechanism>
  #if( $mongo_cached )<cachedCollectionName>$mongo_cached</cachedCollectionName>#endif
  #if ( $mongo_refs )<referencesCollectionName>$mongo_refs</referencesCollectionName>#endif
</crawlDataStoreFactory>
```

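Indeed it does not: Velocity directives are closed with #end, not #endif. It took some work to figure out what was going on, though, which is why an action that dumps the XML produced by ConfigurationLoader would help.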

essiembre commented 6 years ago

You mean as a command-line option? It has been in the HTTP Collector project internal TODO file for some time now. :-) Could definitely be useful. Marking it.

danizen commented 6 years ago

In my implementation, the code dumps the configuration after loadString, but before conversion to XMLConfiguration. Because of this, I had to re-implement your regular expression that strips out the DOCTYPE and the XML declaration. As a feature, it could even dump the pretty-printed XMLConfiguration after loading it as a CollectorConfig with whichever collector contains main() ...

What I'm saying is that I'm not sure what level of validation should be done before the dump.
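
For reference, the stripping step mentioned above might look roughly like the following. This is only a minimal sketch using `java.util.regex`, not the actual collector-core code: the class name `ConfigDumpSketch`, the method `stripProlog`, and both patterns are hypothetical, and the DOCTYPE pattern assumes a declaration without an internal subset.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of collector-core: removes the XML
// declaration and a simple DOCTYPE from an already-rendered template
// so the remaining XML can be dumped for inspection.
class ConfigDumpSketch {

    // Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>.
    // Requiring whitespace after "xml" avoids matching <?xml-stylesheet ...?>.
    private static final Pattern XML_DECL =
            Pattern.compile("<\\?xml\\s.*?\\?>", Pattern.DOTALL);

    // Matches a DOCTYPE with no internal subset, e.g. <!DOCTYPE xml>.
    // Assumption: declarations containing [ ... ] are left untouched.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE\\b[^>\\[]*>");

    static String stripProlog(String xml) {
        String out = XML_DECL.matcher(xml).replaceAll("");
        out = DOCTYPE.matcher(out).replaceAll("");
        return out.trim();
    }

    public static void main(String[] args) {
        // Stand-in for the string produced by the template engine.
        String rendered = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE xml>\n"
                + "<crawlDataStoreFactory><host>localhost</host></crawlDataStoreFactory>";
        System.out.println(stripProlog(rendered));
    }
}
```

Dumping at this point shows exactly what the template engine produced, including any leftover directive text, before any XML parsing or validation has a chance to fail on it.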