Closed: bkisselbach closed this issue 3 years ago.
You would have to create your own version of JDBCCrawlDataStoreFactory for your database. Can you elaborate on your needs? If your goal is to have access to your crawl data from Postgres, I would encourage you to use the SQLCommitter instead, which is meant to work with any relational database.
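To illustrate, here is a minimal sketch of pointing the SQL Committer at Postgres. The tag names follow the Norconex SQL Committer documentation, but the paths, credentials, and table name are placeholders; verify the exact options against the committer version you are running.

```xml
<!-- Sketch only: driver path, credentials, and table name are placeholders. -->
<committer class="com.norconex.committer.sql.SQLCommitter">
  <driverPath>/path/to/postgresql-driver.jar</driverPath>
  <driverClass>org.postgresql.Driver</driverClass>
  <connectionUrl>jdbc:postgresql://localhost:5432/crawldb</connectionUrl>
  <username>crawler</username>
  <password>secret</password>
  <tableName>crawled_docs</tableName>
</committer>
```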
We are trying to get a better handle on the content flowing through to a SQL committer, and were curious whether we could push the Norconex logs into the same database so we can get better insight into what has been crawled, its status, etc.
I suggest you have a look at using a URLStatusCrawlerEventListener.
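As a rough sketch, the listener is registered under the crawler's event listeners. The `statusCodes`, `outputDir`, and `fileNamePrefix` options below are taken from the URLStatusCrawlerEventListener documentation, but double-check them against your collector version; the directory is a placeholder.

```xml
<!-- Sketch only: writes URL/status reports for the given HTTP status range. -->
<crawlerListeners>
  <listener class="com.norconex.collector.http.crawler.event.impl.URLStatusCrawlerEventListener">
    <statusCodes>100-599</statusCodes>
    <outputDir>/path/to/status-reports</outputDir>
    <fileNamePrefix>urlstatuses</fileNamePrefix>
  </listener>
</crawlerListeners>
```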
You could also use the MultiCommitter and specify a JSONFileCommitter or XMLFileCommitter in addition to your SQL one. Those keep an ongoing copy of additions and deletions. They do not overwrite previously written files, so they can keep a full history of what you have committed if you like.
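Combining the two could look something like the sketch below: the MultiCommitter wraps your SQL Committer and a JSONFileCommitter side by side. Class names follow the Norconex committer-core and committer-sql documentation; the connection details and directory are placeholders.

```xml
<!-- Sketch only: each document is sent to both nested committers. -->
<committer class="com.norconex.committer.core.impl.MultiCommitter">
  <committer class="com.norconex.committer.sql.SQLCommitter">
    <driverClass>org.postgresql.Driver</driverClass>
    <connectionUrl>jdbc:postgresql://localhost:5432/crawldb</connectionUrl>
    <tableName>crawled_docs</tableName>
  </committer>
  <committer class="com.norconex.committer.core.impl.JSONFileCommitter">
    <directory>/path/to/json-history</directory>
  </committer>
</committer>
```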
Can one of these work for you?
Is it possible to use a non-H2 database with the JDBCCrawlDataStoreFactory? The documentation at https://norconex.com/collectors/collector-http/latest/apidocs/com/norconex/collector/http/data/store/impl/jdbc/JDBCCrawlDataStoreFactory.html doesn't seem to list any configuration options for specifying a different database.