HappyCustomers closed this issue 3 years ago
Does your config snippet not work? It should create an embedded H2 instance, located in a folder in your "workdir" named something like "crawlstore-jdbc".
To read mvstore, I have always relied on Java. There are examples here: http://www.h2database.com/html/mvstore.html#example_code, but you can also look at the Collector source code for how it reads it.
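As a rough illustration of the Java route, here is a minimal sketch that opens a store file read-only and lists each map with its entry count. It assumes the H2 jar (org.h2.mvstore) is on the classpath; the class name MvStoreSummary and the default file name are hypothetical, and the map names the Collector actually uses are not assumed here, they are simply discovered from the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVStore;

public class MvStoreSummary {

    /** Returns the name and entry count of every map found in the store file. */
    static Map<String, Long> mapSizes(String fileName) {
        // Open read-only so the crawl data cannot be modified by accident.
        MVStore store = new MVStore.Builder().fileName(fileName).readOnly().open();
        try {
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (String mapName : store.getMapNames()) {
                MVMap<Object, Object> map = store.openMap(mapName);
                sizes.put(mapName, map.sizeAsLong());
            }
            return sizes;
        } finally {
            store.close();
        }
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real store file path as the first argument.
        String fileName = args.length > 0 ? args[0] : "crawlstore.mvstore";
        mapSizes(fileName).forEach(
                (name, size) -> System.out.println(name + ": " + size + " entries"));
    }
}
```

Once you know the map names, you can iterate over map.keySet() in the same way. If the stored values are serialized Collector objects, the Collector jars probably need to be on the classpath as well, and the H2 version should match the one that wrote the file.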
Is this to track processed URLs in a database? If so you can consider using the SQL Committer. You can use it in combination with the MultiCommitter if you want to send URLs to both a database and something else.
Hi Essiembre,
Thanks for the quick response.
I need to track the start URLs for a given configuration file. Because the URLs come from an input file, a URL may get deleted from that file. Before I rerun the collector, I need to compare the start URL list against the list of processed URLs to confirm they are still the same.
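The comparison itself could be sketched as below. This is only an illustration under assumptions: both inputs are plain text files with one URL per line, the second file is a hypothetical export of the URLs recorded by the previous run (how to produce it is not covered here), and the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StartUrlDiff {

    /** Reads one URL per line, trimming whitespace and skipping blank lines. */
    static Set<String> readUrls(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toCollection(LinkedHashSet::new));
        }
    }

    /** Returns the entries of 'left' that are absent from 'right'. */
    static Set<String> missingFrom(Set<String> left, Set<String> right) {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) throws IOException {
        Set<String> current = readUrls(Paths.get(args[0]));  // current start URL file
        Set<String> previous = readUrls(Paths.get(args[1])); // hypothetical export from the last run
        System.out.println("No longer in the start URL file: " + missingFrom(previous, current));
        System.out.println("New since the last run: " + missingFrom(current, previous));
    }
}
```

If the exported list contains every processed URL rather than only the start URLs, the first set will also include pages discovered during the crawl, so it would need to be filtered first.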
When I set crawlDataStoreFactory to JDBC, the collector creates a file named db.mv.db.
<crawlDataStoreFactory class="com.norconex.collector.http.data.store.impl.jdbc.JDBCCrawlDataStoreFactory"/>
How do I read this DB file? If I can read it, I will be able to fetch all the start URLs for a particular config file. Does this database have a username and password? What tool do I need to read this database file?
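For reference, a db.mv.db file is an H2 database, so one possible way to inspect it is plain JDBC. The sketch below builds the JDBC URL from the file path (H2 expects the path without the ".mv.db" suffix) and lists the tables it finds. It assumes the H2 driver jar is on the classpath and that the path is absolute; the "sa" user with an empty password is H2's usual default and only an assumption about what the collector configures. The class name is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

public class H2TableLister {

    /** Builds an H2 JDBC URL from a database file path, dropping the ".mv.db" suffix. */
    static String toJdbcUrl(String dbFile) {
        String suffix = ".mv.db";
        String base = dbFile.endsWith(suffix)
                ? dbFile.substring(0, dbFile.length() - suffix.length())
                : dbFile;
        // IFEXISTS=TRUE makes H2 fail instead of silently creating an empty database.
        return "jdbc:h2:" + base + ";IFEXISTS=TRUE";
    }

    public static void main(String[] args) throws SQLException {
        String url = toJdbcUrl(args[0]);                   // absolute path to db.mv.db
        String user = args.length > 1 ? args[1] : "sa";    // assumption: H2 default user
        String password = args.length > 2 ? args[2] : "";  // assumption: empty password
        try (Connection conn = DriverManager.getConnection(url, user, password);
             ResultSet tables = conn.getMetaData().getTables(null, null, "%", null)) {
            while (tables.next()) {
                String schema = tables.getString("TABLE_SCHEM");
                // Skip H2's own metadata tables.
                if (!"INFORMATION_SCHEMA".equalsIgnoreCase(schema)) {
                    System.out.println(schema + "." + tables.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```

The same URL, user, and password can be entered in the H2 Console login form. The driver version should be compatible with the H2 version that created the file.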
Hi Essiembre,
I am able to connect to the H2 database and view the processed URLs using the H2 Console. Is there a similar tool to view the data in an MVStore database?
Thank you
Sorry, I meant MVStore in my previous comment. I have updated it.
Just to add, if your goal is to produce some kind of a report of what was crawled, you can also look at URLStatusCrawlerEventListener. It can also track the HTTP response code, which can be useful to find broken links.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I need help configuring the crawlDataStoreFactory to use JDBC instead of the default MVStore. Specifically, I need help connecting to H2 and storing the crawl progress there instead of in the default MVStore (H2). Alternatively, which tool can I use to read the MVStore file?