If you are talking about loading them via XML configuration, then you can assume a new instance is created at startup for each entry in your config, and each instance lives until your crawler dies (they are not re-created each time they are accessed). Importer handlers are built with thread-safety in mind.
If you do it via coding, you control what the instances are and how they are shared.
A suggestion: you can probably have a connection pool accessible via a singleton. You can initialize it and destroy it using a Collector listener or crawler listener, depending on the scope you want to give it (assuming you use a collector).
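A minimal sketch of that suggestion is below. The class name, JDBC settings, and the choice of HikariCP as the pool are assumptions (any pooling `DataSource` would do); `init()` and `shutdown()` are the hooks you would call from whichever Collector or crawler listener you register.

```java
import java.sql.Connection;
import java.sql.SQLException;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Hypothetical singleton holding the shared connection pool for the crawl.
public final class CrawlConnectionPool {

    private static final CrawlConnectionPool INSTANCE = new CrawlConnectionPool();
    private volatile HikariDataSource dataSource;

    private CrawlConnectionPool() {}

    public static CrawlConnectionPool getInstance() {
        return INSTANCE;
    }

    // Call once when the crawler/collector starts (e.g., from your listener).
    public synchronized void init(String jdbcUrl, String user, String password) {
        if (dataSource == null) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl(jdbcUrl);
            config.setUsername(user);
            config.setPassword(password);
            config.setMaximumPoolSize(10); // tune to your crawler thread count
            dataSource = new HikariDataSource(config);
        }
    }

    // Borrow a pooled connection; closing it returns it to the pool.
    public Connection getConnection() throws SQLException {
        return dataSource.getConnection();
    }

    // Call once when the crawler/collector finishes (from the same listener).
    public synchronized void shutdown() {
        if (dataSource != null) {
            dataSource.close();
            dataSource = null;
        }
    }
}
```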
Does that answer your question?
I already have the singleton, and connecting it to a crawler listener is a good idea. I will do that. This probably means the singleton will grow into a connection manager and data access object (DAO). That's probably better than spreading it all over.
Will the tagger object be instantiated again and again throughout the crawl, or only once, at the start of the crawl, and kept until the end? Or is the life-cycle more complicated?
Background: I am writing a couple of taggers. I am no longer attempting to generalize to a JDBC tagger or a general REST API tagger; instead, I'm writing very specific taggers to do specific things, sometimes against an RDBMS and sometimes against a REST interface. I need to know how to manage Connection, PreparedStatement, and other durable objects so that they can be used efficiently.
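For illustration only, one way a tagger can handle those durable objects is to keep the shared pool as the only long-lived state and open/close the JDBC objects per document, since a single tagger instance serves many documents across threads. The class below is a sketch, not the Importer API: the method signature, table, column, and metadata type are made up, and it reuses the `CrawlConnectionPool` sketch above; adapt it to the actual tagger interface of your Importer version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

// Hypothetical tagger-like class: no Connection or PreparedStatement is kept
// as an instance field, so the object stays thread-safe across the crawl.
public class ExampleDbTagger {

    // Hypothetical lookup query; replace with whatever your tagger needs.
    private static final String SQL =
            "SELECT category FROM doc_categories WHERE url = ?";

    public void tagDocument(String reference, Map<String, String> metadata)
            throws SQLException {
        // try-with-resources returns the connection to the pool and closes
        // the statement/result set even if the lookup fails.
        try (Connection conn = CrawlConnectionPool.getInstance().getConnection();
             PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, reference);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    metadata.put("category", rs.getString("category"));
                }
            }
        }
    }
}
```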