Closed AntonioAmore closed 10 years ago
Hello, First, congrats on writing your own Committer! If you feel it is generic enough when you are done, I can link to it as a third-party contribution if you like.
Now some answers:
0 - Is using AbstractMappedCommitter the right decision? It depends. What that class does is take care of document queuing for you so you can commit in batches, and the "Mapped" in the name means it offers configuration options to map the ID and content fields from crawled document metadata to the ID and content fields in your target repo (MySQL in your case). If you do not care about submitting in batches, or you do your own metadata-field-to-table-field associations in code, you can opt for implementing the ICommitter interface directly: in the queueXXX methods you insert into MySQL, and in the commit() method you commit the database transaction (just a suggestion).
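To make the batching trade-off concrete, here is a minimal sketch of the kind of queuing a batching committer does for you. It is not the AbstractMappedCommitter implementation: the class name BatchingQueueSketch, its methods, and the sink callback are all hypothetical and only illustrate the idea of holding documents until a batch is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical illustration of batch queuing (not Norconex code).
 * Documents are held in memory and handed to the sink in groups.
 */
final class BatchingQueueSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;   // e.g. code that writes one batch to MySQL
    private final List<T> pending = new ArrayList<>();

    BatchingQueueSketch(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one document and flushes as soon as a full batch is available. */
    void queue(T doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is still pending, even if the batch is not full. */
    void commit() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // Pass a copy so the sink can keep the list after we clear ours.
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

With a batch size of 1 this degenerates into pushing each document as soon as it is ready, which is roughly what a direct ICommitter implementation would do.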
ICommitter to push directly to MySQL as documents are ready (setting the batch size to zero in AbstractMappedCommitter may also do this, I suppose).
KeepOnlyTagger in your config, all extracted metadata is kept and available in the Properties metadata variable passed to the committer methods. What I recommend is that you write a bit of code that prints the content of that variable, or temporarily use the FileSystemCommitter and open a generated *.meta file to get a list of all fields attached to your documents. You can then pick and choose, and even rename or manipulate these fields before sending (either using existing taggers, transformers, etc. in your config, or programmatically).
<commitBatchSize>(max number of documents to send to target repository at once)</commitBatchSize>
<queueSize>(max queue size before committing)</queueSize>
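The suggestion above to print the content of the metadata variable could look roughly like this. It is only a sketch: a plain Map<String, List<String>> stands in for the Properties object the committer receives (an assumption, so the example runs without the library), and the class name MetadataDump and the sample field names are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: lists every metadata field and its values, one field per line. */
final class MetadataDump {

    private MetadataDump() {
    }

    // ASSUMPTION: a multi-valued string map stands in for the metadata
    // object passed to the committer methods.
    static String describe(Map<String, List<String>> metadata) {
        StringBuilder out = new StringBuilder();
        // A TreeMap copy sorts the field names so the listing is easy to scan.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(metadata).entrySet()) {
            out.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample fields; real names come from your crawl.
        Map<String, List<String>> metadata = Map.of(
                "title", List.of("Example page"),
                "keywords", List.of("crawler", "mysql"));
        System.out.print(describe(metadata));
    }
}
```

Calling something like this once from inside the committer while testing should give you the list of field names you can then map to table columns.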
Please let me know if that answers your question or if you have more. Thanks.
Thanks a lot for your answer.
I've chosen the AbstractMappedCommitter because I really want to map metadata to different fields of a database table and make it highly configurable. Tell me please
for (ICommitOperation iCommitOperation : list) {
    if (iCommitOperation instanceof IAddOperation) {
        /*
         *
         */
    } else if (iCommitOperation instanceof IDeleteOperation) {
        /*
         *
         */
    } else {
        throw new CommitterException("Unsupported operation:" + iCommitOperation);
    }
}
I viewed the metadata files and can recognize the field names there, but I still don't see how to read the metadata or the mapped fields inside the function in order to write the MySQL query. Could you provide a link to a file from the project that I could use as an example? The IAddOperation methods didn't help me. Sorry for asking such elementary things.
Don't be sorry for asking!
Cast each operation to IAddOperation or IDeleteOperation. The IAddOperation has a getMetadata() method on it. It returns a Properties object, which holds all the fields detected so far. You retrieve them by calling get methods on that Properties object, such as getString("myTextFieldName") or getInt("myNumericFieldName"). If you call the keySet() method on it, you will get all metadata field names present. Any clearer?
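As a rough sketch of how that fits into a commitBatch-style loop: check the operation type, cast, read the fields you need, and build a parameterized SQL statement. To keep the example runnable without the library or a database, the interfaces CommitOp, AddOp, and DeleteOp below are hypothetical stand-ins for ICommitOperation, IAddOperation, and IDeleteOperation, the metadata is a plain map, and the table name pages and its columns are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class OperationDispatchSketch {

    // ASSUMPTION: minimal stand-ins for the committer's operation interfaces.
    interface CommitOp { String getReference(); }
    interface AddOp extends CommitOp { Map<String, List<String>> getMetadata(); }
    interface DeleteOp extends CommitOp { }

    /** A SQL template plus the values to bind to its '?' placeholders. */
    static final class BoundStatement {
        final String sql;
        final List<String> params;

        BoundStatement(String sql, List<String> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /** Returns the first value of a field, or null when the field is absent. */
    static String first(Map<String, List<String>> metadata, String field) {
        List<String> values = metadata.get(field);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    static List<BoundStatement> toStatements(List<? extends CommitOp> batch) {
        List<BoundStatement> out = new ArrayList<>();
        for (CommitOp op : batch) {
            if (op instanceof AddOp) {
                AddOp add = (AddOp) op;   // cast to reach getMetadata()
                String title = first(add.getMetadata(), "title");
                // Hypothetical table and columns; Arrays.asList tolerates a null title.
                out.add(new BoundStatement(
                        "INSERT INTO pages (url, title) VALUES (?, ?)",
                        Arrays.asList(add.getReference(), title)));
            } else if (op instanceof DeleteOp) {
                out.add(new BoundStatement(
                        "DELETE FROM pages WHERE url = ?",
                        Arrays.asList(op.getReference())));
            } else {
                throw new IllegalArgumentException("Unsupported operation: " + op);
            }
        }
        return out;
    }
}
```

In a real committer, each statement and its parameters would go to a JDBC PreparedStatement rather than being collected in a list; binding values through placeholders avoids building SQL by string concatenation.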
Thank you!
That's clear: typecasting to more specialized interfaces like IAddOperation works and shows me the picture.
And it's sad that I don't yet have enough experience to contribute my committer to the community; it is still too specialized. I hope that in the future I can write it in a more generic manner.
No worries, you have to start somewhere and you seem on the right track. Keep it up! :-)
Hello!
I am writing my own committer implementation to put collected pages into a MySQL database.
As an example I've taken SolrCommitter. Is that the right decision?
So I inherited from AbstractMappedCommitter, and while implementing the commitBatch(List list) method I have the following questions:
I tried to handle it myself but got lost in the code, as I'm not experienced in Java yet. Thanks a lot.