We agreed to go with this approach at the ClearTK day today.
Original comment by steven.b...@gmail.com
on 12 Feb 2012 at 8:20
Ok, one issue I'm running into with this is that it's no longer so easy for
CleartkAnnotator to decide if it's training or predicting.
In the past, there was always a default classifier factory
(JarClassifierFactory), but there was no default data writer factory. So we
could tell if we were training by looking to see if a data writer factory had
been specified.
With the approach in this issue, there will now always be both a default
classifier factory (still JarClassifierFactory) and a default data writer
factory (DefaultDataWriterFactory, or whatever we call it). So the old
heuristic for guessing whether we were training or not will now fail.
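Here is a minimal sketch of the old heuristic, showing why default factories break it. It assumes the configuration can be modelled as a plain Map of resolved settings. The class name, the map keys, and the `withDefaults` helper are all hypothetical and are not the real UIMA/ClearTK configuration API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the old "are we training?" heuristic.
 * Configuration is modelled as a simple map; this is NOT the UIMA/ClearTK API.
 */
class OldHeuristicSketch {

    // Hypothetical keys standing in for the configured factory classes.
    static final String CLASSIFIER_FACTORY = "classifierFactoryClass";
    static final String DATA_WRITER_FACTORY = "dataWriterFactoryClass";

    /** Old rule: a data writer factory is only present if the user set one. */
    static boolean isTrainingOld(Map<String, String> config) {
        return config.get(DATA_WRITER_FACTORY) != null;
    }

    /** Assumed behaviour of the new approach: both factories get defaults. */
    static Map<String, String> withDefaults(Map<String, String> userConfig) {
        Map<String, String> resolved = new HashMap<>(userConfig);
        resolved.putIfAbsent(CLASSIFIER_FACTORY, "JarClassifierFactory");
        resolved.putIfAbsent(DATA_WRITER_FACTORY, "DefaultDataWriterFactory");
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> predictionConfig = new HashMap<>(); // no data writer set
        // Without defaults, the heuristic correctly reports prediction mode.
        System.out.println(isTrainingOld(predictionConfig));               // prints false
        // Once defaults are filled in, it always reports training mode.
        System.out.println(isTrainingOld(withDefaults(predictionConfig))); // prints true
    }
}
```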
I see a few solutions:
(1) Always force people to specify PARAM_IS_TRAINING. This would mean every
creation of a CleartkAnnotator would require specifying an additional
configuration parameter. This parameter would be conceptually redundant with
the fact that you're specifying either
JarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH or
DefaultDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME. But it wouldn't be
technically redundant, because these are "implementation details" of
JarClassifierFactory and DefaultDataWriterFactory that CleartkAnnotator doesn't
necessarily know about.
(2) Have CleartkAnnotator check for the presence of
JarClassifierFactory.PARAM_CLASSIFIER_JAR_PATH or
DefaultDataWriterFactory.PARAM_DATA_WRITER_CLASS_NAME. This would keep user
code simple, but would add special cases to CleartkAnnotator specifically for
JarClassifierFactory and DefaultDataWriterFactory. (Of course, these two are
what 99.9% of people are going to be using, so maybe it makes sense to special
case them.)
(3) Don't create a DefaultDataWriterFactory, and instead have CleartkAnnotator
itself take the PARAM_DATA_WRITER_CLASS_NAME. Then if either a
DataWriterFactory or a DataWriter was specified, we'd know that we were
training. But if we merge the DefaultDataWriterFactory functionality into
CleartkAnnotator, for symmetry, it seems like we'd also want to merge the
JarClassifierFactory functionality in there too.
Right now, I'm leaning towards (2) because, though (1) is probably the purest
approach, (2) seems to be much more practical, and doesn't couple the factories
with CleartkAnnotator like (3) would.
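A minimal sketch of what option (2) could look like, assuming the configuration is available as a `Map<String, ?>`. The constants mirror the parameter names discussed above, but their string values, the class name, and the choice to reject ambiguous configurations are assumptions for illustration, not ClearTK's actual code. An explicit PARAM_IS_TRAINING is checked first, so being explicit as in (1) still works.

```java
import java.util.Map;

/**
 * Hypothetical sketch of option (2): infer training mode from which
 * factory-specific parameter was supplied. String values are assumed.
 */
class TrainingModeInference {

    static final String PARAM_IS_TRAINING = "isTraining";
    static final String PARAM_CLASSIFIER_JAR_PATH = "classifierJarPath";
    static final String PARAM_DATA_WRITER_CLASS_NAME = "dataWriterClassName";

    static boolean isTraining(Map<String, ?> config) {
        // An explicit PARAM_IS_TRAINING always takes precedence.
        Object explicit = config.get(PARAM_IS_TRAINING);
        if (explicit != null) {
            return (Boolean) explicit;
        }
        // Otherwise special-case the two factories everyone actually uses.
        boolean hasWriter = config.containsKey(PARAM_DATA_WRITER_CLASS_NAME);
        boolean hasJar = config.containsKey(PARAM_CLASSIFIER_JAR_PATH);
        if (hasWriter && !hasJar) {
            return true;
        }
        if (hasJar && !hasWriter) {
            return false;
        }
        // Neither or both present: refuse to guess (an assumed design choice).
        throw new IllegalArgumentException(
            "Cannot infer training mode; set " + PARAM_IS_TRAINING + " explicitly");
    }

    public static void main(String[] args) {
        System.out.println(isTraining(Map.of(PARAM_DATA_WRITER_CLASS_NAME, "SomeDataWriter"))); // prints true
        System.out.println(isTraining(Map.of(PARAM_CLASSIFIER_JAR_PATH, "model.jar")));         // prints false
        System.out.println(isTraining(Map.of(PARAM_IS_TRAINING, true,
                                             PARAM_CLASSIFIER_JAR_PATH, "model.jar")));      // prints true
    }
}
```

Throwing on an ambiguous configuration, rather than picking a side, keeps the special-casing from silently guessing wrong. `Map.of` requires Java 9 or later.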
Original comment by steven.b...@gmail.com
on 24 Apr 2012 at 3:29
My first reaction is to recommend (1). While your argument for (2) holds now,
it may well stop holding in the future. I can imagine classifiers and data
writers implemented in ways completely outside our current
data-writer-to-file / classifier-from-jar paradigm. For example, a data writer
might be implemented as a client that sends messages (i.e., instances) to a
server that is continuously training a model. Something like that would
probably not be configured through these params. Also, is it really that
onerous to set a single boolean parameter? It might make the code clearer.
That said, as things are now, (2) probably makes the most sense. We could
circle back to this issue when we need to and make a change then as necessary.
That's generally been our approach in the past.
Original comment by phi...@ogren.info
on 25 Apr 2012 at 3:41
Note that (2) doesn't prevent you from specifying PARAM_IS_TRAINING as in (1).
So if you want to be explicit, you still can under (2).
> We could circle back to this issue when we need to and make a change then as necessary.
Yep. If we really feel like everyone should be specifying PARAM_IS_TRAINING
(and we want to stop inferring it automatically), we can issue deprecation
warnings for any path but the explicit PARAM_IS_TRAINING path.
Ok, I'll go ahead with (2).
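If inference were later deprecated as described above, the fallback path could log a warning while the explicit PARAM_IS_TRAINING path stays silent. A minimal sketch using `java.util.logging`; the class name and parameter string values are hypothetical, and the inference rule is simplified to the data-writer check alone.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical sketch: warn whenever training mode is inferred rather than set. */
class DeprecatedInference {

    private static final Logger LOGGER =
        Logger.getLogger(DeprecatedInference.class.getName());

    static final String PARAM_IS_TRAINING = "isTraining";
    static final String PARAM_DATA_WRITER_CLASS_NAME = "dataWriterClassName";

    static boolean isTraining(Map<String, ?> config) {
        Object explicit = config.get(PARAM_IS_TRAINING);
        if (explicit != null) {
            return (Boolean) explicit; // explicit path: no warning
        }
        // Any inferred path emits a deprecation warning.
        LOGGER.warning("Inferring training mode is deprecated; please set "
            + PARAM_IS_TRAINING + " explicitly");
        return config.containsKey(PARAM_DATA_WRITER_CLASS_NAME);
    }
}
```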
Original comment by steven.b...@gmail.com
on 25 Apr 2012 at 8:51
This issue was closed by revision r3895.
Original comment by steven.b...@gmail.com
on 25 Apr 2012 at 2:05
Original comment by steven.b...@gmail.com
on 5 Aug 2012 at 8:50
Original issue reported on code.google.com by
steven.b...@gmail.com
on 7 Feb 2012 at 7:19