GoogleCodeExporter closed this issue 9 years ago.
Original comment by daxenber...@gmail.com
on 11 Jun 2014 at 5:25
I see the problem, but I do not think that FEs should declare specific
components as requirements.
In the short term, some documentation would be useful.
In the medium term, it would be good if TC could scan the classpath for
components that produce the desired annotations and suggest them to the user
(e.g. in an error message).
Let's discuss the long term offline.
Original comment by richard.eckart
on 11 Jun 2014 at 6:02
The kind of documentation that Emily suggests is essential; it becomes even
more important if we start to use lexical resources during feature extraction.
Original comment by eckle.kohler
on 11 Jun 2014 at 6:15
The best we currently have is this:
https://code.google.com/p/dkpro-core-asl/wiki/ComponentList
Original comment by richard.eckart
on 11 Jun 2014 at 6:17
I agree with Richard that TC is probably not the place to document examples of
components that might become outdated anyway.
I see this as an instance of a more general problem that students also face
when starting to use DKPro Core: it is not very clear which components create
which annotations.
Original comment by torsten....@gmail.com
on 11 Jun 2014 at 7:54
Regarding documentation and whether it should be distributed across TC FEs or
centralized in, say, a chart in the Google Code wiki of DKPro Core:
If DKPro Core components change and a preprocessing component that an FE
relies on is removed from DKPro Core, rendering the FE unusable, it might take
longer to track down the breakage with centralized documentation than with
distributed documentation.
With distributed documentation, I could open the FE, see the suggested
preprocessing, look at an error message or at Core, notice that one component
is no longer supported, realize that there is no other component that will
work, and flag the FE as buggy. If documentation is only centralized, I might
think there has to be some combination of components that works, but I cannot
see what it is...
Other benefits of distributed documentation:
-When a developer adds an FE to TC, should they also have to be a developer on
Core so they can update the centralized documentation there for the TC FE?
-Should Core be required to host documentation specific to the needs of TC FEs?
-Some FEs have trivial preprocessing needs, such as tokenization, but as time
passes we are seeing a wonderfully diverse library of FEs contributed to TC;
for an FE with preprocessing needs like semantic dependency parsing of NEs,
won't the explanation of the necessary preprocessing pipeline be too unwieldy
to centralize?
Of course, distributed documentation has the drawback that it is more effort to
keep updated.
Original comment by EmilyKJa...@gmail.com
on 11 Jun 2014 at 10:41
I cannot follow your argument. I do not understand what FE-specific
documentation would be kept in DKPro Core.
We have two questions:
a) DKPro Core documentation should be able to answer the question "which
component produces annotation type X".
b) DKPro TC documentation should answer the question "which annotation type is
required by FE X".
It appears to me to be a pretty clean separation of concerns.
I thought that one goal of TC was to remove the need for the user to answer
these questions by automatically adding the required preprocessing when an FE
is added.
So to start with, having the answers to both questions cleanly separated
between Core and TC makes sense to me. Eventually, though, it would be nice if
work on the main goal could proceed: removing the need to answer these
questions at all.
Your initial suggestion to let the FE JavaDoc suggest components directly goes
in that direction. But that is again just documentation. How about defining an
automatic solution? A simplistic start could be a configuration file in DKPro
TC that maps each type to an analysis engine, e.g.
...Sentence=...OpenNlpSegmenter
...Token=...OpenNlpSegmenter
...Dependency=...MaltParser
TC could use this information to add the respective analysis engines to the
preprocessing step. I'm sure you can see immediately that there are pitfalls
in this process. We'd need to see how far we can get with such a simple
solution before banging our heads against the wall. In the worst case, TC could
use this information simply to construct an error message to display to the
user.
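The mapping proposed above could be consumed by a small lookup utility. The
following is a minimal sketch in plain Java (no UIMA dependencies); the
`TypeMapping` class, the fully qualified type names, and the engine names are
illustrative assumptions, not actual TC API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a simplistic type-to-component mapping that TC could
// consult when an FE declares which annotation types it requires.
public class TypeMapping {

    // In practice this would be loaded from a configuration file
    // (type name on the left, analysis engine on the right).
    private final Map<String, String> typeToEngine = new HashMap<>();

    public TypeMapping() {
        typeToEngine.put(
            "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Sentence",
            "OpenNlpSegmenter");
        typeToEngine.put(
            "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token",
            "OpenNlpSegmenter");
        typeToEngine.put(
            "de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency",
            "MaltParser");
    }

    // Resolve the engines needed for a set of required types. Types with no
    // known producer are collected so they can be reported to the user,
    // either to extend the preprocessing or to build an error message.
    public Set<String> resolve(Collection<String> requiredTypes, List<String> unknown) {
        Set<String> engines = new LinkedHashSet<>();
        for (String type : requiredTypes) {
            String engine = typeToEngine.get(type);
            if (engine != null) {
                engines.add(engine);
            } else {
                unknown.add(type);
            }
        }
        return engines;
    }

    public static void main(String[] args) {
        TypeMapping mapping = new TypeMapping();
        List<String> unknown = new ArrayList<>();
        Set<String> engines = mapping.resolve(
            Arrays.asList(
                "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token",
                "de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.Dependency"),
            unknown);
        System.out.println(engines); // engines to prepend to preprocessing
        System.out.println(unknown); // types with no known producer
    }
}
```

In the worst case, as noted above, the `unknown` list would simply feed an
error message instead of triggering automatic engine insertion.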
Original comment by richard.eckart
on 11 Jun 2014 at 10:56
Thanks Richard, for your helpful points in multiple directions.
In my previous post, I overlooked the fact that, while a particular FE's
*combination* of preprocessing types may be unique, each individual type must
already exist in DKPro Core unless a Core developer adds a new one. So you're
right: there is no need to update Core documentation for each new TC FE.
I think this issue has outgrown itself, so I am closing it for now and adding
an item to the TC meeting agenda for further discussion.
Original comment by EmilyKJa...@gmail.com
on 11 Jun 2014 at 11:33
Keeping the discussion alive for now, as I like the points Richard has raised
about ease of use for TC users.
It would be great if there were no need for users to specify which
preprocessing components to use. There should be a sensible default and the
possibility to override it.
Unfortunately, the Type2Component mapping is rather language-specific (e.g.
specialised parsers for some languages), but of course there are components
that support a wider range of languages and would make a good default
(e.g. Stanford).
As Richard said, an error message could be generated if the defaults yield no
usable result for some input.
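The default-plus-override scheme could be sketched roughly as follows. The
`DefaultMapping` class, its keys, and the concrete component choices (a
Stanford component as the broad default, MaltParser as a language-specific
override) are illustrative assumptions, not an actual TC configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: broad-coverage defaults with per-language overrides,
// as suggested in the discussion above.
public class DefaultMapping {

    // Default producer per annotation type, for components with wide
    // language coverage.
    private final Map<String, String> defaults = new HashMap<>();

    // Language-specific overrides, keyed as "<language>|<type>".
    private final Map<String, String> overrides = new HashMap<>();

    public DefaultMapping() {
        defaults.put("Dependency", "StanfordParser");  // broad coverage
        overrides.put("sv|Dependency", "MaltParser");  // specialised parser
    }

    // Returns the component for a language/type pair, preferring the
    // language-specific entry and falling back to the default. Returns null
    // when neither exists, so the caller can raise an error message instead.
    public String lookup(String language, String type) {
        String specific = overrides.get(language + "|" + type);
        return specific != null ? specific : defaults.get(type);
    }
}
```

A null result here corresponds to the case Richard mentioned: the defaults
yield no usable component and an error message is shown instead.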
Original comment by torsten....@gmail.com
on 12 Jun 2014 at 3:31
Original issue reported on code.google.com by
EmilyKJa...@gmail.com
on 11 Jun 2014 at 5:21