Open GoogleCodeExporter opened 9 years ago
Why do we need an uber-jar?
Original comment by richard.eckart
on 7 Mar 2015 at 5:24
It would be nice to turn some things into an actual "product" and provide a
runnable tool (aka uber-jar) that just works if you give it some input.
We would like to release our own PoS tagger for social media in the future -
without expecting the user to be a programmer who knows Eclipse, TC and all
that stuff.
It is a bit of work, but if you can release your research as a standalone
runnable product with all dependencies and such taken care of - it would
be quite an advantage for TC.
Original comment by Tobias.H...@gmail.com
on 7 Mar 2015 at 5:30
As a developer of DKPro Core, I'm interested in tools that can be integrated
easily into UIMA pipelines. Uber-JARs are notoriously problematic because they
are highly likely to conflict with other classes on the classpath. Thus, I'd be
more interested in something a step short of an uber-jar: a model and (if
necessary) a "light" JAR that I can wrap and integrate as a UIMA component -
or that already is a DKPro-compatible UIMA component and can be added to a
pipeline directly.
I suppose based on that, I could always use the Maven shade plugin to create an
uber-jar if I wanted. We did that already with DKPro Core pipelines.
Alternatively, I could build a Groovy script that downloads the stuff from
repositories.
Original comment by richard.eckart
on 7 Mar 2015 at 5:35
Yes, we already have a model loading pipeline feature for created model files.
If you create an uber-jar from such a pipeline that loads a model, it rains
"file path not found" exceptions because all paths are absolute file system
paths, which is the key problem if you try to use them from within the jar file.
I have only used uber-jars so far. I thought that is the only way to make
something "runnable" outside of Eclipse?
Including the resources by downloading them is a possibility, but this sounds
like more effort to me. I thought about something where you provide all the
stuff once, prepare things for wrapping into the uber-jar and then are just done.
I am not sure how severe the dependency clash problem for uber-jars is.
Original comment by Tobias.H...@gmail.com
on 7 Mar 2015 at 6:01
Uber-jar clashes are hell. E.g. we could not make use of the TWSI library that
was provided as an uber-jar in any pipeline together with the Stanford tools,
because TWSI included a copy of the Stanford classes.
Uber-jars are nice for stand-alone applications - but they are completely
unusable within larger pipelines.
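To illustrate how such a clash can be spotted, here is a minimal Java sketch (not part of DKPro Core or TC; the class and method names are made up for this example). It asks the class loader for every location that provides a given class file. More than one hit means two JARs ship their own copy of the class, which is the situation described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
class DuplicateClassFinder {

    // Converts "a.b.C" into the resource name "a/b/C.class".
    static String classResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every classpath location that provides the given class file.
    // A result with more than one URL indicates duplicate copies.
    static List<URL> findCopies(String className) {
        ClassLoader loader = DuplicateClassFinder.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(classResourceName(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling findCopies with the fully qualified name of a class you suspect is bundled twice and printing the returned URLs shows which JARs the copies come from.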
Of course models should not use absolute paths, at least not absolute paths
with respect to the file system. You can use absolute paths within the
classpath - that is what DKPro Core is doing all the time. You just need to
make sure that a proper package structure is used - i.e. that models are not
simply stored e.g. as "models/en.bin" in a JAR because that will also cause
clashes.
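As a rough sketch of what "absolute path within the classpath" means in plain Java (this is not the DKPro Core resource mechanism; the package name and the file naming scheme below are assumptions made up for the example): the model is looked up through the class loader under a package-qualified location, so it works the same whether the model sits in a directory, a separate JAR, or an uber-jar.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
class ClasspathModelLoader {

    // Builds an absolute classpath location such as
    // "/org/example/tagger/lib/model-en.bin" (naming scheme is an assumption).
    static String modelResourcePath(String packageName, String language) {
        return "/" + packageName.replace('.', '/') + "/model-" + language + ".bin";
    }

    // Opens the model from the classpath instead of from the file system.
    static InputStream openModel(String packageName, String language) throws IOException {
        String path = modelResourcePath(packageName, language);
        InputStream in = ClasspathModelLoader.class.getResourceAsStream(path);
        if (in == null) {
            throw new IOException("Model not found on classpath: " + path);
        }
        return in;
    }
}
```

Because the location includes a package that only this tool uses, two JARs that both ship an English model no longer compete for the same "models/en.bin" entry.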
The "export as JAR" in Eclipse never worked well for me because of the way that
uimaFIT handles type detection.
For this reason, I am building runnable JARs using the maven-shade-plugin:
http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.packaging
Also nice are the Groovy scripts that we have in DKPro Core. They use Maven
dependencies for components and the DKPro Core model auto-loading mechanism for
loading models (although models could equally be added as Maven dependencies to
the scripts - we just do not do it because it saves some lines of code):
https://code.google.com/p/dkpro-core-asl/wiki/DraftGroovyIntro
Such scripts are not as fully standalone as an uber-jar because they still
require that they can access a Maven repository, but on the other hand they are
really nice and short and serve as good examples.
I think it would be good to solve the file loading problems before taking the
next step of creating an uber-jar - and for creating an uber-jar I would
strongly recommend the approach described in the uimaFIT documentation
mentioned above.
What do you think about setting up a wiki page to write up a specification
mentioning the requirements and envisioned solutions? While discussing here, we
might easily lose track of what we actually want, why, and how we imagine
solving it.
Original comment by richard.eckart
on 8 Mar 2015 at 8:32
I think I meant the maven-shade way when I was talking about an uber-jar;
apparently that is not the same?
The other Wiki pages are all of the kind "how to use TC". I see no discussion
pages for work in progress?
Original comment by Tobias.H...@gmail.com
on 8 Mar 2015 at 12:18
There are various ways of building uber-jars. The maven-shade-plugin is special
in the sense that it can be configured to properly handle (merge) certain
configuration files that reside in well-known places in the classpath. Other
uber-jar builders tend to fail to handle such files properly.
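To make the merging point concrete: several JARs can each carry a file at the same classpath location (as far as I recall, uimaFIT's type detection reads META-INF/org.apache.uima.fit/types.txt this way). In an uber-jar only one file can exist at that location, so the builder has to combine the contents instead of keeping just one copy. The following Java sketch only illustrates that combining step; it is not the shade plugin's code, and the class name and the sample entries are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of merging same-named descriptor files.
class DescriptorMerge {

    // Takes the contents of all files found at the same classpath location
    // and returns the distinct non-empty lines in first-seen order.
    static List<String> merge(List<String> fileContents) {
        Set<String> merged = new LinkedHashSet<>();
        for (String content : fileContents) {
            for (String line : content.split("\\R")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

A builder that simply lets the last JAR win would drop the entries contributed by all other JARs, and type detection would then silently miss those types.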
Just create a new wiki page ;) Back in the olden days, when I started
implementing the resource resolving mechanism in DKPro Core, I set up such a
page in the DKPro Core wiki [1]. This page eventually turned into the seed for
documentation on resource packaging in DKPro Core, but the requirements
collected are still clearly visible.
[1] https://code.google.com/p/dkpro-core-asl/wiki/ResourceProviderAPI
Original comment by richard.eckart
on 8 Mar 2015 at 8:32
I wrote things up here:
https://code.google.com/p/dkpro-tc/wiki/ReusableTCModels
The page is not linked yet; I didn't know where to place it...
Original comment by Tobias.H...@gmail.com
on 9 Mar 2015 at 7:32
Original issue reported on code.google.com by
Tobias.H...@gmail.com
on 7 Mar 2015 at 1:24