HeidelTime / heideltime

A multilingual, cross-domain temporal tagger developed at the Database Systems Research Group at Heidelberg University.
GNU General Public License v3.0

HeidelTime on Maven Central #33

Closed alexeygrigorev closed 8 years ago

alexeygrigorev commented 8 years ago

It would be nice to be able to get the library from Maven Central. Other users would also appreciate it (see, e.g., this question).

Hronom commented 8 years ago

+1

jzell commented 8 years ago

Hey, and thanks for your interest!

Maven is very nice and we'd very much like to make that possible, but we have one dependency in particular that is not available via their repositories: JVnTextPro.

From what I read, Maven Central (MC) strongly recommends against publishing artifacts that don't have resolvable dependencies, as it "creates havok for downstream users". JVnTextPro is a library required for processing Vietnamese documents. Unfortunately, our own Maven support (sans Central) is a bit hacked together: it uses a local repository that provides a custom JVnTextPro pom.xml created by us.

Do you guys see a way around this? I'm not terribly experienced with publishing to Maven Central.

Thanks.

alexeygrigorev commented 8 years ago

Maybe you can publish JVnTextPro along with HeidelTime? If their license permits that.

Actually, publishing to Central is not that difficult. I think the easiest way of doing it is via Sonatype's Nexus; here are the instructions: http://central.sonatype.org/pages/ossrh-guide.html and http://central.sonatype.org/pages/apache-maven.html.

I recently managed to publish a small library there, so I can assist if needed.

jzell commented 8 years ago

> Maybe you can publish JVnTextPro along with HeidelTime? If their license permits that.

I seem to remember that they licensed their work under GPL 2.0, but I can't find any mention of that right now.

Unfortunately, the story doesn't end there: it seems that JVnTextPro itself has a dependency that is not published on MC either. The lbfgs it uses is not the one (or compatible with the one) you can find on MC right now. In some sense, we'd basically be passing the problem down if we contacted the JVnTextPro people and asked them to publish to MC.

If we can find confirmation that JVnTextPro does use a GPL-compatible license, we could theoretically include their code in our future releases and package their binaries into a prospective MC jar. I'll talk to @JannikStroetgen about whether that's a desirable course of action.

Thanks again for the suggestion.

pgillet commented 8 years ago

A fallback is to maintain a local repository within your Maven project to store the JVnTextPro and HeidelTime jars:

  1. Follow the steps in https://github.com/HeidelTime/heideltime/wiki/Maven-Support
  2. From JVnTextPro's root folder, run: mvn deploy:deploy-file -Durl=file:///path/to/your-project/repo/ -Dfile=target/jvntextpro-2.0.jar -DgroupId=jvntextpro -DartifactId=jvntextpro -Dpackaging=jar -Dversion=2.0 -DpomFile=pom.xml
  3. From HeidelTime's kit folder, run: mvn deploy:deploy-file -Durl=file:///path/to/your-project/repo/ -Dfile=target/de.unihd.dbs.heideltime.standalone.jar -DgroupId=de.unihd.dbs -DartifactId=heideltime -Dpackaging=jar -Dversion=2.0.1 -DpomFile=pom.xml
  4. Add the local repo to your pom

    <repositories>
      <repository>
        <id>your.project.local</id>
        <name>aname</name>
        <url>file:${project.basedir}/repo</url>
      </repository>
    </repositories>

  5. Declare the dependency on HeidelTime

    <dependencies>
      ...
      <!-- HeidelTime -->
      <dependency>
        <groupId>de.unihd.dbs</groupId>
        <artifactId>heideltime</artifactId>
        <version>2.0.1</version>
      </dependency>
      ...
    </dependencies>
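
The -DpomFile=pom.xml argument in steps 2 and 3 expects a POM that describes the artifact being deployed. As a minimal sketch, assuming the JVnTextPro coordinates used in step 2 and no declared transitive dependencies (a real POM would probably also have to list whatever JVnTextPro itself needs), such a file could look roughly like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal POM for the JVnTextPro jar; coordinates mirror step 2 -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>jvntextpro</groupId>
  <artifactId>jvntextpro</artifactId>
  <version>2.0</version>
  <packaging>jar</packaging>
</project>
```

The coordinates in the POM should match the -DgroupId, -DartifactId and -Dversion values passed on the command line, otherwise the dependency declared in step 5 style will not resolve against the local repo.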

jzell commented 8 years ago

Thank you very much for the detailed tutorial, Pascal. :+1:

However, from what I understand, this solution would just shift the problem of including JVnTextPro to its binary form.

In the meantime, I've been in contact with the JVnTextPro developers and they have graciously agreed to let us include the JVnTextPro code base in HeidelTime. So unless I run into any problems building or publishing HeidelTime to MC, you can expect my next post in this issue to announce a release to MC.

pgillet commented 8 years ago

Great! It would be so much simpler for Maven developers, and they are numerous! ;)

jzell commented 8 years ago

Alright, this should work...

Does this work for your setups? Any mistakes? Any kind of comment is highly appreciated.

alexeygrigorev commented 8 years ago

Great! Thanks for doing it! It will now become much easier to use HeidelTime.

I wonder why you marked the dependencies as provided? It would probably be better if they had the default scope, so users would not have to worry about declaring these dependencies themselves.

alexeygrigorev commented 8 years ago

You could probably also mention in the README that it's now available on Maven Central :)

jzell commented 8 years ago

> I wonder why you marked the dependencies as provided? It would probably be better if they had the default scope, so users would not have to worry about declaring these dependencies themselves.

As was pointed out to me here, forcing people to use specific versions of the dependencies may clash with existing setups that use other versions of these dependencies if the dependency scopes are set to compile (default). I'd guess this is particularly true of uima-core and the stanford-corenlp(?) in common use cases.

So with the uima-core library possibly clashing with existing setups, lbfgs and args4j only being necessary for Vietnamese processing, and stanford-corenlp only being necessary for using our UIMA wrapper for their preprocessing, the provided scope seemed to make sense given the way I understood the scopes. Now that I think about it, for args4j and lbfgs it might be okay to use the default scope...

Does that make sense at all? Maven doesn't seem terribly flexible to me when it comes to bringing several different use cases under one roof... But I'll admit I'm anything but well-versed, so please let me know if you see room for improvement!
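
To illustrate the trade-off: a dependency with the provided scope is on the library's compile classpath but is not passed on transitively, so a consuming project has to declare it itself and is free to pick the version. A sketch of what that could look like in a user's pom.xml, assuming the de.unihd.dbs:heideltime coordinates from the local-repository example earlier in this thread (the coordinates on Maven Central may differ) and a UIMA version picked purely as an example:

```xml
<dependencies>
  <!-- HeidelTime; coordinates taken from the local-repository example,
       they may differ on Maven Central -->
  <dependency>
    <groupId>de.unihd.dbs</groupId>
    <artifactId>heideltime</artifactId>
    <version>2.0.1</version>
  </dependency>
  <!-- Declared by the consumer because the library marks it as provided;
       the version shown is only an example -->
  <dependency>
    <groupId>org.apache.uima</groupId>
    <artifactId>uimaj-core</artifactId>
    <version>2.8.1</version>
  </dependency>
</dependencies>
```

With the default (compile) scope, the second declaration would not be needed, but the library's chosen version would then take part in Maven's dependency mediation in every consuming project.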

alexeygrigorev commented 8 years ago

Thanks for the clarification, it makes sense to me now.