dkpro / dkpro-tc

UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.
https://dkpro.github.io/dkpro-tc/

Integrate VowpalWabbit #520

Closed Horsmann closed 6 years ago

Horsmann commented 6 years ago

VowpalWabbit provides a powerful logistic regression implementation:

https://github.com/VowpalWabbit/vowpal_wabbit

It seems to be distributed as a binary.

Tutorials:
https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Tutorial
http://www.philippeadjiman.com/blog/2018/04/03/deep-dive-into-logistic-regression-part-3/

Again, the question is whether, and how, the TC logic can reasonably be used with this binary.

https://github.com/VowpalWabbit/vowpal_wabbit/wiki/One-Against-All-%28oaa%29-multi-class-example

Horsmann commented 6 years ago

@reckart I played around with the C/C++ binary of VowpalWabbit. I don't see an immediate way to get it compiled statically.

The setup is a bit atypical. The files I need are two binaries plus a folder of libraries:

libtools
vw
.libs (folder)
   - lib1
   - lib2
   - lib3
   - lib4

As far as I understand, the top-level vw drives the libtools binary, which loads the libraries in the subfolder, and it is the intended command-line interface. vw itself seems to contain no machine learning logic; it is just a front end.

Is it possible to deploy such a setup with the RuntimeProvider? I don't think I will manage to wrap this into a single binary. It would be nice if I could just deploy this handful of files. Is this possible/supported?

reckart commented 6 years ago

The RuntimeProvider is designed to support deploying multiple files - worth a try :)

Horsmann commented 6 years ago

@reckart How would I retrieve a working file in this case? Calling runtimeProvider.getFile("vw") did not have the desired effect.

I have all files packed in the .jar, but apparently the RuntimeProvider is not able to unpack them. Any suggestions?

reckart commented 6 years ago

@Horsmann The RuntimeProvider requires a manifest file that lists the files to deploy. Cf. the manifest file added to the hunpos binaries JAR, which is created by the build.xml in the hunpos module:

    <propertyfile
        file="target/model-staging/de/tudarmstadt/ukp/dkpro/core/hunpos/bin/linux-x86_32/manifest.properties">
      <entry  key="hunpos-tag" value="executable"/>
      <entry  key="hunpos-train" value="executable"/>
    </propertyfile>

Horsmann commented 6 years ago

Hm, my Ant script looks like this. It's a bit messy, I suppose, but the content does reach the .jar file. When I instantiate the RuntimeProvider, however, the folder it creates is empty.

    <untar
        src="target/download/vowpalwabbit-8.6.1.osx-x86_64.tar"
        dest="target/download/osx-x86_64">
      <patternset>
        <include name="*/vw" />
        <include name="*/.libs/*" />
      </patternset>
    </untar>

    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/vw" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/vw"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/liballreduce.0.dylib" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/liballreduce.0.dylib"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/liballreduce.a" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/liballreduce.a"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/libvw_c_wrapper.0.dylib" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/libvw_c_wrapper.0.dylib"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/libvw_c_wrapper.a" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/libvw_c_wrapper.a"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/libvw.0.dylib" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/libvw.0.dylib"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/libvw.a" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/libvw.a"/>
    <copy file="target/download/osx-x86_64/vowpalwabbit-8.6.1.osx-x86_64/.libs/vw" tofile="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/.libs/vw"/>

    <echo file="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/README">
        VowpalWabbit 8.6.1
    </echo>

    <propertyfile
        file="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/manifest.properties">
      <entry key="vw" value="executable"/>
      <entry key=".libs/vw" value="executable"/>
    </propertyfile>

I essentially need a hidden folder .libs with the library files in it. The top-level "vw" is actually a bash script, in case this matters.
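Separately from the empty-folder problem: if the manifest is the list of files to deploy, as described above, the libraries in .libs would presumably need entries of their own. The following is only a sketch under that assumption. The value `file` for non-executable entries is a hypothetical placeholder, not a documented RuntimeProvider value, and whether entries with a subfolder in the key are supported is not confirmed in this thread.

```xml
<propertyfile
    file="target/model-staging/org/dkpro/tc/ml/vowpalwabbit/osx-x86_64/manifest.properties">
  <!-- Entries taken from the script above -->
  <entry key="vw" value="executable"/>
  <entry key=".libs/vw" value="executable"/>
  <!-- ASSUMPTION: "file" is a placeholder value for non-executable files;
       check RuntimeProvider.install() for the values it actually accepts -->
  <entry key=".libs/libvw.0.dylib" value="file"/>
  <entry key=".libs/liballreduce.0.dylib" value="file"/>
  <entry key=".libs/libvw_c_wrapper.0.dylib" value="file"/>
</propertyfile>
```

The static `.a` archives are left out of the sketch because they should not be needed at run time.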

reckart commented 6 years ago

Set a breakpoint in de.tudarmstadt.ukp.dkpro.core.api.resources.RuntimeProvider.install() and step through it.

Horsmann commented 6 years ago

@reckart UKP's Jenkins is failing with a missing-license error: https://zoidberg.ukp.informatik.tu-darmstadt.de/jenkins/job/DKPro%20Text%20Classification%20Framework%20(GitHub)/984/console

On our Jenkins, the rat check passes without errors, i.e. all licenses are in place. I cannot see the project's folder structure on the UKP Jenkins, and I cannot reproduce the problem locally. Could you give me permission to view the workspace folders on the UKP Jenkins? I would like to look at the rat.txt to see what the problem is.

Horsmann commented 6 years ago

By the way, the Windows build does not fail on this rat check (it fails later, elsewhere, which is OK in my opinion). In any case, a look into the rat.txt of the Linux build would be good.

reckart commented 6 years ago

You can configure rat to list the names of the problematic files in the console:

http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.html#consoleOutput
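A minimal sketch of what this could look like in the pom.xml, assuming the plugin is already declared in the build and that the plugin version in use supports the `consoleOutput` parameter named in the link above:

```xml
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <!-- Print the rat report, including the names of the offending files,
         to the build console instead of only writing rat.txt -->
    <consoleOutput>true</consoleOutput>
  </configuration>
</plugin>
```

With this in place, the file names should show up directly in the Jenkins console log, so access to the workspace would not be needed to read the report.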

reckart commented 6 years ago

I have added a "cleanup" step to the build which removes all unversioned files before the build. That should take care of core dump files, which are usually the problem if rat works locally but not on Jenkins.

Horsmann commented 6 years ago

Thanks.

To which repository can I upload the vowpalwabbit .jar with the binaries? The public-snapshots repository no longer seems to exist, or it is no longer visible to me.

reckart commented 6 years ago

I believe you should be able to publish snapshots to https://oss.sonatype.org/content/repositories/snapshots/ using your Sonatype OSS account. But we (@Rentier and I) are also looking into giving you access to zoidberg.
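For illustration, a sketch of the POM section that `mvn deploy` would use to publish snapshots to that URL. The id `ossrh` is an assumption (a commonly used name); it has to match a `<server>` entry in the local settings.xml that holds the Sonatype credentials.

```xml
<distributionManagement>
  <snapshotRepository>
    <!-- ASSUMPTION: "ossrh" is an arbitrary id; it must match the
         <server><id> in settings.xml that carries the credentials -->
    <id>ossrh</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
  </snapshotRepository>
</distributionManagement>
```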

Horsmann commented 6 years ago

@reckart The DKPro TC pom.xml declares our Artifactory, but it seems to be ignored when the vowpalwabbit .jar is retrieved. Any idea why? I find no trace in the log that our Artifactory was asked for the .jar. The UKP Jenkins checks the UKP repositories and Central, where the .jar is not available.

reckart commented 6 years ago

zoidberg should take any repos declared in the POM into account. I just checked: it uses a special settings.xml file, but there is no catch-all proxy defined in it.
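For context, this is roughly what such a catch-all entry would look like if it were present. It is a generic Maven example, not the zoidberg configuration; the id and URL are placeholders.

```xml
<settings>
  <mirrors>
    <mirror>
      <!-- Placeholder id and URL, for illustration only -->
      <id>catch-all-proxy</id>
      <!-- "*" redirects requests for every repository, including those
           declared in a project's POM, to the mirror URL below -->
      <mirrorOf>*</mirrorOf>
      <url>https://repo.example.org/maven</url>
    </mirror>
  </mirrors>
</settings>
```

A mirror like this would explain a POM-declared repository never being contacted, which is why its absence is worth noting.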

Horsmann commented 6 years ago

@reckart Could you upload the .jar to the UKP repository? The installation script is provided at the usual location.