AKSW / LSQ

Linked SPARQL Queries (LSQ): Framework for RDFizing triple store (web) logs and performing SPARQL query extraction, analysis and benchmarking in order to produce datasets of Linked SPARQL Queries
http://lsq.aksw.org
Apache License 2.0

How to run the jar bundle #22

Open marmhm opened 3 years ago

marmhm commented 3 years ago

Hi, we are trying to run LSQ with the Java build.

Unfortunately, the documentation does not reflect the actual code. It says:

mvn -P bundle clean install

java -jar lsq-bundle/target/

But there is no lsq-bundle folder in the repository, and there is no further documentation on how to run and use the .jar file (all of the documentation is for the installed Debian package).

The Debian package documentation mentions lsq-cli, so we assume the documentation is inconsistent and that we should use the lsq-cli subfolder to run LSQ.

So we tried running the lsq-cli jar file:

java -jar .\lsq-cli\target\lsq-cli-2.0.0-SNAPSHOT-jar-with-dependencies.jar

But it does not seem to have been packaged properly:

no main manifest attribute, in .\lsq-cli\target\lsq-cli-2.0.0-SNAPSHOT-jar-with-dependencies.jar

This is a fairly common error when building Java packages: the jar manifest needs to declare the main class that is used when the package is run.

For example here: https://github.com/AKSW/LSQ/blob/develop/lsq-cli/pom.xml#L98

In the build configuration we might need to add something like:

                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>org.aksw.simba.lsq.cli.main.MainCliLsq</mainClass>
                        </manifest>
                    </archive>
                </configuration>
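
To check whether a built jar actually carries a Main-Class entry, a small sketch like the following could be used. It relies only on the standard java.util.jar API; the class name ManifestCheck and the helper mainClassOf are hypothetical names chosen for illustration and are not part of LSQ.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of LSQ: reports the Main-Class entry of a jar.
public class ManifestCheck {

    /** Returns the Main-Class attribute of the jar's manifest, if present. */
    static Optional<String> mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            // getManifest() returns null when the jar has no manifest at all
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return Optional.empty();
            }
            String mainClass = manifest.getMainAttributes()
                    .getValue(Attributes.Name.MAIN_CLASS);
            return Optional.ofNullable(mainClass);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ManifestCheck.java <path-to-jar>");
            return;
        }
        Optional<String> mainClass = mainClassOf(Path.of(args[0]));
        System.out.println(mainClass
                .map(c -> "Main-Class: " + c)
                .orElse("no main manifest attribute"));
    }
}
```

With Java 11 or later this can be run directly as a single source file, passing the path of the jar as the only argument. As a possible workaround until the manifest is fixed, a jar without a Main-Class entry can usually still be started by putting it on the classpath and naming the main class explicitly, i.e. java -cp followed by the jar path and the class name (assuming the class name above is the right entry point).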

Any idea how we can fix the Jar build and run LSQ with Java?

Note: running on Windows 10 with Java 15

Aklakan commented 3 years ago

The documentation certainly needs an overhaul; and yes, the manifest should be part of the jar bundle!

Currently LSQ uses a system call to /usr/bin/sort for sorting/merging large amounts of RDF graphs. This prevents use on Windows. However, this limitation can be lifted by using Apache Spark's sort operator (via RDDs). LSQ already has an embedded Spark module, but the code still needs to be migrated.

Aklakan commented 3 years ago

Progress update: lsq spark rdfize should now work on Windows, as it uses Spark's sort operator instead of relying on /usr/bin/sort. As most other commands depend on the rdfize code, they should work as well, but it all needs more testing. lsq rdfize only creates an RDF version of the log for subsequent processing; lsq analyze is the command that performs the static analysis / enrichment.

Note that lsq rdfize will probably become lsq rx rdfize in order to distinguish the engines (rx = RxJava; lsq spark builds on the original lsq code, which uses RxJava). lsq rx may be faster for smaller datasets, and it can read from stdin, which lsq spark can't.

Aklakan commented 3 years ago

The main class manifest entry was added and the documentation at lsq.aksw.org/ was updated.

Aklakan commented 3 years ago

Please confirm whether it works now.