dbpedia / databus-maven-plugin

Databus Maven Plugin: Aligning Data and Software Lifecycle with Maven
GNU Affero General Public License v3.0

docker installation #97

Closed petacube closed 4 years ago

petacube commented 5 years ago

I am trying to do the Docker installation as per the documentation on the DBpedia site; however, it fails in the middle with the following error:

docker run --name databus-client \
    -v $(pwd)/query:/opt/databus-client/query \
    -v $(pwd):/var/repo \
    -e FORMAT="ttl" \
    -e COMPRESSION="gz" \
    dbpedia/databus-client

..... some time later

Downloader:

files to download:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/.m2/repository/org/slf4j/slf4j-simple/1.7.25/slf4j-simple-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
QUERY: PREFIX dataid-mt: <http://dataid.dbpedia.org/ns/mt#>
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dataid-cv: <http://dataid.dbpedia.org/ns/cv#>

SELECT DISTINCT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/denis/ontology/dbo-snapshots> ;
        dcat:distribution ?distribution .
    ?distribution dcat:mediaType dataid-mt:ApplicationNTriples ;
        dct:hasVersion ?latestVersion ;
        dcat:downloadURL ?file
    {
        SELECT (?version AS ?latestVersion) WHERE {
            ?dataset dataid:artifact <https://databus.dbpedia.org/denis/ontology/dbo-snapshots> ;
                dct:hasVersion ?version
        } ORDER BY DESC(?version) LIMIT 1
    }
}

https://raw.githubusercontent.com/dbpedia/ontology-tracker/82dda5d3c3d353292518b6687cdb970d7ebb391b/databus/dbpedia/ontology/dbo-snapshots/dbo-snapshots.nt
couldn't query dataidfile


Converter:

input file: /opt/databus-client/tempdir_downloaded_files/raw.githubusercontent.com/dbpedia/ontology-tracker/82dda5d3c3d353292518b6687cdb970d7ebb391b/databus/dbpedia/ontology/dbo-snapshots/dbo-snapshots.nt
converted file: /var/repo/NoDataID/raw.githubusercontent.com/dbpedia/ontology-tracker/82dda5d3c3d353292518b6687cdb970d7ebb391b/databus/dbpedia/ontology/dbo-snapshots/dbo-snapshots.ttl.gz

java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
    at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.IllegalArgumentException: System memory 464519168 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
    at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:217)
    at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:199)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:330)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:175)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at org.dbpedia.databus.Converter$.convertFormat(Converter.scala:54)
    at org.dbpedia.databus.FileHandler$.convertFile(FileHandler.scala:57)
    at org.dbpedia.databus.main.Main_DownloadAndConvert$$anonfun$main$1.apply(Main_DownloadAndConvert.scala:58)
    at org.dbpedia.databus.main.Main_DownloadAndConvert$$anonfun$main$1.apply(Main_DownloadAndConvert.scala:54)
    at scala.collection.immutable.Stream.foreach(Stream.scala:594)
    at org.dbpedia.databus.main.Main_DownloadAndConvert$.main(Main_DownloadAndConvert.scala:54)
    at org.dbpedia.databus.main.Main_DownloadAndConvert.main(Main_DownloadAndConvert.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:21 min
[INFO] Finished at: 2019-10-06T15:42:29Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.3.1:run (default-cli) on project databus-client: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240 (Exit value: 240) -> [Help 1]
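
The two numbers in the "Caused by" line can be checked by hand. As far as I know, Spark's UnifiedMemoryManager reserves 300 MiB and refuses to start unless the JVM heap is at least 1.5 times that. The sketch below only redoes that arithmetic in plain shell; the variable names are illustrative, and the 300 MiB constant is an assumption taken from Spark's source as I remember it, not something stated in this thread.

```shell
#!/bin/sh
# Assumption: Spark reserves 300 MiB and requires 1.5x that as heap.
reserved=$((300 * 1024 * 1024))
min_required=$((reserved * 3 / 2))

# Value reported in the exception above ("System memory ...").
reported=464519168

shortfall=$((min_required - reported))

echo "minimum required: $min_required bytes ($((min_required / 1024 / 1024)) MiB)"
echo "reported heap:    $reported bytes ($((reported / 1024 / 1024)) MiB)"
echo "shortfall:        $shortfall bytes ($((shortfall / 1024 / 1024)) MiB)"
```

Under that assumption the threshold works out to exactly the 471859200 bytes in the message, and the heap the JVM reported falls a few MiB short of it, which is what the hint about --driver-memory or spark.driver.memory refers to.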

kurzum commented 5 years ago

The error is caused by a bug in the artifact:

https://databus.dbpedia.org/denis/ontology/dbo-snapshots/2019.09.01-103002

It is an experimental artifact to get Databus to work seamlessly with GitHub.

Let me check these options:

Eisenbahnplatte commented 5 years ago

I have tried to reproduce your issue several times now, but I never got an exception.

mkdir repo
cd repo

echo "PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dataid-cv: <http://dataid.dbpedia.org/ns/cv#>
PREFIX dataid-mt: <http://dataid.dbpedia.org/ns/mt#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat:  <http://www.w3.org/ns/dcat#>

# Get latest ontology NTriples file 
SELECT DISTINCT ?file WHERE {
    ?dataset dataid:artifact <https://databus.dbpedia.org/denis/ontology/dbo-snapshots> .
    ?dataset dcat:distribution ?distribution .
    ?distribution dcat:mediaType dataid-mt:ApplicationNTriples .
    ?distribution dct:hasVersion ?latestVersion .
    ?distribution dcat:downloadURL ?file .

    {
        SELECT (?version as ?latestVersion) WHERE {
            ?dataset dataid:artifact <https://databus.dbpedia.org/denis/ontology/dbo-snapshots> .
            ?dataset dct:hasVersion ?version .
        } ORDER BY DESC (?version) LIMIT 1
    }

}" > query

docker run --name databus-client \
    -v $(pwd)/query:/opt/databus-client/query \
    -v $(pwd):/var/repo \
    -e FORMAT="ttl" \
    -e COMPRESSION="bz2" \
    dbpedia/databus-client

docker rm databus-client

petacube commented 5 years ago

How much memory do you have on the test machine? I have 16 GB.
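
Note that the 16 GB is the host's RAM, while the figure in the exception (464519168 bytes) is, as far as I can tell, the heap the JVM saw inside the container. A minimal sketch for putting the two side by side, assuming the docker CLI is on the PATH; bytes_to_mib is a hypothetical helper, and on Docker Desktop the daemon's memory may be capped well below the host's:

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Memory visible to the Docker daemon, if docker is installed and reachable.
if command -v docker >/dev/null 2>&1; then
    total=$(docker info --format '{{.MemTotal}}' 2>/dev/null) || total=""
    # Ignore anything that is not a plain number.
    case "$total" in
        ''|*[!0-9]*) total="" ;;
    esac
    if [ -n "$total" ]; then
        echo "Docker daemon memory: $(bytes_to_mib "$total") MiB"
    fi
fi

# Heap figure taken from the exception in the original report.
echo "Heap reported in the failed run: $(bytes_to_mib 464519168) MiB"
```

If the daemon's figure is far below the host's RAM, the default JVM heap inside the container will be correspondingly small, regardless of how small the input file is.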


kurzum commented 5 years ago

It also worked here:

 find . -name "*.bz2"
./NoDataID/raw.githubusercontent.com/dbpedia/ontology-tracker/82dda5d3c3d353292518b6687cdb970d7ebb391b/databus/dbpedia/ontology/dbo-snapshots/dbo-snapshots.ttl.bz2

I don't think memory is the problem. It is a very small file.

@Eisenbahnplatte my Docker pulled an update from dbpedia/databus-client. Maybe that is the issue.

kurzum commented 5 years ago

@petacube we cannot reproduce your error. Did you get it working? Can we close this?