EIS-Bonn / Squerall

An implementation of the so-called Semantic Data Lake, using Apache Spark and Presto.
https://eis-bonn.github.io/Squerall/
Apache License 2.0

Null Point Exception running Squerall over BSBM #4

Closed dachafra closed 4 years ago

dachafra commented 4 years ago

We created a simple example that queries a set of CSV files extracted from BSBM, and we get a NullPointerException with both Spark and Presto. The resources are attached.

mnmami commented 4 years ago

Hi @dachafra ,

Sorry for the delay; I'm on vacation with limited access to my laptop. Please note that Squerall does not currently support the wildcard in the SELECT clause. Could you select some specific variables instead and see whether that bypasses the error? Let me know. If it does, please open an issue requesting support for the wildcard.
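
The workaround suggested here, listing the variables explicitly instead of using `SELECT *`, can be sketched as a small text rewrite. This is only a minimal illustration in Python, not part of Squerall: the function name `expand_wildcard` is hypothetical, the sample query is illustrative (it is not the BSBM query attached to this issue), and the regex-based approach assumes a simple query with no subqueries, no `DISTINCT *`, and no `?` characters inside IRIs or literals.

```python
import re


def expand_wildcard(query: str) -> str:
    """Replace `SELECT *` with the variables that appear after the SELECT clause.

    Hypothetical helper: a naive text rewrite, not a SPARQL parser.
    """
    match = re.search(r"SELECT\s+\*", query, flags=re.IGNORECASE)
    if match is None:
        return query  # no wildcard, nothing to rewrite

    # Collect variables from the rest of the query, keeping first-seen order.
    body = query[match.end():]
    variables = list(dict.fromkeys(re.findall(r"\?\w+", body)))
    if not variables:
        return query

    return query[:match.start()] + "SELECT " + " ".join(variables) + body


# Illustrative query (an assumption, not taken from the attached resources).
query = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT *
WHERE {
    ?x foaf:name ?name .
    ?x foaf:mbox_sha1sum ?mbox .
}"""

rewritten = expand_wildcard(query)
print(rewritten)
```

For anything beyond simple basic graph patterns, a real SPARQL parser would be a safer way to collect the variables than a regular expression.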

dachafra commented 4 years ago

Hi @mnmami, here are the logs (same error) with the variables projected explicitly in the SELECT clause: log-presto-v2.txt log-scala-v2.txt

mnmami commented 4 years ago

Hi @dachafra,

Sorry again for the delay. I have yet to try your files; I will try to do that this week. In the meantime, could you please clone the develop branch and try again?

git clone --single-branch --branch develop https://github.com/EIS-Bonn/Squerall.git

I have improved the logging messages there and now catch the NullPointerException with, hopefully, useful error messages. I hope this helps you resolve the issue yourself; if not, please let me know.

dachafra commented 4 years ago

Hi, we solved the problem, but now it seems that Squerall does not support POM with join-conditions. Here are the logs. Would you prefer to close this issue and open a new one? log_spark.txt log_presto.txt

jatoledo commented 4 years ago

Hi, I cloned the develop branch with

git clone --single-branch --branch develop https://github.com/EIS-Bonn/Squerall.git

and I get a NullPointerException with a mapping that has no join-conditions. Query:

PREFIX foaf:   <http://xmlns.com/foaf/0.1/>

SELECT ?x ?name ?mbox
WHERE  {
    ?x foaf:name ?name .
    ?x foaf:mbox_sha1sum ?mbox.
 }

Mapping:

@prefix exp: <http://example.com/ns/>
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
@prefix dcterms: <http://purl.org/dc/terms/>
@prefix schema: <http://schema.org/>
@prefix gr: <http://purl.org/goodrelations/v1#>
@prefix npg: <http://ns.nature.com/terms/>
@prefix foaf: <http://xmlns.com/foaf/spec/> # correct http://xmlns.com/foaf/0.1/
@prefix edm: <http://www.europeana.eu/schemas/edm/>
@prefix rr: <http://www.w3.org/ns/r2rml#>
@prefix rml: <http://semweb.mmlab.be/ns/rml#>
@prefix nosql: <http://purl.org/db/nosql#>
@prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
@prefix dc: <http://purl.org/dc/elements/1.1/>
@prefix rev: <http://purl.org/stuff/rev#>

<#PersonMapping>
        rml:logicalSource [
                rml:source "/home/jtoledo/Desktop/BSBM/bsbmtools-0.2/dataset/45k/person.csv";
                nosql:store nosql:csv
        ];
        rr:subjectMap [
                rr:template "http://example.com/{nr}";
                rr:class foaf:Person
        ];

        rr:predicateObjectMap [
                rr:predicate edm:country;
                rr:objectMap [rml:reference "country"]
        ];

        rr:predicateObjectMap [
                rr:predicate dc:publisher;
                rr:objectMap [rml:reference "publisher"]
        ];

        rr:predicateObjectMap [
                rr:predicate foaf:mbox_sha1sum;
                rr:objectMap [rml:reference "mbox_sha1sum"]
        ];

        rr:predicateObjectMap [
                rr:predicate exp:publishDate;
                rr:objectMap [rml:reference "publishDate"]
        ];

        rr:predicateObjectMap [
                rr:predicate foaf:name;
                rr:objectMap [rml:reference "name"]
        ];
.

Data (person.csv):

"nr","name","mbox_sha1sum","country","publisher","publishDate"
"1","Ruggiero-Delane","fb3efd92e3c7a8d775a895ba476e11a3e8f3fac","US","1","2008-09-05"
"2","Eyana-Aurelianus","df1cf8e68d49e5b65f1507dbecec6b61e9dc98","JP","1","2008-08-07"
"3","Danijela-Adalbrand","9b9d4b8dcf7ada3c181b4bed1fa3c53d29caf65","US","1","2008-07-21"
"4","Allegra-Walburga","619b2f69a01a7d86c0eca3f5e910c5b559ff3a","RU","1","2008-06-23"
"5","Przemek-Berte","c3b1c82511908f706153319688a7a5599b8ad8c0","ES","1","2008-08-19"
"6","Caryn","d6deee088e99af0f7c65fb7cca9bdfbbe3d7343","CN","1","2008-06-29"
"7","Athalia-Diellza","fac79f4faa4c9a6b957d5f1380835a4dfb50","JP","1","2008-06-30"
"8","Linda-Nada","99f90881e5d24a14f913d8a961fd81c49686fa","JP","1","2008-08-12"
"9","Takiji-Yaphet","46953b16dbd382d824721e3078f8959596b17ab5","JP","1","2008-07-13"

NullPointerException:
Exception.txt

2020-06-22 19:35:01 INFO  Squerall:144 - - filters: {} for star ?x
2020-06-22 19:35:01 INFO  Squerall:187 - Number of filters of this star is: 0
2020-06-22 19:35:01 INFO  Squerall:206 - single...with ParSet schema: null
2020-06-22 19:35:01 INFO  Squerall:210 - QUERY EXECUTION starting...*/
2020-06-22 19:35:01 INFO  Squerall:211 - DataFrames: Map(?x -> null)
2020-06-22 19:35:01 INFO  Squerall:242 -  Single star query
2020-06-22 19:35:01 INFO  Squerall:248 - --> Needed predicates select: Set((?x,<http://xmlns.com/foaf/0.1/name>), (?x,<http://xmlns.com/foaf/0.1/mbox_sha1sum>))
2020-06-22 19:35:01 INFO  Squerall:277 - --> SELECTED column names: List(x_name_foaf, x_mbox_sha1sum_foaf, x)
2020-06-22 19:35:01 INFO  Squerall:302 - |__ Has distinct? false
Exception in thread "main" java.lang.NullPointerException
        at org.squerall.SparkExecutor.project(SparkExecutor.scala:493)
        at org.squerall.SparkExecutor.project(SparkExecutor.scala:17)
        at org.squerall.Run.application(Run.scala:303)
        at org.squerall.Main$.delayedEndpoint$org$squerall$Main$1(Main.scala:22)
        at org.squerall.Main$delayedInit$body.apply(Main.scala:9)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
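
One detail visible in the report above is that the query declares `foaf:` as `http://xmlns.com/foaf/0.1/` while the mapping declares it as `http://xmlns.com/foaf/spec/`, and the log shows a null ParSet schema for star `?x`. Whether this mismatch is the cause of the exception is not confirmed in this thread. As a minimal sketch, a check like the following (Python, with hypothetical helper names, using a simple regex rather than a Turtle or SPARQL parser) could flag prefixes that resolve to different namespaces in the query and the mapping:

```python
import re

# Matches both SPARQL `PREFIX foaf: <...>` and Turtle `@prefix foaf: <...>`.
PREFIX_RE = re.compile(r"@?prefix\s+([\w-]*):\s*<([^>]*)>", re.IGNORECASE)


def declared_prefixes(text: str) -> dict:
    """Map each declared prefix label to its namespace IRI (hypothetical helper)."""
    return {label: iri for label, iri in PREFIX_RE.findall(text)}


def prefix_mismatches(query: str, mapping: str) -> dict:
    """Return prefixes declared in both texts but bound to different IRIs."""
    q = declared_prefixes(query)
    m = declared_prefixes(mapping)
    return {
        label: (q[label], m[label])
        for label in q.keys() & m.keys()
        if q[label] != m[label]
    }


# Prefix declarations copied from the query and mapping in this comment.
query = "PREFIX foaf:   <http://xmlns.com/foaf/0.1/>"
mapping = """@prefix foaf: <http://xmlns.com/foaf/spec/>
@prefix rr: <http://www.w3.org/ns/r2rml#>"""

mismatches = prefix_mismatches(query, mapping)
print(mismatches)
```

An empty result would mean that every prefix shared by the two files points to the same namespace, which rules out this particular kind of mismatch but says nothing about other causes.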

mnmami commented 4 years ago

> Hi, we solved the problem but now it seems that Squerall does not support POM with join-conditions. Here are the logs, do you prefer to close this issue and open new one? log_spark.txt log_presto.txt

Yes, please open a separate issue for it for easier traceability. While doing so, please elaborate more on "POM with join-conditions". Thanks.

mnmami commented 4 years ago

@dachafra, it would be good if you could describe what solved your problem, so that I can close this issue with something useful for users who hit a similar problem.

mnmami commented 4 years ago

@jatoledo, please also open a separate issue. I'll try to help there. Thank you.

dachafra commented 4 years ago

The problem with using the wildcard in the SELECT clause is still open. We worked around it by projecting all the variables explicitly.

mnmami commented 4 years ago

I see. Could you open an issue requesting wildcard support?