Closed purohitsumit closed 4 years ago
Hi @purohitsumit,
many thanks for using SANSA.
Could you print just a subset of the triples without querying them, e.g.
val filepath = "./data/xxxxx.ttl"
val triples = sparkSession.rdf(lang)(filepath)
triples.take(5).foreach(println(_))
and let me know whether the parsing works?
Also, are you setting the language to Turtle syntax (`val lang = Lang.TURTLE`), since your data uses the `.ttl` extension?
If you are using a public dataset on which we can also run that SPARQL query, sharing it would be great for debugging and troubleshooting.
Best regards,
Hi @GezimSejdiu
I do specify the language using `val lang = Lang.TURTLE`.
After some debugging, I observed that the parser has an issue with e-notation data, e.g.:
34e-02
I took a sample Turtle file from a SELECT query and added a dummy triple:
` @prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:a foaf:name "Alice" ; foaf:knows _:b ; foaf:age 34e-02 ; foaf:knows _:c .
_:b foaf:name "Bob" .
_:c foaf:name "Clare" . _:c foaf:nick "CT" .
`
I get the exception as shown above.
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at net.sansa_stack.query.spark.sparqlify.QueryExecutionSpark$.createQueryExecution(QueryExecutionSpark.scala:26)
at net.sansa_stack.query.spark.query.package$SparqlifyAsDefault.sparql(package.scala:40)
I think it is related to #34.
Thanks
Yes, this is the same as #34: the generated SQL contains 'double precision' instead of just 'double'. Fixed.
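To illustrate the mismatch: judging by the error reported in this thread, the Spark SQL parser in use does not accept the type name `double precision` in a `CAST`, while `double` is a valid Spark SQL type. The snippet below is only a sketch of a string-level stopgap, not the fix that was actually applied in SANSA. The names `SparkSqlTypeFix` and `normalizeTypes` are hypothetical, and the code uses no SANSA or Spark API.

```scala
// Hypothetical helper, not part of SANSA or Spark: rewrites the SQL type name
// `double precision` to `double` in a generated SQL string.
object SparkSqlTypeFix {
  // Case-insensitive match, tolerating any whitespace between the two words.
  private val DoublePrecision = "(?i)\\bdouble\\s+precision\\b".r

  // Note: a plain textual rewrite also touches string literals that happen
  // to contain the phrase, so this is only suitable as a stopgap.
  def normalizeTypes(sql: String): String =
    DoublePrecision.replaceAllIn(sql, "double")

  def main(args: Array[String]): Unit = {
    val generated = "SELECT CAST(NULL AS double precision) C_9"
    println(normalizeTypes(generated))
  }
}
```

A proper fix belongs in the SQL generator itself, so that the type name is emitted correctly rather than patched afterwards.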
Here is my example code:
` val sc = sparkSession.sparkContext
val sqlc = sparkSession.sqlContext
// Query
import net.sansa_stack.query.spark.query._
val sparqlQuery = "SELECT * WHERE {?s ?p ?o} LIMIT 10"
val result = triples.sparql(sparqlQuery)
result.rdd.foreach(println)
`
I get the following error:
`CAST TO string
CAST TO string
CAST TO double precision
CAST TO string
CAST TO string
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'FROM' expecting(line 2, pos 0)
== SQL ==
SELECT a_45.C_3 C_3, a_45.C_4 C_4, a_45.C_5 C_5, a_45.C_11 C_11, a_45.C_6 C_6, a_45.C_10 C_10, a_45.C_7 C_7, a_45.C_8 C_8, a_45.C_9 C_9, a_45.C_14 C_14, a_45.C_13 C_13, a_45.C_12 C_12
FROM
^^^
( SELECT a_1.s C_14
, CAST(NULL AS string) C_13
, CAST(NULL AS bigint) C_12
, CAST(NULL AS string) C_11
, a_1.o C_10
, CAST('https://tac.nist.gov/tracks/SM-KBP/2019/ontologies/InterchangeOntology#justifiedBy' AS string) C_3
, CAST(NULL AS string) C_5
, CAST(NULL AS string) C_4
, CAST(NULL AS string) C_7
, CAST(NULL AS string) C_6
, CAST(NULL AS double precision) C_9
, CAST(NULL AS string) C_8
, CAST('urn:x-arq:DefaultGraph' AS string) C_15
`
Am I missing something here? I am using the 0.6.1-SNAPSHOT version of "sansa-rdf" and "sansa-query".
Thanks