RMLio / RML-Mapper

Generate High Quality Linked Data from multiple originally (semi-)structured data (legacy)
http://RML.io

Build seems to have an issue with character-encoding #40

Open amalic opened 6 years ago

amalic commented 6 years ago

I tried building with OpenJDK 1.7 on Ubuntu Linux, and one of the JUnit tests fails with the following errors:

[Fatal Error] :2:59: An invalid XML character (Unicode: 0x7) was found in the element content of the document.
[Fatal Error] :2:65: Invalid byte 2 of 2-byte UTF-8 sequence.
[Fatal Error] :2:66: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:70: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:61: Invalid byte 2 of 2-byte UTF-8 sequence.
[Fatal Error] :2:61: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:68: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:70: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:58: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:56: An invalid XML character (Unicode: 0x1f) was found in the element content of the document.
[Fatal Error] :2:61: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:58: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:90: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:93: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:62: An invalid XML character (Unicode: 0x1) was found in the element content of the document.
[Fatal Error] :2:64: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:62: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:63: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:63: Invalid byte 2 of 4-byte UTF-8 sequence.
[Fatal Error] :2:77: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:77: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:77: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:57: Invalid byte 2 of 3-byte UTF-8 sequence.
[Fatal Error] :2:60: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
[Fatal Error] :2:59: An invalid XML character (Unicode: 0x13) was found in the element content of the document.
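
The messages above fall into two groups: bytes that do not form valid UTF-8, and control characters (such as 0x7 or 0x13) that XML 1.0 does not allow in element content. A minimal diagnostic sketch follows for checking test data against both conditions. The class name XmlCharCheck and its helper methods are hypothetical and are not part of RML-Mapper; which input actually triggers the errors is not established in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of RML-Mapper.
public class XmlCharCheck {

    // Returns true if the bytes decode as strict UTF-8 (no replacement of bad input).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // The Char production of XML 1.0: tab, LF, CR, and the ranges below.
    static boolean isLegalXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index of the first char that XML 1.0 forbids, or -1 if none.
    static int firstIllegalXmlChar(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isLegalXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] broken = {(byte) 0xC3, (byte) 0x28}; // second byte is not a continuation byte
        System.out.println("valid UTF-8: " + isValidUtf8(broken));
        System.out.println("first illegal index: " + firstIllegalXmlChar("ab\u0007c"));
    }
}
```

Running such a check over the test resources could help narrow down whether the failing input is mis-encoded or contains raw control characters.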

Steps to reproduce (using Docker)

  1. Check out the code, including subprojects (git clone --recursive <repo>).
  2. Inside the RML-Mapper directory, create a file called "Dockerfile" with the following content:
    FROM maven:3-jdk-7
    WORKDIR /tmp/rml-mapper
    COPY . .
  3. Run docker build -t rml-mapper .
  4. Run docker run --rm -it rml-mapper bash
  5. Inside the Docker container, run mvn clean install
  6. The build fails.

When I run the Maven install with -DskipTests=true, the only file produced in the target subdirectory is "RML-Mapper-3.0.2-shaded.pom".

canarvaeza commented 6 years ago

Same for me

hujiajia0401 commented 6 years ago

I get the same issue when processing an XML file that contains Chinese characters. Has anyone found a solution?
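
One common cause of this kind of failure with non-ASCII XML, not confirmed for this issue, is that the file is decoded with the platform default charset somewhere before it reaches the parser. A minimal sketch of parsing with an explicit UTF-8 reader follows, using only the standard JAXP API. The class name ExplicitEncodingParse and the method parseUtf8 are hypothetical, and the sketch assumes the input really is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of RML-Mapper.
public class ExplicitEncodingParse {

    // Parses XML bytes as UTF-8 regardless of the JVM's default charset.
    static Document parseUtf8(byte[] xmlBytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), StandardCharsets.UTF_8)) {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream bypasses the parser's own encoding detection.
            return builder.parse(new InputSource(reader));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML as UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<name>\u4e2d\u6587</name>".getBytes(StandardCharsets.UTF_8);
        Document doc = parseUtf8(xml);
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

If the build itself is affected, checking the JVM's file.encoding setting inside the container may also be worthwhile, although nothing in this thread confirms that it is the cause.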

pheyvaer commented 6 years ago

We are currently working on a new version of the mapper. The CLI parameters mostly remain the same. Could you please use this one and see if the problem remains?

SemanticBeeng commented 6 years ago

@pheyvaer - I tried rml-mapper (once), and mvn install gives:

05:20:03.810 [main] DEBUG org.eclipse.rdf4j.rio.LanguageHandlerRegistry - Registered service class org.eclipse.rdf4j.rio.languages.BCP47LanguageHandler
05:20:03.821 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.binary.BinaryRDFParserFactory
05:20:03.822 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.jsonld.JSONLDParserFactory
05:20:03.822 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.n3.N3ParserFactory
05:20:03.823 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.nquads.NQuadsParserFactory
05:20:03.823 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.ntriples.NTriplesParserFactory
05:20:03.824 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.rdfjson.RDFJSONParserFactory
05:20:03.824 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.rdfxml.RDFXMLParserFactory
05:20:03.825 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.trig.TriGParserFactory
05:20:03.825 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.trix.TriXParserFactory
05:20:03.826 [main] DEBUG org.eclipse.rdf4j.rio.RDFParserRegistry - Registered service class org.eclipse.rdf4j.rio.turtle.TurtleParserFactory
05:20:04.089 [main] WARN be.ugent.rml.Utils - Not all values for a template where found. More specific, the variable WrongReference did not provide any results.
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at be.ugent.rml.functions.FunctionModel.execute(FunctionModel.java:37)
    at be.ugent.rml.functions.Function.execute(Function.java:33)
    at be.ugent.rml.termgenerator.LiteralGenerator.generate(LiteralGenerator.java:37)
    at be.ugent.rml.Executor.generatePredicateObjectGraphs(Executor.java:120)
    at be.ugent.rml.Executor.execute(Executor.java:76)
    at be.ugent.rml.Executor.execute(Executor.java:93)
    at be.ugent.rml.TestCore.doMapping(TestCore.java:59)
    at be.ugent.rml.TestFunctionCore.doPreloadMapping(TestFunctionCore.java:26)
    at be.ugent.rml.Custom_RML_FnO_Mapper_CSV_Test.evaluate_0007_CSV(Custom_RML_FnO_Mapper_CSV_Test.java:49)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.lang.NullPointerException
    at be.ugent.rml.functions.lib.GrelTestProcessor.toUppercase(GrelTestProcessor.java:8)
    ... 36 more
05:20:04.120 [main] INFO be.ugent.rml.functions.FunctionUtils - Found class on path /development/projects/02_arch/rmlmapper-java/target/classes/GrelFunctions.jar
Tests run: 19, Failures: 0, Errors: 0, Skipped: 7, Time elapsed: 0.562 sec - in be.ugent.rml.Custom_RML_FnO_Mapper_CSV_Test
Running be.ugent.rml.Mapper_XML_Test
05:20:04.359 [main] WARN be.ugent.rml.Utils - Not all values for a template where found. More specific, the variable IDs did not provide any results.
05:20:04.514 [main] WARN be.ugent.rml.Utils - Not all values for a template where found. More specific, the variable Name did not provide any results.
05:20:04.580 [main] WARN be.ugent.rml.Utils - Not all values for a template where found. More specific, the variable Sport did not provide any results.
05:20:04.593 [main] WARN be.ugent.rml.Utils - Not all values for a template where found. More specific, the variable Sport did not provide any results.
Tests run: 29, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.536 sec - in be.ugent.rml.Mapper_XML_Test
Running be.ugent.rml.Mapper_RDBs_Test
05:20:04.742 [main] INFO ch.vorburger.mariadb4j.Util - Created directory: /tmp/MariaDB4j/base
05:20:04.742 [main] INFO ch.vorburger.mariadb4j.Util - Created directory: /tmp/MariaDB4j/base/libs
05:20:04.742 [main] INFO ch.vorburger.mariadb4j.Util - Created directory: /tmp/MariaDB4j/data/38540
05:20:04.776 [main] DEBUG org.springframework.core.io.support.PathMatchingResourcePatternResolver - Resolved classpath location [ch/vorburger/mariadb4j/mariadb-10.1.13/linux/] to resources [URL [jar:file:/home/nickdsc/.m2/repository/ch/vorburger/mariaDB4j/mariaDB4j-db-linux64/10.1.13/mariaDB4j-db-linux64-10.1.13.jar!/ch/vorburger/mariadb4j/mariadb-10.1.13/linux/]]
05:20:04.777 [main] DEBUG org.springframework.core.io.support.PathMatchingResourcePatternResolver - Looking for matching resources in jar file [file:/home/nickdsc/.m2/repository/ch/vorburger/mariaDB4j/mariaDB4j-db-linux64/10.1.13/mariaDB4j-db-linux64-10.1.13.jar]
........
05:20:13.512 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] /tmp/MariaDB4j/base/bin/mysqld (mysqld 10.1.13-MariaDB) starting as process 11269 ...
05:20:13.632 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Using mutexes to ref count buffer pool pages
05:20:13.632 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: The InnoDB memory heap is disabled
05:20:13.632 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
05:20:13.632 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Memory barrier is not used
05:20:13.632 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Compressed tables use zlib 1.2.3
05:20:13.633 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Using Linux native AIO
05:20:13.633 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Using SSE crc32 instructions
05:20:13.633 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Initializing buffer pool, size = 128.0M
05:20:13.637 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Completed initialization of buffer pool
05:20:13.642 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Highest supported file format is Barracuda.
05:20:13.652 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: 128 rollback segment(s) are active.
05:20:13.652 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB: Waiting for purge to start
05:20:13.702 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140470513883008 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 1616799
05:20:13.707 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:13 140469828814592 [Note] InnoDB: Dumping buffer pool(s) not yet started
05:20:15.879 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: OK
05:20:15.880 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: Creating OpenGIS required SP-s...
05:20:15.993 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:15 140265318651776 [Note] /tmp/MariaDB4j/base/bin/mysqld (mysqld 10.1.13-MariaDB) starting as process 11298 ...
05:20:16.112 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:16 140265318651776 [Note] InnoDB: Using mutexes to ref count buffer pool pages
05:20:16.112 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:16 140265318651776 [Note] InnoDB: The InnoDB memory heap is disabled
05:20:16.112 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:16 140265318651776 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
05:20:16.112 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-09-02  5:20:16 140265318651776 [Note] InnoDB: Memory barrier is not used
05:20:16.113 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysql_install_db: 2018-
...
05:20:18.703 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] InnoDB: Highest supported file format is Barracuda.
05:20:18.716 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] InnoDB: 128 rollback segment(s) are active.
05:20:18.717 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] InnoDB: Waiting for purge to start
05:20:18.768 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 1616819
05:20:18.780 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] Plugin 'FEEDBACK' is disabled.
05:20:18.780 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139713818101504 [Note] InnoDB: Dumping buffer pool(s) not yet started
05:20:18.790 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] Server socket created on IP: '::'.
05:20:18.793 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: 2018-09-02  5:20:18 139714510616448 [Note] /tmp/MariaDB4j/base/bin/mysqld: ready for connections.
05:20:18.794 [Exec Stream Pumper] INFO ch.vorburger.exec.ManagedProcess - mysqld: Version: '10.1.13-MariaDB'  socket: '/tmp/MariaDB4j.38540.sock'  port: 38540  MariaDB Server
05:20:18.812 [main] INFO ch.vorburger.mariadb4j.DB - Database startup complete.

Please advise how this new rml-mapper project relates to the rest of the RML-Mapper ecosystem of projects. Will it replace it completely or only partially? What is the motivation for this move?

Lastly, I was looking to test this use case: https://stackoverflow.com/questions/49365220/how-to-convert-json-array-to-rdf-in-java but ran into the same issue.

Thank you.

pheyvaer commented 6 years ago

This is probably because you are running the tests locally without following the instructions for doing so. Can you check that first (see README.md) and, if you still have problems, open an issue on the correct repo?