bytedeco / javacpp

The missing bridge between Java and native C++
Other
4.48k stars 581 forks source link

NullPointerException in Parser and poor Unicode support #70

Closed gwenn closed 3 years ago

gwenn commented 8 years ago

With JavaCPP 1.1:

import org.bytedeco.javacpp.*;
import org.bytedeco.javacpp.annotation.*;

@Properties(target="org.sqlite.SQLite", value={
@Platform(include="sqlite3.h")
})
public class SQLite {
}
javac -cp javacpp.jar SQLite.java
java -jar javacpp.jar SQLite
Targeting org/sqlite/SQLite.java
Parsing sqlite3.h
Exception in thread "main" java.lang.NullPointerException
    at org.bytedeco.javacpp.tools.Parser.declarator(Parser.java:913)
    at org.bytedeco.javacpp.tools.Parser.function(Parser.java:1245)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2347)
    at org.bytedeco.javacpp.tools.Parser.group(Parser.java:1985)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2346)
    at org.bytedeco.javacpp.tools.Parser.extern(Parser.java:2272)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2345)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2421)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2517)
    at org.bytedeco.javacpp.tools.Builder.parse(Builder.java:67)
    at org.bytedeco.javacpp.tools.Builder.build(Builder.java:624)
    at org.bytedeco.javacpp.tools.Builder.main(Builder.java:773)
saudet commented 8 years ago

Try with the latest on the master branch. There's been a lot of fixes since 1.1.

gwenn commented 8 years ago

Same error with JavaCPP version 1.2-SNAPSHOT

Targeting org/sqlite/SQLite.java
Parsing sqlite3.h
Exception in thread "main" java.lang.NullPointerException
    at org.bytedeco.javacpp.tools.Parser.declarator(Parser.java:1005)
    at org.bytedeco.javacpp.tools.Parser.function(Parser.java:1441)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2598)
    at org.bytedeco.javacpp.tools.Parser.group(Parser.java:2198)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2597)
    at org.bytedeco.javacpp.tools.Parser.extern(Parser.java:2523)
    at org.bytedeco.javacpp.tools.Parser.declarations(Parser.java:2596)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2672)
    at org.bytedeco.javacpp.tools.Parser.parse(Parser.java:2769)
    at org.bytedeco.javacpp.tools.Builder.parse(Builder.java:68)
    at org.bytedeco.javacpp.tools.Builder.build(Builder.java:620)
    at org.bytedeco.javacpp.tools.Builder.main(Builder.java:769)
gwenn commented 8 years ago

A quick and dirty fix:

diff --git a/src/main/java/org/bytedeco/javacpp/tools/Parser.java b/src/main/java/org/bytedeco/javacpp/tools/Parser.java
index 9b3c8bc..2cfe4d1 100644
--- a/src/main/java/org/bytedeco/javacpp/tools/Parser.java
+++ b/src/main/java/org/bytedeco/javacpp/tools/Parser.java
@@ -1002,7 +1002,10 @@ public class Parser {
                 }
                 info = infoMap.getFirst(cppType += ")");

-                String functionType = Character.toUpperCase(originalName.charAt(0)) + originalName.substring(1);
+                String functionType = null;
+                if (originalName != null) {
+                    functionType = Character.toUpperCase(originalName.charAt(0)) + originalName.substring(1);
+                }
                 if (info != null && info.pointerTypes.length > 0) {
                     functionType = info.pointerTypes[0];
                 } else if (typedef) {

It seems that the error is related to the following line:

  void (*(*xDlSym)(sqlite3_vfs*,void*, const char *zSymbol))(void);
saudet commented 8 years ago

Thanks!! The parser is (not so) quick and (but quite) dirty anyway. (BTW, the first thing I'd like to try at this point for the parser is to see if we couldn't use Clang for that: #51)

saudet commented 8 years ago

In any case, if you get sqlite working, let me know and we'll start a "presets" out of that! Thanks

gwenn commented 8 years ago

I've created a branch here. But JavaCPP Builder fails: https://travis-ci.org/gwenn/sqlite-jna/builds/109200887 I'll try to fix it.

saudet commented 8 years ago

Thanks! I'll try to get a look at that sooner rather than later...

gwenn commented 8 years ago

Ok, https://travis-ci.org/gwenn/sqlite-jna/builds/109444251 JavaCPP Builder succeed. And only some tests related to callback are failing.

May I ask you two questions: a) Is there any way to tell JavaCPP that some SQLite functions expect UTF-8 encoded string (a not the default/current encoding) ? b) Is there any way to create a FunctionPointer with an address set to -1 (which means TRANSIENT for SQLITE) ? I tried:

    public static final Destructor TRANSIENT = new Destructor() {
        {
            this.address = -1; // SQLITE_TRANSIENT
        }

        @Override
        public void call(Pointer p) {
            throw new UnsupportedOperationException();
        }
    };

with no success. Thanks.

gwenn commented 8 years ago

Ok, Callbacks fixed: https://travis-ci.org/gwenn/sqlite-jna/builds/109658416

Regards.

saudet commented 8 years ago

Great, thanks! Are you by any chance comparing the overhead of each tool (JNA, JNR, etc)? It would be nice to have some benchmarks as reference.

For the function pointer, I was starting to consider adding support for native void allocate(int i) to FunctionPointer, but you're right, we can simply overload the method and use a cast for that:

void sqlite3_result_blob(sqlite3_context c, Pointer p, int i, FunctionPointer destructor);
void sqlite3_result_blob(sqlite3_context c, Pointer p, int i, @Cast("sqlite3_destructor_type") long destructor);

It's not type safe, but it isn't in C anyway. Does it still make sense to try to cram that into FunctionPointer?

As for the character encoding, String gets passed as "modified UTF-8", because that's what we get most efficiently with JNI. To get another encoding, we can use new BytePointer(String s, String charsetName) as a @Cast("const char*") BytePointer argument. We could probably add settings and stuff, but they would need to be dynamic in some way... If you'd like to try something, any experimentation would be welcome.

gwenn commented 8 years ago

Sorry, I have no benchmark between the different binding tools. Right now, I don't have access to www.ch-werner.de/javasqlite/ site but I think there is a benchmark included.

And SQLite don't like the "modified UTF-8" produced by JNI. So only the BytePointer can be used.

saudet commented 8 years ago

From what I see on your branches, it doesn't look like the other tools found a better way to pass strings in a different encoding, right?

BTW, if String.getBytes(Charset) is more efficient, we could add that to BytePointer for ease of use.

gwenn commented 8 years ago

It seems that both JNA and JNR use the correct UTF-8 encoding. https://github.com/java-native-access/jna/blob/51cab7599d40076af275fcfaffe838dfc35e51f2/src/com/sun/jna/Native.java#L98

saudet commented 8 years ago

Right, but we still need to allocate bytes on the heap. It doesn't do it in native code somehow.

Or are you saying there would be an advantage in setting the default encoding used by BytePointer? We could add a property for that, and do the same as JNA.

gwenn commented 8 years ago

Maybe I am biased but I want my program to preserve data integrity (data not corrupted by a bad encoding) even if there is an additional cost. I guess that many JNI bindings using GetStringUTFChars are buggy (including mine).

Something like that

static native void foo(@Encoding("UTF-8")String bar);

would be perfect.

gwenn commented 8 years ago

Some benchmarks using the BenchmarkDriver available here:

=====================sqlite-jna==========================
Driver: org.sqlite.driver.JDBC
URL:jdbc:sqlite:/Users/gwen/Java/sqlite-jna/bench.sqlite

Scale factor value: 1
Number of clients: 10
Number of transactions per client: 10

Initializing dataset...DBMS: SQLite 3

* Starting Benchmark Run *

* Benchmark Report *
* Featuring <direct queries> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.485 seconds.
0 / 100 failed to complete.
Transaction rate: 206.18556701030928 txn/sec.

* Benchmark Report *
* Featuring <direct queries> <transactions> 
--------------------
Time to execute 100 transactions: 0.295 seconds.
0 / 100 failed to complete.
Transaction rate: 338.98305084745766 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.402 seconds.
0 / 100 failed to complete.
Transaction rate: 248.75621890547262 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <transactions> 
--------------------
Time to execute 100 transactions: 0.196 seconds.
0 / 100 failed to complete.
Transaction rate: 510.204081632653 txn/sec.
=====================sqlite-javacpp===========================
Driver: org.sqlite.driver.JDBC
URL:jdbc:sqlite:/Users/gwen/Java/sqlite-javacpp/bench.sqlite

Scale factor value: 1
Number of clients: 10
Number of transactions per client: 10

Initializing dataset...DBMS: SQLite 3

* Starting Benchmark Run *

* Benchmark Report *
* Featuring <direct queries> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.423 seconds.
1 / 100 failed to complete.
Transaction rate: 234.04255319148936 txn/sec.

* Benchmark Report *
* Featuring <direct queries> <transactions> 
--------------------
Time to execute 100 transactions: 0.274 seconds.
0 / 100 failed to complete.
Transaction rate: 364.963503649635 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.317 seconds.
0 / 100 failed to complete.
Transaction rate: 315.45741324921136 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <transactions> 
--------------------
Time to execute 100 transactions: 0.159 seconds.
0 / 100 failed to complete.
Transaction rate: 628.9308176100628 txn/sec.
==================sqlite-jni===============================
Driver: org.sqlite.driver.JDBC
URL:jdbc:sqlite:/Users/gwen/Java/sqlite-jni/bench.sqlite

Scale factor value: 1
Number of clients: 10
Number of transactions per client: 10

Initializing dataset...DBMS: SQLite 3

* Starting Benchmark Run *

* Benchmark Report *
* Featuring <direct queries> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.399 seconds.
0 / 100 failed to complete.
Transaction rate: 250.62656641604008 txn/sec.

* Benchmark Report *
* Featuring <direct queries> <transactions> 
--------------------
Time to execute 100 transactions: 0.28 seconds.
0 / 100 failed to complete.
Transaction rate: 357.1428571428571 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <auto-commit> 
--------------------
Time to execute 100 transactions: 0.325 seconds.
0 / 100 failed to complete.
Transaction rate: 307.6923076923077 txn/sec.

* Benchmark Report *
* Featuring <prepared statements> <transactions> 
--------------------
Time to execute 100 transactions: 0.175 seconds.
0 / 100 failed to complete.
Transaction rate: 571.4285714285714 txn/sec.
Java version: 1.8.0_60, vendor: Oracle Corporation
OS name: "mac os x", version: "10.11.3", arch: "x86_64", family: "mac"
saudet commented 8 years ago

So, the wrapper based on JavaCPP is on average faster than the custom JNI? Cool :) Thanks for taking the time!

Of course, I understand the need to use standard UTF-8, but it's not just about UTF-8. Specifying the encoding info in an annotation like that prevents the end user from using any other encodings. In the general case, not in the case of SQLite, we often need to be able to set it at runtime, don't we?

gwenn commented 8 years ago

There are other libraries supporting only utf-8:

xmlChar, the libxml2 data type is a byte, those bytes must be assembled as UTF-8 valid strings.

Jansson uses UTF-8 as the character encoding. All JSON strings must be valid UTF-8

saudet commented 8 years ago

Ah, why did the JDK go with "modified UTF-8"...

saudet commented 8 years ago

BTW, if you make a blog post or something about your experience using JNA, JNR, JNI, JavaCPP, BridJ, etc let me know! Thanks

saudet commented 8 years ago

About UTF-8 encoding, I've switched to String.getBytes() and new String(byte[]) by default in the latest commit, to have the default charset used. This is usually UTF-8 on Linux, but we can launch the JVM with the -Dfile.encoding=UTF-8 option to make sure that is what gets selected.

The calls are a bit slow, about 300 ns, but GetStringUTFChars() isn't exactly fast either (about 100 ns), so I think it's a good trade-off. Things are more consistent that way.

Let me know if you encounter any problems with that! Thanks for all the tests you've done :)

gwenn commented 8 years ago

Sorry, but for me, it is useless: I don't want to use/change a global state/variable to say that sqlite3_bind_text expects UTF-8.

saudet commented 8 years ago

Well, ok, say we had an annotation and we could write something like the following:

static native void foo(@Encoding("UTF-8") String bar);

Other than saving a couple of keystrokes (that we could save using the parser anyway) what is the rationale for not doing it the way we already can as below?

static native void foo(BytePointer bar);
static void foo(String bar) { foo(new BytePointer(bar, "UTF-8")); }

Where we could wrap new BytePointer(bar, "UTF-8") in a helper method BTW. Also, sometimes, we can't get String objects from the native API, only BytePointer, so usually we do need overloads with BytePointer anyway.

In any case, if it's important enough, please do send a pull request, but unless I understand the benefit it's not something I'm going to take time to implement and maintain myself.

saudet commented 8 years ago

I've just had an idea though. We could hard code the encoding used for all String passed to and received from a given library via a macro like this @Platform(define = "STRING_BYTES_ENCODING UTF-8", ...). Now that's something that would make sense to me... What do you think?

gwenn commented 8 years ago

Good idea. And I only use the UTF-8 methods in my binding. But SQLite supports two encodings by default: UTF-8 (sqlite3_bind_text) and UTF-16 (sqlite3_bind_text16)... So your solution works in my case but not in all cases.

saudet commented 8 years ago

Right, but you have to understand that in general there isn't any good options with standard Java types like primitive arrays, String or even direct NIO buffers. For example, we can't resize a ByteBuffer or modify the content of a String, so when we need to support a native function with an output argument like that, it's just easier (and more efficient than whatever hack we can come up with) to just use BytePointer (or CharPointer for UTF-16 or IntPointer for UTF-32). There are other things Java can't do, like arrays with more than Integer.MAX_VALUE elements. We could go on and on, but the ultimate solution to everything remains the same: Use an appropriate subclass of Pointer. :)

saudet commented 3 years ago

There is a good reason to provide UTF-16 support through JNI functions though: performance, see issue https://github.com/bytedeco/javacpp/pull/442#issuecomment-748492293.

Thanks to @equeim, we should have this implemented soon! (Including "modified UTF-8" while we're at it.)

saudet commented 3 years ago

Ok, it's done! Thanks to @equeim and his contribution in pull #460, we can now map String arguments to, for example, UTF-8 by annotating a class with something like @Platform(define = "STRING_BYTES_CHARSET \"UTF-8\", ...), while we can still use @AsUtf16 on individual String parameters to get fast access to the underlying UTF-16 code units.

saudet commented 3 years ago

BTW, we may want to update the guide here with a new section for that: https://github.com/bytedeco/javacpp/wiki/Mapping-Recipes

gwenn commented 3 years ago

Thanks. I will try to use sqlite3.*16 functions with @AsUtf16 annotation where possible. In these cases, I guess I can get rid of @Cast("const char*") BytePointer and use a String directly.

saudet commented 3 years ago

These changes have been released with JavaCPP 1.5.5. Enjoy!

Thanks. I will try to use sqlite3.*16 functions with @AsUtf16 annotation where possible. In these cases, I guess I can get rid of @Cast("const char*") BytePointer and use a String directly.

You can also provide overloads with CharPointer to allow passing UTF-16 data to and from off-heap memory as well.