Open lidavidm opened 5 months ago
So while this repro forces symptoms close to what was reported, CPython zero-initializes Python objects, so there should be no way for the ArrowSchemaHandle to have an uninitialized struct inside it.
So there may be another place in the code where we allocate a struct and pass it to Go without initializing it, or there may be a problem with the FFI or C Data code within the driver or Arrow-Go itself.
Here's another potential issue: we use C.malloc at a few points to allocate a struct that we then fill in. But malloc doesn't initialize the memory, so it's possible that a Go-pointer-alike sneaks in there.
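One way to rule out the uninitialized-malloc case is to zero-fill the allocation, so any field that hasn't been assigned yet reads as NULL rather than leftover heap contents. Below is a minimal sketch of that idea. The struct ExampleError and both helper functions are hypothetical stand-ins, not the driver's actual FlightSQLError layout or API, and the sketch assumes a platform where all-bits-zero is a null pointer (true on mainstream targets).

```c
#include <stdlib.h>

/* Hypothetical stand-in for a driver-side error struct.
 * This is NOT the real FlightSQLError definition. */
struct ExampleError {
  char* message;
  char** keys;
  int count;
};

/* Allocate with calloc instead of malloc: the memory is zero-filled,
 * so pointer fields that are not assigned yet read as NULL instead of
 * whatever bytes happened to be on the heap. */
struct ExampleError* example_error_new(void) {
  return (struct ExampleError*)calloc(1, sizeof(struct ExampleError));
}

/* Free the struct and the message it owns; free(NULL) is a no-op,
 * so a never-assigned message is safe here. */
void example_error_free(struct ExampleError* error) {
  if (error == NULL) return;
  free(error->message);
  free(error);
}
```

An equivalent approach would be to keep malloc and follow it with memset(ptr, 0, size). Either way, no stale bits that happen to look like a Go pointer remain in the struct.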
This diff is also able to force a crash:
diff --git a/go/adbc/pkg/flightsql/driver.go b/go/adbc/pkg/flightsql/driver.go
index 9a8535fe..cb3810ab 100644
--- a/go/adbc/pkg/flightsql/driver.go
+++ b/go/adbc/pkg/flightsql/driver.go
@@ -17,8 +17,6 @@
// specific language governing permissions and limitations
// under the License.
-//go:build driverlib
-
package main
// ADBC_EXPORTING is required on Windows, or else the symbols
@@ -46,6 +44,7 @@ package main
// int FlightSQLArrayStreamGetNextTrampoline(struct ArrowArrayStream*, struct ArrowArray*);
//
// void releasePartitions(struct AdbcPartitions* partitions);
+// void* allocError();
//
import "C"
import (
@@ -102,7 +101,8 @@ func setErrWithDetails(err *C.struct_AdbcError, adbcError adbc.Error) {
return
}
- cErrPtr := C.malloc(C.sizeof_struct_FlightSQLError)
+ // cErrPtr := C.malloc(C.sizeof_struct_FlightSQLError)
+ cErrPtr := C.allocError()
cErr := (*C.struct_FlightSQLError)(cErrPtr)
cErr.message = C.CString(adbcError.Msg)
err.message = cErr.message
diff --git a/go/adbc/pkg/flightsql/utils.c b/go/adbc/pkg/flightsql/utils.c
index 95920aa4..d84e1e79 100644
--- a/go/adbc/pkg/flightsql/utils.c
+++ b/go/adbc/pkg/flightsql/utils.c
@@ -24,6 +24,7 @@
#include "utils.h"
#include <string.h>
+#include <stdio.h>
#ifdef __cplusplus
extern "C" {
@@ -440,6 +441,16 @@ int FlightSQLArrayStreamGetNextTrampoline(struct ArrowArrayStream* stream,
return FlightSQLArrayStreamGetNext(stream, out);
}
+ void* allocError(void) {
+ struct FlightSQLError* error = (struct FlightSQLError*)malloc(sizeof(struct FlightSQLError));
+ uintptr_t bad = 0xc000000000;
+ bad += rand() % 0xFFFFFF;
+ error->lengths = (void*)bad;
+ error->message = (void*)bad;
+ printf("tester2 %#010lx\n", bad);
+ return error;
+}
+
#ifdef __cplusplus
}
#endif
diff --git a/go/adbc/pkg/flightsql/utils.h b/go/adbc/pkg/flightsql/utils.h
index fbdbe89a..090b471d 100644
--- a/go/adbc/pkg/flightsql/utils.h
+++ b/go/adbc/pkg/flightsql/utils.h
@@ -179,3 +179,5 @@ int FlightSQLArrayStreamGetSchemaTrampoline(struct ArrowArrayStream* stream,
struct ArrowSchema* out);
int FlightSQLArrayStreamGetNextTrampoline(struct ArrowArrayStream* stream,
struct ArrowArray* out);
+
+void* allocError(void);
If we accept this as the potential cause, we should audit upstream as well. I see a couple of similar patterns in the C Data Interface implementation.
I think we should also audit all cgo definitions and helper C code to look for anything potentially stack-allocated or otherwise uninitialized.
What happened?
https://github.com/apache/arrow-adbc/issues/729 is still possible because there are other arguments that need to be sanitized before being passed on to Go.
The gist is the same as earlier, but this time the GC write barrier is not involved. Instead, the GC can run and eventually discover and mark an invalid Go pointer just as part of its normal operation. Hence the symptoms look slightly different.
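As a rough illustration of what "sanitizing" an argument could look like on the C side, the sketch below clears a caller-provided output struct before it is handed to Go-implemented code. The struct ExampleOut and the helper example_sanitize_out are hypothetical names for illustration, not part of ADBC or the driver manager. It also assumes the struct is a pure out-parameter that holds no live resource at the time of the call.

```c
#include <string.h>

/* Hypothetical out-parameter struct, loosely shaped like the C structs
 * that cross the cgo boundary. It is not a real ADBC or Arrow type. */
struct ExampleOut {
  void* private_data;
  void (*release)(struct ExampleOut*);
};

/* Zero a caller-provided out-struct before passing it into Go code.
 * Assumption: the struct is output-only, so overwriting it cannot leak
 * or corrupt a resource the caller still owns. */
void example_sanitize_out(struct ExampleOut* out) {
  if (out != NULL) {
    memset(out, 0, sizeof(*out));
  }
}
```

If the caller left the struct uninitialized (for example, on the stack), this removes any leftover bits before the Go runtime can see them as pointer-like values.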
How can we reproduce the bug?
Unfortunately, the reproduction is private, but you can patch the driver manager to force the bug to happen.
Environment/Setup
Linux x64; ADBC (Python) 0.10.0 and 1.0.0 were tested, and it seems all versions should exhibit this.