Closed filipnavara closed 1 year ago
Er, my bad.
Looks like I can just do something like:
[UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) })]
I expect that to be the solution but I didn't get to try it yet.
FWIW, version 2.1.1-pre20220822172036 contains this fix. If there are no problems with it, I will publish it as a non-pre release next week. Reopening this issue until I ship a non-pre-release containing its fix.
AFAICT, the fix made in #507 was intended to address a problem that is Windows-specific, but it has caused iOS platforms to fail completely during AOT, probably due to a bug in the Mono AOT compiler. That bug has been logged at dotnet/runtime#75700.
What probably needs to happen here is to make the necessary fix to calling convention be Windows-specific.
In the meantime, the fix is a case where the cure is worse than the disease. I plan to revert, so I am reopening this bug.
That's quite unfortunate since that will make it unusable for us on Windows again :-/ I will try to see if I can expedite the Mono fix.
@filipnavara Aren't you on iOS as well? I mean, the change fixes Windows, but it completely breaks iOS. I sort of expected you to be affected on both sides. If the change somehow did not break iOS builds for you, then I'm misunderstanding something.
I'll try today to rework your fix to take place on Windows only. The T4 stuff already generates several variants of things, so it's not too hard to isolate what you need.
It's complicated...
We are on legacy Xamarin.iOS which uses different flavor of SQLitePCL.raw that doesn't suffer from the issue.
For a moment we were running some forked builds but I really want to avoid that if possible. First, we could not switch to official builds because we needed the macOS fix for ARM64 varargs calling convention. Once that shipped we could not upgrade because Windows was blocked by this issue (builds but crashes at runtime). Now we are basically on 2.1.1 with our fork abandoned.
We do also have internal builds on net6.0-ios but they are not up to date with the X.iOS branches due to miscellaneous issues so we would not hit the SQLitePCL.raw breakage yet.
Ah. That makes sense. Yes, quite "complicated".
I'm working now on reworking your fix to avoid the net6.0-ios problem.
Thanks a lot. I really appreciate it.
@filipnavara Are you using the winsqlite3 provider?
(I'm pretty sure you're not.)
2.1.2-pre20220916165053 is available on nuget and contains the revised version of the fix for this issue.
@filipnavara One other question about this issue:
There is a test case for sqlite3_config_log
:
but for some reason, it didn't fail back when I made the switch to use UnmanagedCallersOnly
. It should have caught the failure you have been seeing, but it did not.
I have verified that this test is being included in test runs, and that it passes on Windows.
Any idea how to modify that test (or add another one) to make a verification for this issue?
Are you using the winsqlite3 provider?
We use the e_sqlite3 one.
Hmm, I'll have to check about the test. I remember it called the callback and trashed the stack after. It would be interesting to see what happens if sqlite3_log
is called twice.
FWIW, I tried modifying the test to call sqlite3_log
multiples times, but I still couldn't get it to fail (with the CallConv stuff disabled).
I committed the changes to that test at b166926, but I messed up the commit message and referenced the wrong bug number instead of this one.
One more thought, do the tests run only as 64-bit target or does it test win-x86 too? We observed it on 32-bit process which actually has different calling conventions.
Ah, 32-bit. That's probably the difference. The test suites have not been running with win-x86.
And in fact, they don't pass. In fact, there are multiple problems, and even though your calling convention fix is apparently sufficient for your situation, it does not seem to be enough to get all my tests to pass.
So, I've apparently got some work to do for 32-bit.
When I release 2.1.2, I'll probably close this issue and open new one(s) for 32-bit problems.
In retrospective that makes sense 😅 x86 is the weird one with three major calling conventions, x64 and arm64 both use only one (which differs per platform but not per method).
@filipnavara So... for your Windows builds, you're using a target framework of net6.0-windows
(or similar), rather than just net6.0
, right?
When I tweaked your calling convention fix to be effective only on Windows, I used some existing stuff, and so that version of the provider only gets used for a net6.0-windows
TFM. So, above, when I said I tried running those fixes on win-x86
, and found that lots of things were broken, I was building with net6.0
, so the calling convention fixes were not actually in play. But, if I run the test suite targeting net6.0-windows
, everything passes.
I might need to tweak the nuspec to make that version of the provider assembly match net6.0
when the RID is win-x86
.
So... for your Windows builds, you're using a target framework of net6.0-windows (or similar), rather than just net6.0, right?
Correct. net6.0-windows
TFM and win-x86
RID for the executable project.
Fixed in 2.1.2, which is now on nuget. Also logged #518 as a leftover.
We are using the
sqlite3_config_log
API. Apparently somewhere between 2.0.7 and 2.1.0 you switched to useUnmanagedCallersOnly
. Unfortunately, this means the decorated callback now defaults to the platform calling convention which isstdcall
on Windows. The native library (e_sqlite3
) expectscdecl
though. This mismatch causes access violation once the registered callbacks returns back to the native code.