dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.17k stars 4.72k forks source link

.NET 6.0.0-rc.2 on Raspberry Pi cannot write to writable network file #61175

Closed sba923 closed 2 years ago

sba923 commented 2 years ago

This is a follow-up to https://github.com/dotnet/runtime/issues/53182

While testing PowerShell 7.2.0-rc.1 which is built atop .NET 6.0.0-rc.2 I was able to repro the issue described in https://github.com/PowerShell/PowerShell/issues/15339

Digging further, I was able to repro the underlying .NET issue, using @adamsitnik's code in https://github.com/dotnet/runtime/issues/53182#issuecomment-874101032

@SteveL-MSFT suggested I followed up here, so here I am 😋

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
This is a follow-up to https://github.com/dotnet/runtime/issues/53182 While testing PowerShell 7.2.0-rc.1 which is built atop .NET 6.0.0-rc.2 I was able to repro the issue described in https://github.com/PowerShell/PowerShell/issues/15339 Digging further, I was able to repro the underlying .NET issue, using @adamsitnik's code in https://github.com/dotnet/runtime/issues/53182#issuecomment-874101032 @SteveL-MSFT [suggested](https://twitter.com/Steve_MSFT/status/1455329890580905984?s=20) followed up here, so here I am 😋
Author: sba923
Assignees: -
Labels: `area-System.IO`, `untriaged`
Milestone: -
danmoseley commented 2 years ago

@sba923 thanks for the report. IO folks will take a look but meantime, in case they need the info, could you get the file system type?

Looks like the code in https://github.com/dotnet/runtime/issues/53182#issuecomment-875523336 will do it.

adamsitnik commented 2 years ago

This is very unexpected.

In the past you have provided (https://github.com/dotnet/runtime/issues/53182#issuecomment-876198413) the File System ID: FE534D42 It's smb2 in our enum: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/Common/src/Interop/Unix/System.Native/Interop.UnixFileSystemTypes.cs#L120 It's now on the list of File Systems for which we don't acquire shared file lock when the file is opened with writing intent: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs#L393-L399

and if we can't get this information, we don't acquire the lock: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs#L388-L390

So in theory, it should not happen.

@sba923 could you please do the following:

  1. Share full stack trace
  2. Set DOTNET_SYSTEM_IO_DISABLEFILELOCKING env var to "true" and see if you can repro it using PowerShell? (it disables all kinds of file locks for all file systems)
  3. Try to run the repro code again and share the File System ID: https://github.com/dotnet/runtime/issues/53182#issuecomment-875523336

@sba923 once again big thanks for all your help!!

sba923 commented 2 years ago

No problemo. But today I'm not WFH so this'll have to wait until I'm back home.

danmoseley commented 2 years ago

Also perhaps use a debugger or procmon or other simple tool to verify which .NET is getting loaded and check the version stamp of that and/or use ilspy to verify it contains the fix for #53182

sba923 commented 2 years ago

Will reach home later than 10pm so will probably postpone testing to the weekend, sorry for that.

sba923 commented 2 years ago

This is very unexpected.

  1. Here's the outcome of the PowerShell test:

image

  1. The filesystem ID is the expected one:

image

sba923 commented 2 years ago

This is very unexpected.

  1. Is this enough? image Program.zip
adamsitnik commented 2 years ago

Setting DOTNET_SYSTEM_IO_DISABLEFILELOCKING to true worked as expected, so it means that you are using a .NET that should work just fine without the env var.

@sba923 Could you please run the powershell command again (without the magic env var) using strace and share the strace output? Example how to use it: https://unix.stackexchange.com/a/158331. I would like to see what fstatfs is returning as I can't understand why we still acquire the lock..

sba923 commented 2 years ago
strace -o ./pwsh_repro_strace.log ./pwsh -noprofile -c '& { "foo" | out-file /pnjnas/sto/disks/Health/foo.txt }'

gives me this:

pwsh_repro_strace.zip

but I can't see a call to fstatfs in there, and also not anything related to the output test file...

What am I missing?

sba923 commented 2 years ago

Of course, the problem repro's with RTW PowerShell 7.2.0 😥

sba923 commented 2 years ago

And with PowerShell 7.2.1.

@adamsitnik Where do we go from here?

sba923 commented 2 years ago

Issue is still present in PowerShell 7.3.0-preview.1

sba923 commented 2 years ago

@adamsitnik How can I help with this issue?

adamsitnik commented 2 years ago

but I can't see a call to fstatfs in there, and also not anything related to the output test file...

Most likely PowerShell starts a new process that loads and executes the .NET code and we get the trace of the former.

Could you please create a new .NET app:

dotnet new console -o Repro

modify the content of ./Repro/Program.cs:

using System.IO;

namespace Repro
{
    internal class Program
    {
        static void Main(string[] args) => File.WriteAllText(args[0], "Hello World!");
    }
}

publish it using the following command

dotnet publish -c Release --self-contained

and use strace to trace the published executable? You should be able to find the executable in .\bin\Release\net7.0\lin-x64\publish\Repro, it expects the first argument to be a path to the file

sba923 commented 2 years ago

Here's what I get with 6.0.300

PS> /tmp/dotnet/6.0.300/dotnet publish -c Release --self-contained --runtime linux-arm
PS>  file ./bin/Release/net6.0/linux-arm/publish/Repro
./bin/Release/net6.0/linux-arm/publish/Repro: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=f89d1423b0b63feadbce24af319bd6b09d6ce73b, stripped
PS> strace ./bin/Release/net6.0/linux-arm/publish/Repro /pnjnas/sto/disks/Health/foo.txt 2>&1 | out-file /tmp/strace.txt

strace.txt

adamsitnik commented 2 years ago

@sba923 please excuse me for asking for so much output, but the output provided by strace does not contain the full answer to my question. Could you please run following hacky app with the path to the file as argument and share the output?

using System.IO;
using System.Reflection;

namespace Repro
{
    internal class Program
    {
        static void Main(string[] args)
        {
            using FileStream fs = File.Create(args[0]);
            var sfh = fs.SafeFileHandle;
            var type = typeof(object).Assembly.GetType("Interop+Sys");
            var method = type.GetMethod("GetFileSystemType", BindingFlags.Static | BindingFlags.NonPublic);
            long result = (long)method.Invoke(null, new object[]{ sfh });
            Console.WriteLine(result.ToString("X"));
        }
    }
}
dotnet run
sba923 commented 2 years ago

No need to ask for an excuse 😆 just ask whatever you need to dig further into this issue!

Here you go:

image

adamsitnik commented 2 years ago

@sba923 big thanks! Now I know why the fix provided in #55256 does not work. For some reason, fstatfs initialized the provided structure with invalid value. We expect 0xFE534d42 as documented in the man page, but in reality we get 0xFFFFFFFFFE534D42.

https://github.com/dotnet/runtime/blob/0332c6472a0007f4adb5c014474b8eeeabea73a5/src/native/libs/System.Native/pal_io.c#L1487-L1488

It's not mapped properly to smb2 enum in the managed code, hence the condition is not met and lock is being taken.

https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs#L393-L399

We are not the first ones to hit this bug: https://github.com/qbittorrent/qBittorrent/issues/10367#issuecomment-471415003

The problem is that it seems to have not been solved yet: https://mail.gnu.org/archive/html/bug-coreutils/2019-03/msg00025.html

@tmds @janvorli what would you recommend in this situation?

danmoseley commented 2 years ago

Does this comment in the same man page suggest we should disregard the upper bytes anyway?

The __fsword_t type used for various fields in the statfs structure definition is a glibc internal type, not intended for public use. This leaves the programmer in a bit of a conundrum when trying to copy or compare these fields to local variables in a program. Using unsigned int for such variables suffices on most systems.

sba923 commented 2 years ago

Does this comment in the same man page suggest we should disregard the upper bytes anyway?

The __fsword_t type used for various fields in the statfs structure definition is a glibc internal type, not intended for public use. This leaves the programmer in a bit of a conundrum when trying to copy or compare these fields to local variables in a program. Using unsigned int for such variables suffices on most systems.

This also means we rely on undocumented / internal stuff...

Is there a documented way of determining the filesystem type?

Probably not, as user code is not supposed to behave differently depending on the underlying filesystem's type...

danmoseley commented 2 years ago

Probably not, as user code is not supposed to behave differently depending on the underlying filesystem's type...

Why not? Presumably I ought to be able to eg display the file system type in my app.

sba923 commented 2 years ago

I presume the fix will be backported to 6.0, so that we get a PowerShell 7.2.x with the fix (re: https://github.com/PowerShell/PowerShell/issues/15339)?

sba923 commented 2 years ago

Probably not, as user code is not supposed to behave differently depending on the underlying filesystem's type...

Why not? Presumably I ought to be able to eg display the file system type in my app.

This would be a rather special use case if you ask me.

My assumption was that for most apps targeting "normal users", the behavior should be the same whatever the underlying filesystem.

adamsitnik commented 2 years ago

I presume the fix will be backported to 6.0, so that we get a PowerShell 7.2.x with the fix

We are going to publish the fix in .NET 7 Preview 5, let the users use it for a while (a month?) and then backport it to 6.0.

sba923 commented 2 years ago

I presume the fix will be backported to 6.0, so that we get a PowerShell 7.2.x with the fix

We are going to publish the fix in .NET 7 Preview 5, let the users use it for a while (a month?) and then backport it to 6.0.

Great. I'll check on the PowerShell side of things, probably there will be a 7.3.x preview consuming .NET 7 Preview 5... at some point.

I'll test the fix both in .NET and PowerShell lands.

tmds commented 2 years ago

We expect 0xFE534d42 as documented in the man page, but in reality we get 0xFFFFFFFFFE534D42.

I'm curious. Can we reproduce this with a native program?

main.c:

#define _LARGEFILE64_SOURCE

#include <sys/vfs.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>

int main(int argc, char* argv[])
{
    if (argc < 2)
        return -1;
    int fd = open(argv[1], O_RDONLY|O_LARGEFILE);
    struct statfs fs;
    int r = fstatfs(fd, &fs);
    if (r == 0)
    {
        printf("f_type = 0x%llx\n", (long long)fs.f_type);
    }
    return r == 0 ? 0 : errno;
}

@sba923 can you compile this on your rpi using gcc main.c. That will produce an executable named a.out.

You can pass it a filename and it will print the file system type. For example:

$ gcc main.c 
$ touch /tmp/file
$ ./a.out /tmp/file
f_type = 0x1021994

What output do you get if you point it to an smb file?

sba923 commented 2 years ago

I get:

image

sba923 commented 2 years ago

FWIW, this is a Raspberry Pi 2 Model B Rev 1.1 running Raspbian "buster" (10), and the SMB server is Ubuntu 20.04.4 LTS running Samba Version 4.13.17-Ubuntu

tmds commented 2 years ago

Your OS is 32-bit. On your OS, f_type is an int32, so 0xFE534d42 is a negative number. And when represented as int64 that negative number becomes 0xFFFFFFFFFE534D42.

sba923 commented 2 years ago

QED

So there are IMVHO two equivalent ways to fix the comparison of f_type against the magic values (list e.g. in https://linux.die.net/man/2/fstatfs):

  1. compare the lowest 32 bits of f_type against the magic values, that's what is implemented in https://github.com/dotnet/runtime/pull/69484
  2. compare f_type against the magic values cast (sign-extended) to int64 but that would basically require to change the type of the magic values in the enum.

Intuitively I would think one is probably very marginally faster than the other, but 1. is simpler, that's probably why this was chosen. @adamsitnik do you concur?

(probably f_type is 64 bits on 64-bit systems only for alignment purposes; just my guess)

sba923 commented 2 years ago

BTW, when / how / why was this regression introduced?

How come 64-bit types are being used in that context on that 32-bit OS in the first place?

Just curious....

tmds commented 2 years ago

BTW, when / how / why was this regression introduced?

It's not a regression. The improvement made in https://github.com/dotnet/runtime/pull/55256 for .NET 6 has an issue on 32-bit Linux.

How come 64-bit types are being used in that context on that 32-bit OS in the first place?

.NET doesn't directly call platform functions, it does that through a native library. The native library abstracts platform differences, like using fixed-size types that are large enough to work on any platform.

sba923 commented 2 years ago

It's not a regression

From the original PowerShell standpoint, it is. Writing to those network files used to work.

tmds commented 2 years ago

From the original PowerShell standpoint, it is. Writing to those network files used to work.

With what version of .NET (Core)?

Before .NET 6 the locks were taken unconditionally. So the issue should have always occurred.

sba923 commented 2 years ago

I opened https://github.com/PowerShell/PowerShell/issues/15339 for PowerShell 7.1.3, which was using .NET 5.0.4.

I would have to perform some tests to determine which earlier PowerShell versions didn't exhibit the problem.

I don't have a table of PowerShell -> .NET version mapping handy, is there one @SteveL-MSFT?

sba923 commented 2 years ago

I can repro with PowerShell 7.0.0, which used .NET Core 3.1.2

image

I can't run PowerShell 6.1.2 anymore on that system:

image

sba923 commented 2 years ago

@SteveL-MSFT What's the best way to monitor for availability of the fix in PowerShell 7.2 LTS (based on .NET 6) and PowerShell 7.3 (based on .NET 7)?

adamsitnik commented 2 years ago

Status: as soon as PWS 7.3.0-preview.6 is released and @sba923 confirms that the fix has solved the problem in https://github.com/PowerShell/PowerShell/issues/15339, I am going to backport this to 6.0.