Closed sba923 closed 2 years ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.
Author: | sba923 |
---|---|
Assignees: | - |
Labels: | `area-System.IO`, `untriaged` |
Milestone: | - |
@sba923 thanks for the report. IO folks will take a look but meantime, in case they need the info, could you get the file system type?
Looks like the code in https://github.com/dotnet/runtime/issues/53182#issuecomment-875523336 will do it.
This is very unexpected.
In the past you have provided (https://github.com/dotnet/runtime/issues/53182#issuecomment-876198413) the File System ID: FE534D42 It's smb2 in our enum: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/Common/src/Interop/Unix/System.Native/Interop.UnixFileSystemTypes.cs#L120 It's now on the list of File Systems for which we don't acquire shared file lock when the file is opened with writing intent: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs#L393-L399
and if we can't get this information, we don't acquire the lock: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.Unix.cs#L388-L390
So in theory, it should not happen.
@sba923 could you please do the following:
DOTNET_SYSTEM_IO_DISABLEFILELOCKING
env var to "true" and see if you can repro it using PowerShell? (it disables all kinds of file locks for all file systems)@sba923 once again big thanks for all your help!!
No problemo. But today I'm not WFH so this'll have to wait until I'm back home.
Also perhaps use a debugger or procmon or other simple tool to verify which .NET is getting loaded and check the version stamp of that and/or use ilspy to verify it contains the fix for #53182
Will reach home later than 10pm so will probably postpone testing to the weekend, sorry for that.
This is very unexpected.
This is very unexpected.
Setting DOTNET_SYSTEM_IO_DISABLEFILELOCKING
to true
worked as expected, so it means that you are using a .NET that should work just fine without the env var.
@sba923 Could you please run the powershell command again (without the magic env var) using strace
and share the strace
output? Example how to use it: https://unix.stackexchange.com/a/158331. I would like to see what fstatfs
is returning as I can't understand why we still acquire the lock..
strace -o ./pwsh_repro_strace.log ./pwsh -noprofile -c '& { "foo" | out-file /pnjnas/sto/disks/Health/foo.txt }'
gives me this:
but I can't see a call to fstatfs
in there, and also not anything related to the output test file...
What am I missing?
Of course, the problem repro's with RTW PowerShell 7.2.0 😥
And with PowerShell 7.2.1.
@adamsitnik Where do we go from here?
Issue is still present in PowerShell 7.3.0-preview.1
@adamsitnik How can I help with this issue?
but I can't see a call to fstatfs in there, and also not anything related to the output test file...
Most likely PowerShell starts a new process that loads and executes the .NET code and we get the trace of the former.
Could you please create a new .NET app:
dotnet new console -o Repro
modify the content of ./Repro/Program.cs
:
using System.IO;
namespace Repro
{
internal class Program
{
static void Main(string[] args) => File.WriteAllText(args[0], "Hello World!");
}
}
publish it using the following command
dotnet publish -c Release --self-contained
and use strace
to trace the published executable? You should be able to find the executable in .\bin\Release\net7.0\lin-x64\publish\Repro
, it expects the first argument to be a path to the file
Here's what I get with 6.0.300
PS> /tmp/dotnet/6.0.300/dotnet publish -c Release --self-contained --runtime linux-arm
PS> file ./bin/Release/net6.0/linux-arm/publish/Repro
./bin/Release/net6.0/linux-arm/publish/Repro: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 3.2.0, BuildID[sha1]=f89d1423b0b63feadbce24af319bd6b09d6ce73b, stripped
PS> strace ./bin/Release/net6.0/linux-arm/publish/Repro /pnjnas/sto/disks/Health/foo.txt 2>&1 | out-file /tmp/strace.txt
@sba923 please excuse me for asking for so much output, but the output provided by strace does not contain the full answer to my question. Could you please run following hacky app with the path to the file as argument and share the output?
using System.IO;
using System.Reflection;
namespace Repro
{
internal class Program
{
static void Main(string[] args)
{
using FileStream fs = File.Create(args[0]);
var sfh = fs.SafeFileHandle;
var type = typeof(object).Assembly.GetType("Interop+Sys");
var method = type.GetMethod("GetFileSystemType", BindingFlags.Static | BindingFlags.NonPublic);
long result = (long)method.Invoke(null, new object[]{ sfh });
Console.WriteLine(result.ToString("X"));
}
}
}
dotnet run
No need to ask for an excuse 😆 just ask whatever you need to dig further into this issue!
Here you go:
@sba923 big thanks! Now I know why the fix provided in #55256 does not work. For some reason, fstatfs
initialized the provided structure with invalid value. We expect 0xFE534d42
as documented in the man page, but in reality we get 0xFFFFFFFFFE534D42
.
It's not mapped properly to smb2 enum in the managed code, hence the condition is not met and lock is being taken.
We are not the first ones to hit this bug: https://github.com/qbittorrent/qBittorrent/issues/10367#issuecomment-471415003
The problem is that it seems to have not been solved yet: https://mail.gnu.org/archive/html/bug-coreutils/2019-03/msg00025.html
@tmds @janvorli what would you recommend in this situation?
Does this comment in the same man page suggest we should disregard the upper bytes anyway?
The __fsword_t type used for various fields in the statfs structure definition is a glibc internal type, not intended for public use. This leaves the programmer in a bit of a conundrum when trying to copy or compare these fields to local variables in a program. Using unsigned int for such variables suffices on most systems.
Does this comment in the same man page suggest we should disregard the upper bytes anyway?
The __fsword_t type used for various fields in the statfs structure definition is a glibc internal type, not intended for public use. This leaves the programmer in a bit of a conundrum when trying to copy or compare these fields to local variables in a program. Using unsigned int for such variables suffices on most systems.
This also means we rely on undocumented / internal stuff...
Is there a documented way of determining the filesystem type?
Probably not, as user code is not supposed to behave differently depending on the underlying filesystem's type...
Probably not, as user code is not supposed to behave differently depending on the underlying filesystem's type...
Why not? Presumably I ought to be able to eg display the file system type in my app.
I presume the fix will be backported to 6.0, so that we get a PowerShell 7.2.x with the fix (re: https://github.com/PowerShell/PowerShell/issues/15339)?
Probably not, as user code is not supposed to behave differently depending on the underlying filesystem's type...
Why not? Presumably I ought to be able to eg display the file system type in my app.
This would be a rather special use case if you ask me.
My assumption was that for most apps targeting "normal users", the behavior should be the same whatever the underlying filesystem.
I presume the fix will be backported to 6.0, so that we get a PowerShell 7.2.x with the fix
We are going to publish the fix in .NET 7 Preview 5, let the users use it for a while (a month?) and then backport it to 6.0.
I presume the fix will be backported to 6.0, so that we get a PowerShell 7.2.x with the fix
We are going to publish the fix in .NET 7 Preview 5, let the users use it for a while (a month?) and then backport it to 6.0.
Great. I'll check on the PowerShell side of things, probably there will be a 7.3.x preview consuming .NET 7 Preview 5... at some point.
I'll test the fix both in .NET and PowerShell lands.
We expect 0xFE534d42 as documented in the man page, but in reality we get 0xFFFFFFFFFE534D42.
I'm curious. Can we reproduce this with a native program?
main.c:
#define _LARGEFILE64_SOURCE
#include <sys/vfs.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
int main(int argc, char* argv[])
{
if (argc < 2)
return -1;
int fd = open(argv[1], O_RDONLY|O_LARGEFILE);
struct statfs fs;
int r = fstatfs(fd, &fs);
if (r == 0)
{
printf("f_type = 0x%llx\n", (long long)fs.f_type);
}
return r == 0 ? 0 : errno;
}
@sba923 can you compile this on your rpi using gcc main.c
. That will produce an executable named a.out
.
You can pass it a filename and it will print the file system type. For example:
$ gcc main.c
$ touch /tmp/file
$ ./a.out /tmp/file
f_type = 0x1021994
What output do you get if you point it to an smb file?
I get:
FWIW, this is a Raspberry Pi 2 Model B Rev 1.1 running Raspbian "buster" (10), and the SMB server is Ubuntu 20.04.4 LTS running Samba Version 4.13.17-Ubuntu
Your OS is 32-bit. On your OS, f_type is an int32, so 0xFE534d42 is a negative number. And when represented as int64 that negative number becomes 0xFFFFFFFFFE534D42.
QED
So there are IMVHO two equivalent ways to fix the comparison of f_type
against the magic values (list e.g. in https://linux.die.net/man/2/fstatfs):
f_type
against the magic values, that's what is implemented in https://github.com/dotnet/runtime/pull/69484f_type
against the magic values cast (sign-extended) to int64
but that would basically require to change the type of the magic values in the enum.Intuitively I would think one is probably very marginally faster than the other, but 1. is simpler, that's probably why this was chosen. @adamsitnik do you concur?
(probably f_type
is 64 bits on 64-bit systems only for alignment purposes; just my guess)
BTW, when / how / why was this regression introduced?
How come 64-bit types are being used in that context on that 32-bit OS in the first place?
Just curious....
BTW, when / how / why was this regression introduced?
It's not a regression. The improvement made in https://github.com/dotnet/runtime/pull/55256 for .NET 6 has an issue on 32-bit Linux.
How come 64-bit types are being used in that context on that 32-bit OS in the first place?
.NET doesn't directly call platform functions, it does that through a native library. The native library abstracts platform differences, like using fixed-size types that are large enough to work on any platform.
It's not a regression
From the original PowerShell standpoint, it is. Writing to those network files used to work.
From the original PowerShell standpoint, it is. Writing to those network files used to work.
With what version of .NET (Core)?
Before .NET 6 the locks were taken unconditionally. So the issue should have always occurred.
I opened https://github.com/PowerShell/PowerShell/issues/15339 for PowerShell 7.1.3, which was using .NET 5.0.4.
I would have to perform some tests to determine which earlier PowerShell versions didn't exhibit the problem.
I don't have a table of PowerShell -> .NET version mapping handy, is there one @SteveL-MSFT?
I can repro with PowerShell 7.0.0, which used .NET Core 3.1.2
I can't run PowerShell 6.1.2 anymore on that system:
@SteveL-MSFT What's the best way to monitor for availability of the fix in PowerShell 7.2 LTS (based on .NET 6) and PowerShell 7.3 (based on .NET 7)?
Status: as soon as PWS 7.3.0-preview.6 is released and @sba923 confirms that the fix has solved the problem in https://github.com/PowerShell/PowerShell/issues/15339, I am going to backport this to 6.0.
This is a follow-up to https://github.com/dotnet/runtime/issues/53182
While testing PowerShell 7.2.0-rc.1 which is built atop .NET 6.0.0-rc.2 I was able to repro the issue described in https://github.com/PowerShell/PowerShell/issues/15339
Digging further, I was able to repro the underlying .NET issue, using @adamsitnik's code in https://github.com/dotnet/runtime/issues/53182#issuecomment-874101032
@SteveL-MSFT suggested I followed up here, so here I am 😋