Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.98k stars 559 forks source link

On Windows system(1,...) redirects stdout and stderr losing O_APPEND flag #17570

Open qbolec opened 4 years ago

qbolec commented 4 years ago

Consider code which does:

open(STDOUT, ">>", "log");
system(1, "child.exe");

with the intent that child.exe should be started with stdout redirected to log file in "append mode". It turns out, that the way fopen/freopen/open/_open etc. functions are implemented in CRT, opening file for "a" (or O_APPEND) mode only stores this flag in a per process global table __pioinfo, and "emulates" the append mode by checking before each write if FAPPEND (0x20) flag is present for particular entry in this table:

    // If the file is opened for appending, seek to the end of the file.  We
    // ignore errors because the underlying file may not allow seeking.
    if (_osfile(fh) & FAPPEND)
        (void)_lseeki64_nolock(fh, 0, FILE_END);

These flags are not inherited by child processes! OTOH HANDLEs, and seek positions seem to be inherited. The end result is that the child appears to be able to write to correct file, and even at the (initially) correct position. The problem becomes only visible when there are multiple processes writing to the same file - then it really makes a difference if we write in "append" mode or just "where the parent finished", as the latter will overwrite changes written by other appending processes :( This all is relevant to Perl, because on Windows, open(STDOUT, ">>", "log") will use the mentioned low-level C File APIs, while system(1,"child.exe") will use CreateProcess.

In other words: the FAPPEND flag is held per process, and is not synchronized to the HANDLE's ACCESS_MASK.

The workaround here, is to perform several steps:

  1. check if the stdout FILE * is opened in "append": get_flags(stdout) & 0x20
  2. if so, then we need to generate new HANDLE which will have correct ACCESS_MASK (one that contains FILE_APPEND_DATA but not FILE_WRITE_DATA), we can do so with ReOpenFile(GetStdHandle(STD_OUTPUT_HANDLE), FILE_APPEND_DATA, FILE_SHARE_WRITE | FILE_SHARE_READ,0).
  3. The new handle will by default be non-inheritable, so we have to set SetHandleInformation(h,HANDLE_FLAG_INHERIT,HANDLE_FLAG_INHERIT);
  4. And finally we can use either SetStdHandle(STD_OUTPUT_HANDLE,h), or pass it directly to si.hStdOutput used for CreateProcess.

The demo consists of three elements:

  1. child.cpp (compile with cl.exe child.cpp) - which is a simple program which writes a line to stdout every second - this is supposed to be the child.exe spawned by parent 2a. parent.pl - this is the same example as in the first paragraph 2b. parent.cpp - (compile with cl.exe parent.cpp) which is intended to mimic what Perl has to execute when it executes parent.pl that is it first does freopen("log","a",stdout) and then calls CreateProcess. It contains a lot of debug code used to demonstrate what happens
  2. append.pl - a simple perl script which tries to append lines to log file every second, so it plays the role of "other process trying to append in parallel to child.exe"

To demonstrate the bug in Perl (I'm using Strawberry perl 5, version 30, subversion 0 (v5.30.0) built for MSWin32-x64-multi-thread) you run perl parent.pl in one window and perl append.pl in another, and do cat log several times to see how the child.exe spawned by parent.pl overwrites output from append.pl.

To analyze what's happening, you can use parent.cpp, to generate parent.exe which mimics the parent.pl and provies a lot of debug data. So, you run parent.exe and perl append.pl in parallel and do cat log to monitor the output. (Do not use tail -f log, because this will give you crazy results if one of the processes is not using append mode).

In the parent.exe's console you should see something like:

Before reopen:
GetStdHandle(STD_OUTPUT_HANDLE) = 000000000000044C inheritable nonappend
_get_osfhandle(_fileno(stdout)) = 000000000000044C inheritable nonappend
flags(stdout) = 1 + 8 + 80
After reopen:
GetStdHandle(STD_OUTPUT_HANDLE) = 000000000000007C inheritable nonappend
_get_osfhandle(_fileno(stdout)) = 000000000000007C inheritable nonappend
flags(stdout) = 1 + 20 + 80
After ReOpenFile:
GetStdHandle(STD_OUTPUT_HANDLE) = 0000000000000080 noninheritable append
_get_osfhandle(_fileno(stdout)) = 000000000000007C inheritable nonappend
flags(stdout) = 1 + 20 + 80
After SetHandleInformation:
GetStdHandle(STD_OUTPUT_HANDLE) = 0000000000000080 inheritable append
_get_osfhandle(_fileno(stdout)) = 000000000000007C inheritable nonappend
flags(stdout) = 1 + 20 + 80
 si.hStdInput = 0000000000000448
 si.hStdOutput = 0000000000000080
 si.hStdError = 000000000000044C
Created child.exe process 1

and in the log file you should see interleaved lines from parent, child, and append.pl, without any numbers missing. Some things to note here:

If you remove all the lines in the #ifndef OLD section, then you will get a behaviour very similar in spirit to Perl5 behavior. And the resulting log file will look something like this (after 20-30 seconds)

Spawn 0
Spawn 1
Spawn 2
Spawn 3
Spawn 4
Spawn 5
Spawn 6
Hello 1 from perl
Spawn 7
Hello 2 from perl
Spawn 8
Hello 3 from perl
Spawn 9
Hello 4 from perl
Spawn 10
Hello 5 from perl
Spawn 11
Hello 6 from perl
Spawn 12
Hello 7 from perl
Spawn 13
Hello 8 from perl
Spawn 14
Hello 9 from perl
Spawn 15
Hello 10 from perl
Spawn 16
Hello 11 from perl
Spawn 17
Hello 12 from perl
Spawn 18
Hello 13 from perl
Spawn 19
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
llo 18 from perl
Hello 19 from perl

That is, as long as it is the parent.exe who does writing, then it is correctly appending to the file, and the lines from parent.exe and append.pl are interleaved without any missing. But as soon as the child starts, then it will "overwrite" everything in the file from the place parent started - in particular it overwrote lines with "Hello 14 from perl" til the beggining "He" of "Hello 18 from perl". Note that this is consistent with the parent having FAPPEND flag set in its global table, and the child being completely unaware requirement to use FILE_APPEND_DATA .

child.cpp

#include<cstdio>
#include<thread>
#include<chrono>
int main(){
  for(int i=0;i<100;++i){
          printf("Iteration %d\n",i);
          fflush(stdout);
          std::this_thread::sleep_for(std::chrono::milliseconds(1000));
  }
  return 0;
}

append.pl

for my $i (1..100) {
  open(FILE, ">>", "./log") or print "Sorry couldn't open for append $i\n";
  print FILE "Hello $i from perl\n" or print "Sorry couldn't print $i\n";
  close FILE;
  sleep 1;
}

parent.pl

open(STDOUT, ">>", "log");
my $pid = system(1, "child.exe");
print "spawned $pid\n";
sleep 200;

parent.cpp

#include <windows.h>
#include <iostream>
#include<thread>
#include<chrono>
#include<io.h>
#include<stdio.h>
#include <winternl.h>

/* Copy&pasted:
https://social.msdn.microsoft.com/Forums/en-US/58b618ef-f714-4003-aa8c-e754fef08e63/how-to-determine-a-cfile-object-is-writable-or-not?forum=vcgeneral
*/
typedef struct _FILE_ACCESS_INFORMATION {
  ACCESS_MASK AccessFlags;
} FILE_ACCESS_INFORMATION, *PFILE_ACCESS_INFORMATION;

const int FileAccessInformation = 8;  //Needed because FILE_INFORMATION_CLASS in winternl.h is incomplete

typedef NTSTATUS (NTAPI * PFN_NTQUERYINFORMATIONFILE)( HANDLE, PIO_STATUS_BLOCK,
PVOID, ULONG, FILE_INFORMATION_CLASS );

/* Copy&pasted from:
   /c/Program Files (x86)/Windows Kits/10/Source/10.0.17763.0/ucrt/inc/corecrt_internal_lowio.h
*/

enum class __crt_lowio_text_mode : char
{
    ansi    = 0, // Regular text
    utf8    = 1, // UTF-8 encoded
    utf16le = 2, // UTF-16LE encoded
};
typedef char __crt_lowio_pipe_lookahead[3];
#define IOINFO_ARRAYS       128
struct __crt_lowio_handle_data
{
    CRITICAL_SECTION           lock;
    intptr_t                   osfhnd;          // underlying OS file HANDLE
    __int64                    startpos;        // File position that matches buffer start
    unsigned char              osfile;          // Attributes of file (e.g., open in text mode?)
    __crt_lowio_text_mode      textmode;
    __crt_lowio_pipe_lookahead _pipe_lookahead;

    uint8_t unicode          : 1; // Was the file opened as unicode?
    uint8_t utf8translations : 1; // Buffer contains translations other than CRLF
    uint8_t dbcsBufferUsed   : 1; // Is the dbcsBuffer in use?
    // this depends on version of visual studio
    // 10.0.10240.0 has just one char
    // 10.0.17763.0 has MB_LEN_MAX chars
    char    dbcsBuffer[MB_LEN_MAX];           // Buffer for the lead byte of DBCS when converting from DBCS to Unicode
};
typedef __crt_lowio_handle_data* __crt_lowio_handle_data_array[IOINFO_ARRAYS];
extern "C"  __crt_lowio_handle_data_array __pioinfo;
#define IOINFO_L2E          6
#define IOINFO_ARRAY_ELTS   (1 << IOINFO_L2E)
#define _pioinfo(i)          (__pioinfo[(i) >> IOINFO_L2E] + ((i) & (IOINFO_ARRAY_ELTS - 1)))
#define _osfile(i)           (_pioinfo(i)->osfile)

bool inheritable(HANDLE h){
  DWORD flags;
  if(!GetHandleInformation(h, &flags)){
    std::cerr << "Could not GetHandleInformation for " << h << std::endl;
    exit(1);
  }
  return flags&HANDLE_FLAG_INHERIT;
}
unsigned char get_flags(FILE *f){
  return (_osfile(_fileno(f)));
}
template<typename S,typename F>
void show_flags(S &s,F flags){
  bool first=true;
  for(uint64_t f=1;f<=flags;f=f*2){
    if(f&flags){
      if(!first){
        s << " + ";
      }
      s << std::hex << f << std::dec;
      first=false;
    }
  }
}

bool is_append_mode(HANDLE h){
  HINSTANCE hNtDll = LoadLibrary(TEXT("ntdll.dll"));
  PFN_NTQUERYINFORMATIONFILE NtQueryInformationFile =
    (PFN_NTQUERYINFORMATIONFILE)GetProcAddress(hNtDll,
    "NtQueryInformationFile");
  FILE_ACCESS_INFORMATION fai{};
  IO_STATUS_BLOCK io_status{};
  auto err = NtQueryInformationFile(h,&io_status,&fai,sizeof(fai),(FILE_INFORMATION_CLASS)FileAccessInformation);
  if(err){
    std::cerr << "NtQueryInformationFile for " << h << " has returned " << err << std::endl;
    exit(1);
  }
  /*
  std::cerr << "AccessFlags are ";
  show_flags(std::cerr, fai.AccessFlags);
  std::cerr << std::endl;
  */
  return (fai.AccessFlags & FILE_APPEND_DATA) && !(fai.AccessFlags & FILE_WRITE_DATA);
}

template<typename S>
void describe_handle(S & s,HANDLE h){
  s << h << (inheritable(h)?" inheritable":" noninheritable") << (is_append_mode(h)?" append":" nonappend");
}
template<typename S>
void describe_std_output(S & s){
  HANDLE h1 = GetStdHandle(STD_OUTPUT_HANDLE);
  HANDLE h2 = (HANDLE)_get_osfhandle(_fileno(stdout));
  s << "GetStdHandle(STD_OUTPUT_HANDLE) = "; describe_handle(s, h1);
  s << std::endl << "_get_osfhandle(_fileno(stdout)) = "; describe_handle(s,h2);
  s << std::endl << "flags(stdout) = ";
  show_flags(s, get_flags(stdout));
  s << std::endl;
}
int main(){
    std::cerr << "Before reopen:" << std::endl;
    describe_std_output(std::cerr);
    freopen("log","a",stdout);
    std::cerr << "After reopen:" << std::endl;
    describe_std_output(std::cerr);
    for(int i=0;i<20;++i){
          printf("Spawn %d\n",i);
          fflush(stdout);
          std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }

#ifndef OLD
    HANDLE h = ReOpenFile(GetStdHandle(STD_OUTPUT_HANDLE), FILE_APPEND_DATA, FILE_SHARE_WRITE | FILE_SHARE_READ,0);
    SetStdHandle(STD_OUTPUT_HANDLE,h);
    std::cerr << "After ReOpenFile:" << std::endl;
    describe_std_output(std::cerr);

    SetHandleInformation(h,HANDLE_FLAG_INHERIT,HANDLE_FLAG_INHERIT);
    //SetStdHandle(STD_OUTPUT_HANDLE,h);
    std::cerr << "After SetHandleInformation:" << std::endl;
    describe_std_output(std::cerr);
#endif

    PROCESS_INFORMATION pi;
    STARTUPINFO si;
    BOOL ret = FALSE;
    DWORD flags = 0;//CREATE_NO_WINDOW;

    ZeroMemory( &pi, sizeof(PROCESS_INFORMATION) );
    ZeroMemory( &si, sizeof(STARTUPINFO) );
    si.cb = sizeof(STARTUPINFO);
    si.dwFlags= STARTF_USESTDHANDLES;
    si.dwFlags= STARTF_USESTDHANDLES | CREATE_NEW_CONSOLE;
    si.hStdInput= GetStdHandle(STD_INPUT_HANDLE);
    si.hStdOutput= GetStdHandle(STD_OUTPUT_HANDLE);
    si.hStdError= GetStdHandle(STD_ERROR_HANDLE);

    std::cerr << " si.hStdInput = " <<  si.hStdInput << std::endl;
    std::cerr << " si.hStdOutput = " <<  si.hStdOutput << std::endl;
    std::cerr << " si.hStdError = " <<  si.hStdError << std::endl;

    TCHAR cmd[]= TEXT("child.exe");
    ret = CreateProcess(NULL, cmd, NULL, NULL, TRUE, flags, NULL, NULL, &si, &pi);

    if ( ret )
    {

        std::cerr << "Created child.exe process " << ret << std::endl;
        CloseHandle(pi.hProcess);
        CloseHandle(pi.hThread);
        return 0;
    }

    return -1;
}

I'd like to thank @MB-- who has figured out much of all this, and helped me put it together, when he was working on a fix for MySQL's mysql-test-run.pl script which appends concurrently to the mysqld's server error log spawned from it using system(1,..).

Some materials I found extremely helpful:

tonycoz commented 4 years ago

Have you reported this to Microsoft?

I suspect the fix, if any, at our end would be to reimplement open(), converting append mode to FILE_ACCESS_OPEN rather than WRITE.