Oleg-N-Cher / OfrontPlus

Oberon family of languages to C translator for ARM, x64 and x86 architectures
Other
55 stars 10 forks source link

Platform.RenameFile on Unix can not move files between partitions #114

Closed kekcleader closed 2 years ago

kekcleader commented 2 years ago

Procedure RenameFile in Platform.Unix.cp cannot rename & move a file from inside the /tmp directory into another directory on Linux Mint 20.2 and 20.3 (Cinnamon) in Live CD and in some kinds of partition configuration (i.e. when /home is mounted differently in comparison with /tmp).

This can be fixed by checking for errno = EXDEV and then doing a manual file copy (using fopen, read and write). As it is known, the same issue has been previously fixed in Platform.Windows.cp.

------ same in Russian ------

Процедура RenameFile в Platform.Unix.cp не может переименовать и переместить файл из каталога /tmp в другой каталог на Linux Mint 20.2 и 20.3 (Cinnamon) в Live CD и при некоторых типах конфигурации разделов (например, когда /home смонтирован иначе, чем /tmp).

Это можно исправить, проверяя errno = EXDEV и затем выполняя копирование файла вручую (используя fopen, read и write). Известно, что эта же проблема была ранее исправлена в Platform.Windows.cp.

kekcleader commented 2 years ago

The fix:

PROCEDURE RenameFile*(IN oldname, newname: ARRAY OF CHAR): ErrorCode;
VAR res: ErrorCode;
BEGIN
  IF rename(oldname, newname) < 0 THEN RETURN err() END;
  RETURN 0
  res := rename(oldname, newname);
  IF res < 0 THEN
    res := err();
    IF res = EXDEV() THEN
      res := CopyFile(oldname, newname)
    END
  END;
  RETURN res
END RenameFile;

These internal procedures are added in the same file:

PROCEDURE CopyFileFd (from, to: FileHandle): ErrorCode;
VAR buf: ARRAY 32768 OF BYTE;
  n: INTEGER;
  err: ErrorCode;
BEGIN
  err := ReadBuf(from, buf, n);
  WHILE (err = 0) & (n > 0) DO
    err := Write(to, SYSTEM.ADR(buf), n);
    IF err = 0 THEN err := ReadBuf(from, buf, n) END
  END;
  RETURN err
END CopyFileFd;

PROCEDURE CopyFile (IN oldname, newname: ARRAY OF CHAR): ErrorCode;
VAR from, to: FileHandle;
  res: ErrorCode;
BEGIN
  from := openro(oldname);
  IF from # -1 THEN
    to := opennew(newname);
    IF to # -1 THEN
      res := CopyFileFd(from, to);
      IF closefile(to) = -1 THEN res := err() END
    ELSE res := -101
    END;
    IF closefile(from) = -1 THEN res := err() END
  ELSE res := -102
  END;
  RETURN res
END CopyFile;
rdebath commented 2 years ago

The code you've added seems to leave the old file in place when "renaming" between devices using "CopyFile".

Also if it fails part way through the destination file could be left incomplete; this does not reflect the "atomic change" guarantee of the underlying OS call.

kekcleader commented 2 years ago

@rdebath I've added the code to delete the file. Thank you for pointing that bug out.

Could you please elaborate on the atomic change guarantee of the underlying OS call? What changes are proposed?

rdebath commented 2 years ago

The rename(const char *oldpath, const char *newpath) function has this in it's documentation:

   If newpath already exists, it will  be  atomically  replaced,  so  that
   there is no point at which another process attempting to access newpath
   will find it missing.  However, there will  probably  be  a  window  in
   which both oldpath and newpath refer to the file being renamed.

and

   If newpath exists but the operation fails  for  some  reason,  rename()
   guarantees to leave an instance of newpath in place.

These are also assumed to mean that the resulting "newpath" on success or failure will also be complete, including while the transfer is happening.

This is commonly used for thing like updating configuration files where you don't want to lose it all.


Another thing I just though of is that rename() preserves permissions, acls and ownership on the file; permissions can be copied but copying ownership is impossible for a non-root user.


Thirdly, if the copy succeeds, but the delete of "oldfile" fails the "rename" has arguably succeeded as "newfile" has been updated. Likewise it's reasonable that failure of the deletion should trigger a full rollback of the copy.

jtempl commented 2 years ago

I am not sure anymore if copying a file implicitly when renaming across devices is a good idea at all. May be it is better to disallow it and offer a separate copy action that also works across devices.

rdebath @.***> schrieb am Sa., 19. Feb. 2022, 16:05:

The rename(const char oldpath, const char newpath) function has this in it's documentation:

If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. However, there will probably be a window in which both oldpath and newpath refer to the file being renamed.

and

If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place.

These are also assumed to mean that the resulting "newpath" on success or failure will also be complete, including while the transfer is happening.

This is commonly used for thing like updating configuration files where you don't want to lose it all.

Another thing I just though of is that rename() preserves permissions, acls and ownership on the file; permissions can be copied but copying ownership is impossible for a non-root user.

Thirdly, if the copy succeeds, but the delete of "oldfile" fails the "rename" has arguably succeeded as "newfile" has been updated. Likewise it's reasonable that failure of the deletion should trigger a full rollback of the copy.

— Reply to this email directly, view it on GitHub https://github.com/Oleg-N-Cher/OfrontPlus/issues/114#issuecomment-1046038276, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWTI2YQA7DQ6UR2MRMUCSLU36WUPANCNFSM5OJYJFAQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

kekcleader commented 2 years ago

@jtempl We need the "copy across devices" rename procedure, because it is used when renaming a temporary file (automatically created by module Files). The temporary file is now located in a special OS-determined temp file directory. I.e. "/tmp" on *Nix, "C:\Users...\Temp" on Windows. The directory is taken from the environment variables or via system calls.

rdebath commented 2 years ago

So why are you creating your output in a temp file and then renaming it to the final location. Is it, perhaps, so that the final location never has a broken file? I.e. The "atomic replacement guarantee"

kekcleader commented 2 years ago

@rdebath Yes, that's the Oberon-way to working with files. You can create a file (and at this point you give it a name), write to it and the data eventually gets stored on disk (after the RAM buffers become full), but unless you call Files.Register, the file will not be registered in the file system. So the bytes (data) of the file are physically on disk, but there is no file system entry for that file. Something similar takes place in Linux (after you delete a file, you can still work with it, while you still have file descriptors pointing to that file).

The way it works in Oberon translators like Ofront, VOC and Ofront+ is a temporary file is being created and then it is renamed. On Windows the file must be closed before it can be renamed. And in Ofront and VOC the file is created next to the future file. In Ofront+ we made it to save the file in a special temporary directory first, not in the same directory it will be stored in.

The temp files in Ofront and VOC are called something like "..tmp._randomnumber" and these files get stuck in the directory if a program crashes, which may lead to some errors (i. e. if your program later traverses the file tree and sees those temp files). Besides, they are empty in most cases.

jtempl commented 2 years ago

If I remember correctly, BlackBox does it the other way round, i.e. New with name and Register without name. That way you know right from the beginning on which device the file will end up and you can choose a temp file on the appropriate device.

In most cases it will not make a big difference, only if the file grows very large.

In the Oberon Files module it may be possible to Register right after New and continue writing then. I don't remember if the file stays open but I think so.

Артур Ефимов @.***> schrieb am Sa., 19. Feb. 2022, 18:53:

@jtempl https://github.com/jtempl We need the "copy across devices" rename procedure, because it is used when renaming a temporary file (automatically created by module Files). The temporary file is now located in a special OS-determined temp file directory. I.e. "/tmp" on *Nix, "C:\Users...\Temp" on Windows. The directory is taken from the environment variables or via system calls.

— Reply to this email directly, view it on GitHub https://github.com/Oleg-N-Cher/OfrontPlus/issues/114#issuecomment-1046071809, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWTI2YCLGQ2YPZWP3H4B6DU37KJNANCNFSM5OJYJFAQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

kekcleader commented 2 years ago

@jtempl On Ofront+ it is the same: Files.New specifies the name and Files.Register doesn't specify the name. Sorry if I confused you in my previous message, but this is what I meant.

But is there a way to to acquire a temporary directory on Windows or Linux for a specific drive? As far as I know, Windows only has an API call that returns a single Temp directory (usually on drive C:). And on Linux the standard is to use /tmp or look into a specific environment variable. Linux doesn't have the notion of "drive" at all.

If one is to write a large file, I think a good way is to call Register after a call to New. This way there will be no need to copy the file in the end. If Register is called after New, the program will work exactly as in other programming languages, i.e. C, Pascal or Go.

rdebath commented 2 years ago

acquire a temporary directory on ...

Create a hidden subdirectory in the directory you want the final file to exist. Have a lockfile+cleanup process to get rid of it when everyone is done with it. Beware different user ids. ➔ Yessssss, carefully.

Encrypted home directories would be another problem here; a virtual FS that can be over-mounted anywhere to encrypt that part of the tree. So this basically forces "this directory" or "ask the user".

Doing a "link a detached file descriptor to a filesystem location" has been suggested for Linux. There are complexities due to weird security cases which means (AFAIK) it's never stayed. (It was possible for a little while by linking /proc/self/fd/?).

Note: It's rarely used but Windows "drives" can be mounted on directories just like Unix.

I would suggest you look into how rsync deals with it's temp files as many of it's users like the idea of atomic or near-atomic updates. Specifically the options:

I would guess the "best" solution would be to have an Env variable that can be ".", ".tempdir/" or "/tmp/Ob.$USER" (same, relative or absolute) and you put temp files into those directories (relative to the final name) using a CopyFile process if RenameFile() returns EXDEV and with some automatic cleanup process of unlocked files.

kekcleader commented 2 years ago

The temp files are being cleaned up when the program exists (I think it happens as part of garbage collection), but if the program crashes, it does not do that and the files are left over. In an OS temporary directory, the files will get eventually deleted, but in an Oberon-managed temp-directory, it won't. There can be multiple instances of a compiled Oberon program (or event different programs) running simultaneously on the same machine and working with files in the same directory.

rdebath commented 2 years ago

Indeed, that's what the locking is for, so the next program to run can tell there's leftovers. It's not particularly simple, but it is possible.

jtempl commented 2 years ago

In a unix style operating system (including Windows) you can register an exit handler that would cause a garbage collection and thereby remove unused files or it could remove them directly.

Артур Ефимов @.***> schrieb am Di., 22. Feb. 2022, 15:28:

The temp files are being cleaned up when the program exists (I think it happens as part of garbage collection), but if the program crashes, it does not do that and the files are left over. In an OS temporary directory, the files will get eventually deleted, but in an Oberon-managed temp-directory, it won't. There can be multiple instances of a compiled Oberon program (or event different programs) running simultaneously on the same machine and working with files in the same directory.

— Reply to this email directly, view it on GitHub https://github.com/Oleg-N-Cher/OfrontPlus/issues/114#issuecomment-1047852664, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWTI22G7BPF35IDGPIB3M3U4OMP3ANCNFSM5OJYJFAQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

rdebath commented 2 years ago

You of course can; but that doesn't help if the program segfaults or is kill -9'd. Even calling _exit, abort or environment.fastfail will bypass the handler.