Let #snapshotPrimitive (97) copy/backup/rename .image file first

marceltaeumel commented 3 months ago

...or is this potentially an in-image feature?

Situation: You can potentially corrupt your .image during the snapshot primitive because:

object space may be larger than disk space left for the next/current snapshot
potential snapshot bug might corrupt file
operating system might decide to reboot/crash/...

We had two cases of corrupted .image files on Windows in the last 15 months. A simple copy-before-snapshot would have prevented data loss.

codefrau commented 3 months ago

Unless there's a good reason I'd not make the primitive more complex. I don't see why this would need to be done in the VM.

jvuletich commented 3 months ago

I fully agree with Vanessa. Any such policy hardcoded in the VM will only limit the image's choice of how to do such backups.

OpenSmalltalk-Bot commented 3 months ago

A long, long time ago (like 1998?) we (at Interval) did this. I can't find any code but IIRC we

saved the image to temp.file
checked it worked
deleted the previous image file
renamed temp to the correct name
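
The steps above amount to a write-to-temporary-then-rename pattern. A minimal sketch, in Python purely as a language-neutral illustration (a real implementation would presumably live in image-side code): `safe_save`, `write_image`, and `verify` are hypothetical names, not anything in the VM or image. Note one substitution: it uses a single replacing rename instead of separate delete and rename steps, so there is never a moment when neither file exists.

```python
import os
import tempfile


def safe_save(image_path, write_image, verify):
    """Write a snapshot to a temp file, verify it, then swap it into place.

    write_image(file_object) and verify(path) are caller-supplied
    (hypothetical) callables; verify returns True if the file looks sane.
    """
    directory = os.path.dirname(os.path.abspath(image_path))
    # Create the temp file in the same directory so the final rename
    # stays on one file system.
    fd, temp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as temp_file:
            write_image(temp_file)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # push the data to disk
        if not verify(temp_path):
            raise IOError("snapshot verification failed; old image kept")
        # Replaces the previous image in one step (delete + rename).
        os.replace(temp_path, image_path)
    except BaseException:
        # On any failure, remove the temp file and leave the old image alone.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If the write or the check fails, the previous image file is untouched, which is the property the original request was after.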

One might sensibly try checking if there is plenty of available room and skip the complexity when possible. Or one might create a null file with "at least enough room" (RISC OS made this especially simple) and then write into that. I'd guess that 'modern' file systems might well do annoying tricks like not actually creating the file when you think they have.
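
Both ideas, checking free space and reserving room up front, might look roughly like this. This is a sketch in Python under stated assumptions: `has_room` and `preallocate` are hypothetical names, the 10% margin is an arbitrary choice, and `os.posix_fallocate` is only available on Unix-like platforms.

```python
import os
import shutil


def has_room(directory, needed_bytes, margin=0.10):
    """Return True if `directory` has needed_bytes plus a safety margin free."""
    free = shutil.disk_usage(directory).free
    return free >= needed_bytes * (1 + margin)


def preallocate(path, size):
    """Create `path` with `size` bytes; return True if space was reserved."""
    with open(path, "wb") as f:
        if size > 0 and hasattr(os, "posix_fallocate"):
            try:
                # Asks the file system to actually reserve the blocks.
                os.posix_fallocate(f.fileno(), 0, size)
                return True
            except OSError:
                pass  # not supported on this file system; fall through
        # Fallback: only sets the length. The file may be sparse, so this
        # does NOT guarantee that the space really exists.
        f.truncate(size)
        return False
```

The fallback branch is exactly the "annoying trick" case: the file reports the requested size, but nothing has been reserved, so a later write can still run out of space.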

Real paranoiacs might want to use OSProcess to start up the saved tempfile image and make sure it passes some tests before actually renaming etc.

On 2024-04-02, at 8:57 AM, Juan Vuletich via Vm-dev @.***> wrote:

I fully agree with Vanessa. Any such policy hardcoded in the VM will only limit the image's choice of how to do such backups.

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Strange OpCodes: JTC: Jump To Conclusions

eliotmiranda commented 3 months ago

I agree with Juan & Vanessa that the backing up (eg rename image file to foo.image.bak) be done by the image, not by the snapshot primitive. There’s another thing the image should do also.

Currently the snapshot primitive does a full GC. This is somewhat of a good thing because it means that new space is always empty in a squeak snapshot and so loading is simpler. But full GCs signal objects for finalization, and this means that finalizations are done way too late, in the loading system, long after any associated state (open files et al) was up to date.

Instead, while the primitive should still GC to keep new space empty, the image should run a full GC before invoking the primitive, and not invoke the primitive until finalization is complete. This ensures that

objects are not finalized twice, once in continuing from the snapshot, and once on loading/restarting the snapshot.
objects don’t miss finalization because the finalization state was not saved properly in the image (eg on save and quit)
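
The ordering proposed here (collect, let finalization finish, only then snapshot) can be sketched as follows. Python's `gc` module stands in for the Squeak garbage collector, which is only a loose analogy; `snapshot_after_finalization` and `take_snapshot` are hypothetical names, and the round limit is an arbitrary assumption.

```python
import gc


def snapshot_after_finalization(take_snapshot, max_rounds=10):
    """Collect until a pass finds no garbage, then call take_snapshot().

    Finalizers run during collection and may themselves release more
    objects, so one pass is not always enough.
    """
    for _ in range(max_rounds):
        # gc.collect() returns the number of unreachable objects it found.
        if gc.collect() == 0:
            return take_snapshot()
    raise RuntimeError("finalization did not settle; snapshot not taken")
```

The point of the loop is that the snapshot is taken only from a state in which no finalization is still pending, so nothing is left to be finalized a second time when the saved file is later loaded.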

OpenSmalltalk-Bot commented 3 months ago

On 2024-04-02, at 2:22 PM, Eliot Miranda via Vm-dev @.***> wrote:

Instead, while the primitive should still GC to keep new space empty, the image should run a full GC before invoking the primitive, and not invoke the primitive until finalization is complete.

Sounds smart to me.

So it sounds to me like we should

do the "is there file space?" check and handle any problems arising
if ok, GC and deal with finalisations, file flushing, other #shutdown methods
actually write the image with whatever combination of temporary file & copy etc seems appropriate. With a note that it might be platform variant as to what the best approach is.[1][2]

[1] Apparently linux has 'fallocate' which sounds vaguely dirty, but purportedly exists just for this requirement. I was a bit nervous about googling 'man fallocate' but it did point me to https://man7.org/linux/man-pages/man2/fallocate.2.html

[2] naive Windows related googling gave me https://devblogs.microsoft.com/oldnewthing/20160714-00/?p=93875

tim

tim Rowledge; @.***; http://www.rowledge.org/tim Useful random insult:- Suffers from permanent rapture of the deep. (Nitrogen narcosis.)

marceltaeumel commented 3 months ago

I agree with all suggestions. Let's see whether .image file locking might be an issue on some platforms. I suppose that only the .changes file is locked for a running image.

For limited disk space we might want to engineer a (risky) fallback later, including a way to estimate the required disk space with or without that extra backup. Since we are 64-bit these days, and will eventually have an incremental GC, multi-gigabyte images could become more likely, and working from a small USB stick would then be annoying.

Thank you all for your thoughts!

OpenSmalltalk / opensmalltalk-vm

Let #snapshotPrimitive (97) copy/backup/rename .image file first #678
