dolphinsmalltalk / Dolphin

Dolphin Smalltalk Core Image
MIT License

Image and package files become corrupted after forced shutdown #1266

Closed JBetz closed 6 months ago

JBetz commented 6 months ago

I don't know how to reproduce this, but a forced shutdown of a Dolphin image seems to have caused multiple catastrophic errors:

And by forced shutdown I mean using a task manager to kill the Dolphin task after it became unresponsive. Or at least that's what I'm guessing happened since I didn't notice the problem until a day after I had last used Dolphin. It's also possible that it was during package saving. I don't know why else certain package files would have become corrupted and not others.

JBetz commented 6 months ago

Losing the image is annoying, but package saving not being durable is catastrophically bad.

What about using a swap file rather than writing to the target package file directly? That seems to be the lowest-hanging fruit to prevent total file loss.
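A minimal sketch of that swap-file idea, written in Python purely for illustration (it is not how Dolphin saves packages, and `save_atomically` is a hypothetical name): write the new contents to a temporary file in the same folder, flush it, and only then replace the target, so a kill mid-write leaves the old file intact.

```python
import os
import tempfile


def save_atomically(target_path: str, data: bytes) -> None:
    """Write data to a temporary file, then swap it into place (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temporary file next to the target so the final rename
    # stays on one volume.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
            temp_file.flush()
            os.fsync(temp_file.fileno())  # ask the OS to push the bytes to disk
        os.replace(temp_path, target_path)  # swap the finished file into place
    except BaseException:
        # Leave the original target untouched and clean up the partial file.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

If the process dies before the final replace, the worst case under these assumptions is a stray `.tmp` file beside an unchanged package file.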

blairmcg commented 6 months ago

Hmmm, I think it unlikely this would have resulted from Dolphin activity, or even from killing it. It sounds most likely to be disk corruption due to e.g. power loss, since you apparently have NULs/garbage in multiple files. If you'd killed it after starting to file out a package then that single package might be corrupted, but multiple packages seems unlikely.

Dolphin doesn't hold package files open once loaded. Same for the image file. They are opened to write for explicit saves, then immediately closed, i.e. no background writes ever occur to packages or the image file. The change log is like a transaction log, so it is held open, but writes are always appends. You might see junk at the end of the change log if the OS hadn't flushed the writes, but that would generally only lose a short duration of activity. I can't recall that ever happening to me, and I regularly recover work from the change log after doing something to kill the image or put it in a state where I don't want to save it.

I make regular backups of the image and change log by just copying them into a suitably timestamped backup folder. I generally don't file out source (always in pax format) until I am ready to commit the changes into git, and I then usually push the changes to GitHub to avoid any local loss. Image/change log backups and git/GitHub mean I don't lose work when I regularly mess up my image. I either just copy one of my most recent backups and then recover changes from the change log, or I reboot the image from my git workspace if it is up to date.

See also https://github.com/dolphinsmalltalk/Dolphin/blob/release/7.1/Core/Contributions/IDB/IDB%20IDE%20Package%20Backup.pax. It requires a small fix to the Package>>#savePACBackupTo: method it adds to avoid attempting to generate a backup filename with the printString of a Fraction, but other than that seems to do the job of always making a backup copy of a .pac file before overwriting it.
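The general backup-before-overwrite pattern can be sketched as follows. This is a Python illustration only, not the code in the IDB package; `backup_before_overwrite` and the `.bak` naming scheme are assumptions made for the example.

```python
import os
import shutil
from datetime import datetime
from typing import Optional


def backup_before_overwrite(path: str, now: Optional[datetime] = None) -> Optional[str]:
    """Copy an existing file to a timestamped sibling; return the backup path."""
    if not os.path.exists(path):
        return None  # nothing to protect on the first save
    # Build the name from whole-number date and time fields only.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    root, ext = os.path.splitext(path)
    backup_path = f"{root}.{stamp}{ext}.bak"
    shutil.copy2(path, backup_path)  # copy2 also tries to preserve file metadata
    return backup_path
```

Calling this just before each save would keep the previous version of the file around even if the new write is interrupted.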

JBetz commented 6 months ago

Okay, that makes sense.

Some months ago I had an issue with a package file getting mangled and this looked similar. The image hung while waiting for a semaphore, and terminating the process seemed to interfere with the package file saving process. But the files weren't corrupted in that case. I can provide more details if it happens again. And the package backup extension definitely looks like it will help prevent it.

My backups include:

The first usually prevents me from losing more than a day of work. In this case, the image was in a corrupted state when it saved, so that's why it looked worse than usual. Even still, I was able to recover everything from the change log after removing the garbage data at the end.
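For anyone hitting the same thing, a rough sketch of trimming the junk, assuming the garbage is a run of NUL bytes at the very end of the file (other kinds of garbage would need manual inspection). It is Python for illustration, `strip_trailing_nuls` is a hypothetical helper, and it writes a cleaned copy rather than touching the original change log.

```python
def strip_trailing_nuls(source_path: str, cleaned_path: str) -> int:
    """Copy source to cleaned_path without trailing NUL bytes; return bytes dropped."""
    with open(source_path, "rb") as source:
        data = source.read()
    cleaned = data.rstrip(b"\x00")  # only a trailing run of NULs is removed
    with open(cleaned_path, "wb") as out:
        out.write(cleaned)
    return len(data) - len(cleaned)
```

Working on a copy keeps the damaged original available in case the cut needs to be redone by hand.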