RsyncProject / rsync

An open source utility that provides fast incremental file transfer. It also has useful features for backup and restore operations among many other use cases.
https://rsync.samba.org

Assertion failures and corrupted size every so often #398

Open Gert-dev opened 2 years ago

Gert-dev commented 2 years ago

Problem

Every so often when I use rsync to sync, I get corruption or assertion errors that abort the transfer. Usually, just rerunning the same command fixes it. It seems to occur at random, roughly once in every four runs.

Steps To Reproduce

I'm not sure the command arguments are related to the problem, but I'll list the one I use for completeness:

  1. Run rsync -a -v --progress -L --update --delete --force /home/user/Documents/ "user@server:/mnt/ExternalHD/Documents" on the client
  2. Roll the dice - sometimes it works, sometimes the "corrupted size" error occurs, sometimes an assertion fails (see below).
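Because the failure only shows up on some runs, one way to put a number on "sometimes" is to rerun the same command and tally the exit codes. A minimal sketch in Python, assuming the command from step 1 can run unattended (for example with key-based SSH login); `RSYNC_CMD`, `run_trials` and `summarize` are hypothetical names, not part of rsync:

```python
import subprocess
from collections import Counter

# Assumption: the command from step 1, run unchanged on the client.
RSYNC_CMD = [
    "rsync", "-a", "-v", "--progress", "-L", "--update", "--delete", "--force",
    "/home/user/Documents/", "user@server:/mnt/ExternalHD/Documents",
]


def run_trials(cmd, trials):
    """Run cmd `trials` times; return (Counter of exit codes, stderr tails of failed runs)."""
    codes = Counter()
    failures = []
    for _ in range(trials):
        # stdout (file list, progress) is discarded; stderr is kept for diagnosis.
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE,
                              text=True, errors="replace")
        codes[proc.returncode] += 1
        if proc.returncode != 0:
            # Keep the last few stderr lines, e.g. the malloc message and io.c errors.
            failures.append(proc.stderr.strip().splitlines()[-5:])
    return codes, failures


def summarize(codes, trials):
    """Format one 'exit <code>: <count>/<trials>' line per distinct exit code."""
    return "\n".join(f"exit {code}: {count}/{trials}"
                     for code, count in sorted(codes.items()))
```

For example, `codes, failures = run_trials(RSYNC_CMD, 20)` followed by `print(summarize(codes, 20))`. rsync reports the "error in rsync protocol data stream" failures in the logs below as exit code 12. Since `--update` leaves later runs with less to transfer, the measured rate may understate how often a full transfer fails.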

Expected Result / Possible Solutions

The transfer always works.

Additional Info

The server is on my local network, which is stable and high-throughput. The server itself is a rather slow device (a Cubox with a Marvell CPU), but it should still be able to manage over 100 Mbit/s. Its system is hosted on a flash card, but the destination path is on an external HDD connected over USB.

I've used rsync often in other configurations and never encountered these problems. I'm not familiar with rsync's internals, but perhaps this has something to do with the client and server having different architectures (ARM and x86-64), and the binary protocol between them going wrong somehow?

Full Output

First

sending incremental file list
corrupted size vs. prev_size
rsync: connection unexpectedly closed (386766 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [generator=3.2.3]
rsync: connection unexpectedly closed (2438 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(231) [sender=3.2.7]

Second

sending incremental file list
rsync: malloc.c:2379: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
rsync: connection unexpectedly closed (272864 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [generator=3.2.3]
rsync: connection unexpectedly closed (1625 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(231) [sender=3.2.7]

WayneD commented 2 years ago

Those are assertions from another program or library, which looks to be buggy. In the first case, both sides of the connection complained about an unexpectedly closed socket, which suggests that either your remote shell failed or the error came from a memory library. In the second case, the memory library is complaining, but again both sides got a failed socket, which means the connection was not closed by one of the rsync processes; something else closed it. In prior cases similar to this, someone had hit a kernel bug in the networking code. There is also a possibility that rsync is corrupting memory somewhere, but I've run rsync under valgrind and not seen any errors. I haven't had the chance to try a valgrind run on an ARM system yet, though.

realsimix commented 2 years ago

Apart from buggy libraries: are you sure the hardware is fine? Corruption in memory or on the Ethernet wire could also cause such errors, AFAIK.

Gert-dev commented 1 year ago

Well... I guess I'm as sure as I can reasonably be 😅. As mentioned, it's a rather slow device, so maybe timeouts are being hit here and there when a packet doesn't arrive as soon as some part of the client code expects. However, I see no errors in the system logs and no strange behaviour in connections from other applications: file transfers and SSH sessions are not randomly closed or hanging, not even when they are active at the same moment the failure happens.

EDIT: I'm of course open to testing things if anyone believes the hardware itself could explain these issues.
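When testing a possible cause (for example, different hardware, as suggested above), a handful of clean runs proves little if the failure only hits roughly one run in four. A small sketch of how many consecutive clean runs it takes before "it's fixed" is more likely than luck, assuming runs fail independently at a fixed rate; `runs_needed` is a hypothetical helper:

```python
def runs_needed(p_fail, alpha=0.05):
    """Smallest n such that n clean runs in a row would occur with probability <= alpha
    if the per-run failure probability were still p_fail (runs assumed independent)."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    n, p_all_clean = 0, 1.0
    # Multiply in one more clean run until the streak becomes unlikely enough.
    while p_all_clean > alpha:
        n += 1
        p_all_clean *= 1.0 - p_fail
    return n
```

With the reported rate of about 1 in 4, `runs_needed(0.25)` returns 11: ten clean runs in a row would still happen by chance about 5.6% of the time (0.75^10 ≈ 0.056), eleven about 4.2%.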