Servers corrupt files again

RealDolos commented 9 years ago

I've seen quite a few uploads recently being corrupted, from many different uploaders. File sizes vary. Since TLS protects against regular in-transit data corruption, it seems more than likely that the upload endpoint corrupts the data or, even worse, that there is a hardware failure.

@laino pls fix

Pls also make available some kind of checksum for uploads via some API. You're md5'ing the files anyway for the blacklist feature. Providing that hash would suffice.
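For reference, the client side of such a check could look like the sketch below. It only assumes that the server would hand out a hex MD5 string somehow; no such endpoint exists at this point, so `served_md5` and the helper names are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fp:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, served_md5):
    """Compare the local file against a checksum string from the server.

    `served_md5` is a hypothetical value; how it would be fetched is
    not defined anywhere yet.
    """
    return file_md5(path) == served_md5.strip().lower()
```

With something like this, an uploader could compare the source file against the served hash without downloading the upload again.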

RealDolos commented 9 years ago

To quote the movies room:

Adonis:so many corrupted files. i want them all

Robertcop commented 9 years ago

i too have seen MY OWN uploads ruined, with CRC and MD5 mismatches. i will upload via volaupload and then download a few. none of them match the md5, whether it's a video file or an archived file! come on lain dont you see? something is really wrong here. please do something man!

laino commented 9 years ago

I suspect the upload resuming being broken actually. How does volaupload handle it?

RealDolos commented 9 years ago

@laino It is kinda broken in the browser FWIK, i.e. there are lots of people complaining about resets. The resumption stuff of volaupload.py mostly works, for me at least, but sometimes the file gets corrupted. I've seen a bunch of corrupted files uploaded by people who are clearly not using volaupload.py, and some are even small enough (only a couple of MB) that I'm not even entirely sure those were resume-uploaded at all. The volaupload.py code is actually volapi code and is here: https://github.com/RealDolos/volapi/blob/master/volapi/volapi.py#L673-L710

RealDolos commented 9 years ago

@laino, post upload resume code xD

inb4 open source is bad.

laino commented 9 years ago

@RealDolos the code handling uploads in Volafile is thousands of lines, but resuming is a well-tested feature.

Does it only happen on specific servers?

RealDolos commented 9 years ago

@laino Well tested as in "it never really works"? There have been countless people complaining that resuming will often just reset the upload and stuff like that, although the corruption issues seem more recent. I've seen the resets a few times myself, where the uploadState endpoint would simply not provide a receivedBytes mid-upload.

I think I've seen it on both servers, but recently it was dl4 most of the time/all of the time. It is actually somewhat hard to tell when it happens since I do not usually download and re-check my own uploads.

If only one server was affected I'd guess hardware failure. But this way, it seems more likely that the volafile upload code is to blame. Since volaupload works most of the time, and since this happens with regular in-browser uploads as well, I think volafile is to blame.

laino commented 9 years ago

> If only one server was affected I'd guess hardware failure. But this way, it seems more likely that the volafile upload code is to blame.

More likely it's timeouts. If the client takes too long to ask for the upload status, the server will have removed the partial file already.
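On the client side, the timeout case boils down to deciding where to continue from. A minimal sketch, assuming the uploadState response is parsed into a dict that may or may not carry `receivedBytes` (the response shape and the function name are assumptions, not the actual Volafile or volapi code):

```python
def resume_offset(upload_state, file_size):
    """Pick the byte offset at which to continue an interrupted upload.

    `upload_state` stands in for the parsed uploadState response.
    If the server already dropped the partial file there is no usable
    `receivedBytes`, so the upload has to start over from zero.
    """
    received = upload_state.get("receivedBytes")
    # Reject missing, non-integer, or out-of-range values.
    if isinstance(received, bool) or not isinstance(received, int):
        return 0
    if not 0 <= received <= file_size:
        return 0
    return received
```

A reset to 0 wastes bandwidth but cannot corrupt anything by itself, which is why the timeouts and the corruption look like separate problems.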

RealDolos commented 9 years ago

I was talking about the corruption issues. The timeout issues, maybe, but not always it seems. But I don't really care about the timeout issues anyway.

RealDolos commented 9 years ago

Adonis in BEEPi just provided corrupted uploads (by some n00b user, so probably in-browser uploads) that show dl3 and dl4 have corrupted files.

RealDolos commented 9 years ago

Here is a pic of files uploaded to volafile (by volanoob in TV), then downloaded again and checked against some torrent checksums, courtesy of Adonis. Massive corruption is obvious, even allowing for the torrent program marking whole blocks as mismatched when only a few bits or bytes in that block were corrupted.

removed

RealDolos commented 9 years ago

I uploaded something, which got corrupted, so I diffed the files. I looked at where the file first got corrupted and then looked at the data at that location: The data at offset 643383129 in the original file is present in the corrupted file at 643383129, as it should be, but is repeated again from 643514201 onwards, as it should not be. If you diff that you get 643514201 - 643383129 = 131072, or (1<<17), or 128KiB, which does not look much like a coincidence.

So it seems that when resuming, the server sometimes seeks 128KiB past the offset where the upload is supposed to continue. This does not always happen, tho. Could be some kind of race? Or stale data written too late?

I checked with one of Oneroi's corrupt files in TV and same thing: Data is repeated after 131072 aka 128KiB.
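A rough sketch of that check, for anyone who wants to repeat it on their own files. It assumes the first mismatch is the spot where the repeated data begins; the function names are made up for illustration.

```python
BLOCK = 1 << 17  # 131072 bytes = 128KiB


def first_mismatch(path_a, path_b, chunk_size=1 << 20):
    """Return the first byte offset where two files differ, or None."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if a != b:
                # Narrow down to the exact byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # Same prefix, so one file is simply shorter.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended without a difference
            offset += len(a)


def read_at(path, offset, length):
    """Read `length` bytes from `path` starting at `offset`."""
    with open(path, "rb") as fp:
        fp.seek(offset)
        return fp.read(length)


def is_stale_repeat(original, corrupted, mismatch, shift=BLOCK, probe=4096):
    """True if the corrupted data at `mismatch` is the original data
    from `shift` bytes earlier, i.e. a block that was written twice."""
    if mismatch < shift:
        return False
    sample = read_at(original, mismatch - shift, probe)
    return bool(sample) and read_at(corrupted, mismatch, probe) == sample
```

If `is_stale_repeat` is true at the first mismatch, the file shows the same 128KiB repeat described above.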

RealDolos commented 9 years ago

To give a shitty and confusing example, let's assume this is a file of multiple virtual 128KiB blocks. It would normally look like this:

ABCDEFGHIJK

When corrupted it will look something like this:

ABCDDEFGIJK

It still has the same size, but D is there twice, moving DEFG 128KiB out of place, eliminating H somehow.

I would think something like this happens:

connection: ABC (or ABCD) and connection reset after 120s
connection: DDEFGH (or DEFGH) and connection reset after 120s
connection: IJK, where I is at the correct offset again, overwriting the previous H
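The guess above can be replayed as a toy model, with one letter standing for one 128KiB block. This only mirrors the hypothesis (block D being written a second time, one block late); it says nothing about what the actual server code does, and `simulate` is a made-up helper.

```python
def simulate(connections):
    """Apply (write_offset, data) pairs to an empty file, in order.

    Each letter stands for one 128KiB block; offsets count blocks.
    """
    stored = []
    for write_offset, data in connections:
        # Grow the file with placeholders if the write ends past EOF
        # (a negative count yields an empty string, so nothing is added).
        stored.extend("?" * (write_offset + len(data) - len(stored)))
        # Overwrite in place, like a seek followed by a write.
        stored[write_offset:write_offset + len(data)] = data
    return "".join(stored)


# Healthy upload: every connection continues exactly where the last ended.
healthy = simulate([(0, "ABCD"), (4, "EFGH"), (8, "IJK")])

# Suspected failure: the second connection carries D again, landing one
# block late; the third connection is at the right offset and clobbers H.
corrupted = simulate([(0, "ABCD"), (4, "DEFGH"), (8, "IJK")])

print(healthy)    # ABCDEFGHIJK
print(corrupted)  # ABCDDEFGIJK
```

Both results have the same length, which fits the observation that corrupted files keep their original size.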

laino commented 9 years ago

Thank you. It'll give me something to have a look at.

RealDolos commented 9 years ago

> made some small changes, pls tell whether it fixed file corruptions

nice, @laino . Still, can I haz md5 for an upload from some API so I can easily check it against source md5 and by that detect corruptions without actually downloading files? Gib md5 (or whatever checksum) pls

laino commented 9 years ago

I'll probably expose some checksums via an API endpoint soon

RealDolos commented 8 years ago

@laino new corrupt file(s) in PCT3eD at least

Robertcop commented 8 years ago

@laino files are once again getting corrupted lain. your patch job or whatever wasn't enough. please help us obeonelaino. you're our only hope.

Robertcop commented 8 years ago

both in [removed] and [removed] not sure if it's server wide.

laino commented 8 years ago

I made some changes, let's see if it's still happening

MyPrivateFilesGoHere commented 8 years ago

files are still corrupted

Robertcop commented 8 years ago

lain help us :-1:

laino commented 8 years ago

fix'd

Robertcop commented 8 years ago

this is not fixed. time to reopen

Robertcop commented 8 years ago

[removed] has hash f12e50eebfadca6492382738de7c0e55
[removed] has hash f24df1a733a8c5fa6b228f304a7ff940

more proof later maybe!:-1::-1::-1::-1::-1::-1:

laino commented 7 years ago

Should be resolved with new download code. Also checksums can now be retrieved.

Volafile / volafile-bugs

Servers corrupt files again #129