KnpLabs / Gaufrette

PHP library that provides a filesystem abstraction layer − will be a feast for your files!
http://knplabs.github.io/Gaufrette
MIT License

Amazon S3: 0-byte files #232

Open teohhanhui opened 10 years ago

teohhanhui commented 10 years ago

InMemoryBuffer::open writes 0-byte content to S3 when the stream mode is "w" or "a". This sometimes causes problems: the content of the subsequent write can be overwritten by this 0-byte content, because write order is not guaranteed (see http://stackoverflow.com/questions/3184886/does-amazon-s3-guarantee-write-ordering). We've encountered this problem on an Amazon EC2 instance, where bandwidth to S3 is high and latency is low.
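To make the failure mode concrete, here is a minimal sketch of the request sequence described above. It is a simplified model, not Gaufrette's actual code: RecordingAdapter and EagerTruncatingStream are hypothetical names, and the adapter only records write requests instead of talking to S3.

```php
<?php
// Hypothetical stand-in for an S3-backed adapter: it records each write
// request instead of sending it to Amazon S3.
final class RecordingAdapter
{
    /** @var string[] content of every write request, in the order issued */
    public $requests = [];

    public function write(string $key, string $content): int
    {
        $this->requests[] = $content;
        return strlen($content);
    }
}

// Simplified model of the reported behaviour (an assumption, not the real
// InMemoryBuffer): open() truncates by writing '' to the backend.
final class EagerTruncatingStream
{
    private $adapter;
    private $key;
    private $content = '';

    public function __construct(RecordingAdapter $adapter, string $key)
    {
        $this->adapter = $adapter;
        $this->key = $key;
    }

    public function open(): void
    {
        $this->content = '';
        $this->adapter->write($this->key, ''); // request 1: 0 bytes
    }

    public function write(string $data): int
    {
        $this->content .= $data; // buffered locally until close()
        return strlen($data);
    }

    public function close(): void
    {
        $this->adapter->write($this->key, $this->content); // request 2: real data
    }
}

$adapter = new RecordingAdapter();
$stream = new EagerTruncatingStream($adapter, 'test.txt');
$stream->open();
$stream->write('test');
$stream->close();

// Two write requests were issued for the same key. If the store applied them
// in the opposite order, the last request to land would be the empty string.
$reordered = array_reverse($adapter->requests);
$finalIfReordered = end($reordered);
```

In this model the data survives only if request 2 is applied after request 1, which is exactly the ordering the issue says cannot be relied on.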

The AWS SDK v2 for PHP seems to have a good working solution in its S3 stream wrapper: http://docs.aws.amazon.com/aws-sdk-php/latest/class-Aws.S3.StreamWrapper.html (source: https://github.com/aws/aws-sdk-php/blob/master/src/Aws/S3/StreamWrapper.php)

Now the question is: how do we integrate this nicely with Gaufrette? If successful, it would provide many other nice features, such as the ability to check a file's size without loading the whole file (which is currently necessary when using Gaufrette).

teohhanhui commented 10 years ago

@l3l0 Do you think it's proper to perform the initial truncate in the buffer ($content) only? That's what I'd expect of an "InMemoryBuffer" anyway...
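A rough sketch of what "truncate in the buffer only" could look like, assuming the buffer tracks whether it is in sync with the backend and flushes once on close. ArrayBackend and LazyTruncatingBuffer are hypothetical names used for illustration; this is not a patch against Gaufrette.

```php
<?php
// Hypothetical in-memory backend standing in for an S3 adapter.
final class ArrayBackend
{
    /** @var array<string, string> stored content by key */
    public $files = [];
    /** @var int number of write requests received */
    public $writeCount = 0;

    public function write(string $key, string $content): void
    {
        $this->files[$key] = $content;
        $this->writeCount++;
    }
}

// Sketch of a buffer whose open() truncates only its local content.
final class LazyTruncatingBuffer
{
    private $backend;
    private $key;
    private $content = '';
    private $synchronized = true;

    public function __construct(ArrayBackend $backend, string $key)
    {
        $this->backend = $backend;
        $this->key = $key;
    }

    public function open(): void
    {
        $this->content = '';         // truncate in memory only
        $this->synchronized = false; // backend still holds the old content
    }

    public function write(string $data): int
    {
        $this->content .= $data;
        $this->synchronized = false;
        return strlen($data);
    }

    public function close(): void
    {
        // Single write request; it also covers "opened but never written",
        // in which case the backend receives the empty string once.
        if (!$this->synchronized) {
            $this->backend->write($this->key, $this->content);
            $this->synchronized = true;
        }
    }
}

$backend = new ArrayBackend();
$backend->files['test.txt'] = 'old content'; // pre-existing object

$buffer = new LazyTruncatingBuffer($backend, 'test.txt');
$buffer->open();
$buffer->write('test');
$buffer->close();
```

With this approach only one write request reaches the backend, so there is nothing to reorder. The trade-off is that other readers keep seeing the old content until the stream is closed.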

teohhanhui commented 10 years ago

The other way I can think of is to ignore all 0-byte writes in all of the S3 adapters, which of course introduces its own set of peculiarities...
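For comparison, a minimal sketch of the "ignore 0-byte writes" idea as a decorator, which also shows two of the peculiarities it introduces. KeyValueWriter, ArrayWriter, and SkipEmptyWrites are hypothetical names and do not correspond to Gaufrette's adapter interface.

```php
<?php
// Hypothetical minimal write interface, used only for this sketch.
interface KeyValueWriter
{
    public function write(string $key, string $content): bool;
}

// In-memory stand-in for an S3 adapter.
final class ArrayWriter implements KeyValueWriter
{
    /** @var array<string, string> stored content by key */
    public $files = [];

    public function write(string $key, string $content): bool
    {
        $this->files[$key] = $content;
        return true;
    }
}

// Decorator that silently drops empty writes.
final class SkipEmptyWrites implements KeyValueWriter
{
    private $inner;

    public function __construct(KeyValueWriter $inner)
    {
        $this->inner = $inner;
    }

    public function write(string $key, string $content): bool
    {
        if ($content === '') {
            return true; // report success without touching the backend
        }
        return $this->inner->write($key, $content);
    }
}

$store = new ArrayWriter();
$writer = new SkipEmptyWrites($store);

// The problematic sequence: the 0-byte write never reaches the backend.
$writer->write('test.txt', '');
$writer->write('test.txt', 'test');

// Peculiarity 1: an intentionally empty file is never created.
$writer->write('empty.txt', '');

// Peculiarity 2: truncating an existing file is silently ignored.
$store->files['log.txt'] = 'old';
$writer->write('log.txt', '');
```

The race goes away, but callers can no longer create or truncate to an empty file through this writer, even though both calls report success.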

l3l0 commented 10 years ago

Hmm, I'm not really sure. The first option sounds OK :) Maybe @mtdowling can help us as well; IIRC he added the AWS adapter :)

l3l0 commented 10 years ago

@teohhanhui Can you provide some test code so I can check it against Amazon?

teohhanhui commented 10 years ago

use Gaufrette\StreamMode;

// $filesystem is a Gaufrette\Filesystem backed by an Amazon S3 adapter.
$stream = $filesystem->createStream('test.txt');
$stream->open(new StreamMode('wb+'));
$stream->write('test');
$stream->close();

Sometimes (not on every run, but fairly often), the file's final content on Amazon S3 is 0 bytes: the empty string written when the stream was opened.

l3l0 commented 10 years ago

@teohhanhui Thanks, I will try to reproduce it :)

teohhanhui commented 10 years ago

The "correct" thing to do is to write to the filesystem directly when opening a new file or overwriting an existing one, but we need to somehow prevent this initial 0-byte write from going through, or we risk losing data.

Do you have any suggestions on how to resolve this dilemma?