Alcaro / Flips

Floating IPS is a patcher for IPS and BPS files.
Other
337 stars 46 forks source link

On error, do you always send EOF to the output? #21

Closed i30817 closed 5 years ago

i30817 commented 5 years ago

Remember me, i was trying to use your app on a project? I decided to go with processbuilder and named pipes (to make it possible to patch roms i want to checkout the final checksums without blowing up my memory).

This works fine, but i found out a error case i'm not sure how to fix myself though i'm sure it's possible with select or something non-blocking like that. But why suffer if you don't have to? ;)

I have these 3 functions:

 @contextmanager
 def named_pipes(n=1):
     dirname = tempfile.mkdtemp()
     try:
         paths = [os.path.join(dirname, 'romhack_pipe' + str(i)) for i in range(n)]
         for path in paths:
             os.mkfifo(path)

         yield paths
     finally:
         shutil.rmtree(dirname)

 def get_checksums():
     hash_md5   = hashlib.md5()
     hash_sha1  = hashlib.sha1()
     hash_crc32 = 0
     size = 0

     buf = yield

     while len(buf) > 0:
         size += len(buf)
         hash_md5.update(buf)
         hash_sha1.update(buf)
         hash_crc32 = zlib.crc32(buf, hash_crc32)
         buf = yield

     crc = '{:08x}'.format( hash_crc32 & 0xffffffff )
     md5 = hash_md5.hexdigest()
     sha1 = hash_sha1.hexdigest()

     yield (size, crc, md5, sha1)

 def producer(arguments, generator_function):
     ''' will append a output fifo to the end of the argument list prior to
         applying the generator function to that fifo. Make sure the command
         output is setup for the last argument to be a output file in the arguments
     '''
     BLOCKSIZE = 2 ** 20
     process = None
     next(generator_function)
     with named_pipes() as pipes:
         pipe = pipes[0]
         arguments.append(pipe)
         with subprocess.Popen(arguments,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL) as process:
             with open(pipe, 'rb') as fifo:
                 byt = fifo.read(BLOCKSIZE)
                 while len(byt) > 0:
                     generator_function.send(byt)
                     byt = fifo.read(BLOCKSIZE)
     #if a error occurred avoid writing bogus checksums
     if process.returncode != 0:
         raise NonFatalError('error during patching, try to remove the header if a snes rom')

     return generator_function.send([])

Anyway the idea is the producer uses namedpipes to create a pipe, then it passes the pipe write end to the flips (and xdelta) subprocesses as the last argument of a list of arguments, and the reader end keeps passing the bytes to a continuation (in many cases that middle function there), that produces the checksums from the bytes one step at a time.

Only in a defective patch, it appears that flips never opened the file output for writing so the named pipe open is blocked forever.

The message in 'normal' cmd line operation is Unknown patch format.. Could you do me the favor of always opening and closing (EOF sent) the output file in case of error? I think it's just a question of opening the given file for writing on startup and on a process end, it'll be closed.

These command line interfaces should have standardized handing out open files imo, but that's fantasies for when we refactor the world.

Alcaro commented 5 years ago

Yes, that name looks familiar, though I don't recognize you using Python. I think it was C#?

No, I'm afraid I cannot grant that favor. While it would improve your usecase, it doesn't make any sense for normal files; if the patch is broken, there is no output, not even an empty file.

I could create, then delete on failure, but that's unhelpful if the output existed already (for example if Flips is asked to overwrite its input).

I could create, then delete on failure only if it didn't exist, but hardcoding a list of special cases just guarantees I'll miss a few - and guarantees that some users will have different expectations from me. It's better to be as straightforward as reasonably possible, so users can confidently predict Flips' behavior.

Instead, you'll have to fix this on your end. The easiest solution should be using a normal file instead, or using /proc/self/fd/2 as output filename and reading your file from the child process' stderr. (Don't use stdout, that's where Flips' status messages go.)

If your goal is to not buffer the output file in memory, you've failed already - Flips does that internally, whether you like it or not, since IPS and BPS application absolutely demands the ability to seek around in the output (the UPS format permits streaming, but it's easier this way, I'm lazy). I could mmap a disk file instead of malloc, but I never bothered implementing that (and you can't mmap a pipe, anyways). Feel free to use ulimit if you're worried about Flips wasting too much RAM if given a weird patch.

i30817 commented 5 years ago

It's not only flips but xdelta too, and i said 'memory' i (mostly, though it also helps if the other process is streaming) meant 'disk memory' fatigue. Which is very important when you're reading hundreds of gb to calculate checksum en-masse, the last thing you want is to be writing to disk unnecessarily at the same time.

And the file exists already, that's what a fifo is; a special file created and passed. I was suggesting only 'if it exists, open and close it once before quitting don't even write'. I think that'll work to tell the other end they're done because it'd block on close until the reader came along and close would send '0 bytes' 'written' and the reader read that? I'm unsure how it works but i think 'EOF' is not a real write to the physical file, not that it matters in this case.

Alcaro commented 5 years ago

That too counts as hardcoding a list of special cases. A shorter list than deleting on failure, but still completely unexpected to everyone else.

If you're more worried about disk (or disk bandwidth) use than peak RAM use, my recommendation remains to use /proc/self/fd/2. I can see a few other options, but none with any real advantages over fd/2.

i30817 commented 5 years ago

That too counts as hardcoding a list of special cases. A shorter list than deleting on failure, but still completely unexpected to everyone else.

Who really is the 'someone else?' If it's coders a comment on the exit path 'this open and close if the output file exists if we haven't got it open is in case the user supplied a named fifo if nothing was written on error'.

I'm not asking for this for all errors (though it may be easier to code it like that) but those that aren't the user's fault (misused flags), but patch based (corrupt patch).

If it's users, users won't notice it because it's just a open and close of a existing file on a failure (thus unlikely) case.

It seems very straightforward, and fifo is a portable interface (to windows too, in python), unlike proc.

Alcaro commented 5 years ago

Yes, pipes are good in many contexts, including this. I'm not proposing you use something other than pipes.

What I am proposing is that you use an anonymous pipe instead of a named one. Anonymous pipes can't be in that weird half-open state.

If you don't like the name /proc/self/fd/2, feel free to use /dev/stderr instead. Actually, the latter is the better name; they're the same on Linux, but the latter is more likely to exist on other Unix-likes, and it's easier to remember. I've got a strange habit of memorizing all the wrong stuff...

Even if Flips did open the file the way you want, I believe stderr=subprocess.PIPE and /dev/stderr would be cleaner than your current solution. No need to think of a name, no need to consider what happens if that name's taken, no need to wonder who's gonna clean it up when you're done.

If you think named pipes have some advantage over anonymous, please explain. If you can convince me that named pipes are the better choice in this context, I'll do as you wish.

i30817 commented 5 years ago

I just found out i was mistaken and os.mkfifo is not available in windows in python, so even if you helped out with this it still wouldn't run there, embarrassing.

Anyway, I'm very confused I don't understand how can i replace the output argument without a named pipe, because there is no way i know to send the filehandle to the program option it expects (last position of a 3 tuple of patch rom output). I mean, sure i can read the stdout or err of the process... but i want the bytes it's writing to the 'file', and i don't want to clobber my actual rom while i'm at it. Maybe there is a weird wait to tie in a 'file' to the lifetime of the writer process ( flips in this case ) and still provide it as a argument before it's 'started' but if so, i have no clue what it is, because i thought the anonymous pipes just write the stdout, and i have no way to make it do that transformation from python Popen.

Or are you suggesting there is a flag to write the bytes to stderr or similar?

Alcaro commented 5 years ago

As I said (or tried to say, maybe I was unclear), use /dev/stderr as output filename. Works fine on nearly all programs, including Flips.

Unfortunately, there is no equivalent on Windows. There are a couple of magic filenames (CON, NUL, AUX, CONOUT$, etc), but none of them point to 'whereever stderr is'. Windows is chock full of half-baked orthogonal concepts that would be way more useful if they interacted a bit more...

I believe your best bet on Windows is one of the magic non-disk filesystems starting with \\.\, \\?\ or \??\. Unfortunately, I was unable to locate a comprehensive list of which of those exist; google isn't very cooperative with anything consisting entirely of special characters. Perhaps the best solution is a named pipe, though it wouldn't surprise me if Windows pipes behave subtly or wildly differently from Linux...

i30817 commented 5 years ago

This can be closed then, your solution works better. Frustrating about windows and the very fact there isn't a library in pypi that uses that '/dev/stderr or CON' to abstract this over both windows and linux tells me that it doesn't work, and i don't have windows to test anymore so i'll just remove support for windows.