Supervisor / supervisor

Supervisor process control system for Unix (supervisord)
http://supervisord.org
Other
8.38k stars 1.24k forks source link

Eventlistener protocol "len" header changed in Supervisor 4 #1243

Open mnaberez opened 5 years ago

mnaberez commented 5 years ago

@plockaby wrote comments of issue #1235:

Migrating from version 3 to version 4, my event listeners read from supervisor over stdin using a byte stream and then did the conversion to utf8 and chose to "backslashreplace" undecodable utf8 data. When I went to version 4 I expected to continue to get a byte stream that I could convert myself to whatever encoding I knew the data to be in. It was really unexpected to discover that supervisor was doing the conversion for me and then telling me the number of characters to read off of stdin and not the number of bytes.

cc: @vsajip

mnaberez commented 5 years ago

Description eventlistener header tokens:

len | An integer indicating the number of bytes in the event payload, aka the PAYLOAD_LENGTH

From above, it sounds like the meaning of the len header unintentionally changed from bytes to characters. This is something we should probably investigate. There are a lot of existing eventlisteners.

plockaby commented 5 years ago

Thanks for making this. I put my issue over there because it looked related to doing byte->string conversion in an unexpected place to me but you're right, I think, that the recipient of the converted data is different.

vsajip commented 5 years ago

According to the docs,

In the below list, we indicate that some event types have a “body” which is a a token set. A token set consists of a set of charaters with space-separated tokens. Each token represents a key-value pair. The key and value are separated by a colon.

It's not clear to me why you would be seeing arbitrary (not utf-8 decodable) binary data in the event listener. Which event types are being listening for where the problem is seen?

plockaby commented 5 years ago

Also according to the docs:

len | An integer indicating the number of bytes in the event payload, aka the PAYLOAD_LENGTH

And:

Directly following the linefeed character in the header is the event payload. It consists of PAYLOAD_LENGTH bytes representing a serialization of the event data. See Event Types for the specific event data serialization definitions.

And:

Once it has processed the header, the event listener implementation should read PAYLOAD_LENGTH bytes from its stdin, perform an arbitrary action based on the values in the header and the data parsed out of the serialization.

When Supervisor3 was running under Python2 the payload length did indeed represent the number of bytes to read. In Supervisor4 the payload length now represents characters. My event listener was listening for bytes under Supervisor3 and when I upgraded to Supervisor4 my event listener stopped working because the stream now came as characters and not bytes.

plockaby commented 5 years ago

Sorry I misspoke in the last sentence. The stream might have always come as characters and not bytes (not sure tbh) but in Supervisor3 under Python2 that didn't matter because Python didn't really distinguish between bytes and characters very effectively and the payload length represented bytes to read. In Supervisor4 under Python3 the payload length definitely changed to represent characters. So the docs are wrong in the sense that it says bytes when it is actually characters.

I'd actually argue that it'd be better if the supervisor event listener protocol changes to NOT do utf8 conversions and just gives event listeners raw data so that I can decide what the encoding is in my event listener rather than get messages that literally say "indecipherable".

plockaby commented 5 years ago

And since you asked but I have yet to answer this is happening on log events. Sorry I didn't say that sooner and for replying to this thread so many times.

vsajip commented 5 years ago

Well, the bytes vs. chars thing in the documentation could be addressed just by changing the documentation to say "characters" rather than "bytes". I'm not sure it was really intended to mean bytes, specifically, since in Python 2.x the distinction was somewhat loose. Note that most of the output of events (input to event listeners) is actually logically text rather than binary.

For programs interacting with Supervisor, it would seem that their stdout and stderr should logically be text streams - are you saying they are not? TBF, the output in the "indecipherable" case is Undecodable: followed by the binary data in repr format, which certainly allows it to be recovered.

I'll leave it to Mike to decide how this should be handled, from a policy point of view. I'm not sure what's best at this point, but I would want to see how many people are running into problems with arbitrary binary data in log streams (to understand the use cases better) before deciding what the best fix is.

plockaby commented 5 years ago

If the documentation is changed you should also change the upgrade guide for 3->4 to indicate a breaking change. As I'm sure you know, sys.stdin and sys.stdout in Python3 are byte streams and not text streams. I had to change my code from this:

stdin = sys.stdin.buffer

to this:

stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8", newline='\n')

in order to make .read(PAYLOAD_LENGTH) return the correct data since without that change .read would read bytes and not characters and when my log lines contained UTF-8 data the number of bytes did not match the number of characters.

I guess what I'm saying is that this policy is backwards incompatible and should either change to be backwards compatible or the docs should be updated to say "hey we made this breaking change in the migration from 3 to 4." But what I'd strongly prefer is that the documentation and functionality from version 3 stay the same. I say that because (1) it's what the docs say, (2) it's probably how a lotof event listeners were implemented, and (3) it's a better interface to let the receiver of the event listener decide what encoding their log lines are in rather than deciding for them.

I appreciate you taking the time to understand the issue that I'm facing.

vsajip commented 5 years ago

As I'm sure you know, sys.stdin and sys.stdout in Python3 are byte streams and not text streams.

Which version of Python 3 are you running?

$ python3
Python 3.7.3+ (heads/3.7:0ff08b061b, May 11 2019, 09:27:05) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> sys.stderr
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>

$ python3.6
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> sys.stderr
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>

$ python3.5
Python 3.5.7+ (heads/3.5:2bb327816d, May 11 2019, 08:47:36) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> sys.stderr
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>

$ python3.4
Python 3.4.10 (default, May 11 2019, 08:42:05) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
>>> sys.stderr
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
plockaby commented 5 years ago

Ah, you're right, sys.stdin is a character stream. But in my example I use sys.stdin.buffer which is a byte stream of type io.BufferedReader. The reason I use the byte stream wrapped in the character stream is because, by default, TextIOWrapper implements universal newlines which means that when my log lines have "\r\n" in them (which happens) the "\r\n" gets turned into "\n" and then my calls to .read(PAYLOAD_SIZE) wait for an extra character that will never come.

vsajip commented 5 years ago

means that when my log lines have "\r\n" in them (which happens)

How come, if you're not on Windows? This is a different problem to what you described earlier (which related to arbitrary encodings, not utf-8).

plockaby commented 5 years ago

How come, if you're not on Windows?

My programs run on a Unix platform but sometimes they log garbage that was given to them which can include carriage returns. You know, like a webapp that says in its logs "unauthorized access from foo\r\nbar because it's an invalid username". Sometimes programs log things over which the author has no control. Or maybe I have a third party program running under Supervisor that just does its own thing and I can't change that.

This is a different problem to what you described earlier (which related to arbitrary encodings, not utf-8).

The universal newline thing is a separate problem that is not a problem with Supervisor which is why I didn't mention it before. But since you suggested using sys.stdin the universal newline problem is why I can't use sys.stdin.

To restate the actual problem, the problem here is that Supervisor gives characters instead of bytes when it tells you the length of an event's payload. The reason that this is a problem is because the documentation says that it should return bytes. You said, well sys.stdin is a character stream so just use that -- see? no breaking change? And I said but sys.stdin has this universal newline problem so one must use sys.stdin.buffer which is a byte stream. Because of the universal newlines problem, I and likely anyone else who has implemented an event listener in Python3 will be working with a byte stream unless they've decided that they'll never see a carriage return in their log files.

vsajip commented 5 years ago

You said, well sys.stdin is a character stream so just use that -- see?

Where did I say that? I was just looking for more information to diagnose the problem, and merely pointed out that sys.stdin etc. are character streams in response to your comment that they are byte streams. Before you mentioned it, the issue of newlines was for me completely out of the picture for this issue.

The reason that this is a problem is because the documentation says that it should return bytes.

It used to be bytes under Python 2.x, but as we know, Python 2.x plays a bit fast and loose with bytes and text and fudges things here and there. It's not yet clear to me that it needs to be or should be bytes under Python 3.x because, by default, the process streams are character streams under Python 3.x. (But, that's not my call - I'm not an official Supervisor maintainer, merely an interested party).

As I said earlier, the documentation discrepancy could be fixed by updating the documentation, but it doesn't seem like that would solve your problem.

I and likely anyone else who has implemented an event listener in Python3 will be working with a byte stream unless they've decided that they'll never see a carriage return in their log files.

Maybe so. But that would likely be a much smaller number of situations than a more general problem of randomly-encoded binary data appearing in logs.

By the way, I noticed that you are using the buffer attribute of the sys.stdin text stream. According to the documentation for it,

This [the buffer attribute] is not part of the TextIOBase API and may not exist in some implementations.

Just thought I'd mention it.

plockaby commented 5 years ago

Geez I did not read the full documentation on .buffer apparently. Holy cow Python, between universal newlines and giving character streams for stdin/stdout how do they expect anyone to work with streams in any coherent fashion and make any decisions on their own about encodings or, for that matter, work with binary data? I'm now mostly rescinding my call to change Supervisor to use byte streams. Just as long as the documentation is fixed and the upgrade guide noted that this is a potentially breaking change I am happy.

My problem is actually moot at this point. I have a solution that works, though I don't think it's pretty and it definitely caught me off guard and I don't think I'm doing anything particularly fancy or special. I figured, based on the docs, that other people would likely have this same problem and that Supervisor shouldn't be making encoding decisions. It looks, however, like Python is making a lot of these decisions and Supervisor is trying to living inside of them. So, yeah, probably not as much to do here as I thought.

Ugh. Python. Thanks for your patience with me.

vsajip commented 5 years ago

Geez I did not read the full documentation on .buffer apparently.

According to this documentation on sys.stdin, it actually suggests using buffer, so you're not wrong to use that - it's just a caveat about portability on other platforms (which is probably less of an issue in this case).

Just as long as the documentation is fixed and the upgrade guide noted that this is a potentially breaking change I am happy.

Seems reasonable to me.

I have a solution that works, though I don't think it's pretty

Care to share it, anyway? Or is it just the

stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8", newline='\n')

you mentioned earlier?

vsajip commented 5 years ago

By the way, I had a play with a couple of short scripts and a test file with CRLFs, all saved in ~/tmp:

A file with CRLF
newlines and some Unicode (like Øresund, Malmö and Eyjafjallajökull)
just for testing with.

(should be saved as crlf.txt with CRLF line endings)

#!/usr/bin/env python3
import os
import sys

fn = os.path.expanduser('~/tmp/crlf.txt')
with open(fn, 'rb') as f:
    data = f.read()
byte_len = len(data)
data = data.decode('utf-8')
print('len:%d byte_len:%d\n%s\nfoobarbaz' % (len(data), byte_len, data))

(should be saved as writer.py and execute-enabled)

#!/usr/bin/env python3
import os
import sys

header = sys.stdin.readline()
print('Header: %s' % header.rstrip())
fields = dict(s.split(':') for s in header.split())
paylen = int(fields['len'])
print('Reading %d chars:' % paylen)
payload = sys.stdin.read(paylen)
print(repr(payload))

(should be saved as reader.py and execute-enabled)

Then, if I run:

~/tmp $ ./writer.py | ./reader.py
Header: len:112 byte_len:115
Reading 112 chars:
'A file with CRLF\r\nnewlines and some Unicode (like Øresund, Malmö and Eyjafjallajökull)\r\njust for testing with.\r\n'

Notice that I'm using sys.stdin, and using the len value provided in the header to read the number of chars specified. This seems to be working as expected (no loss of '\r' characters, and the whole payload is read), so can you provide a similar setup which shows up your problem? My python3 is Python 3.7.3.

mnaberez commented 5 years ago

@plockaby @vsajip Thank you both for your investigations into this issue and great discussion points made here.

Event listeners running under Supervisor may be written in Python 2 or 3, but they may also be written in other languages. There are open source Supervisor event listeners written in PHP, Ruby, Perl, JavaScript, Golang, and possibly others too. Not all of the existing event listeners use the PROCESS_LOG events, but ideally a user would be able to easily consume PROCESS_LOG events in the language of their choice.

A given language may or may not have good Unicode support, and maintaining compatibility with existing event listeners should be a goal. I would like your thoughts on the idea of putting both the bytes-length and the characters-length in the header.

The event header is versioned:

ver:3.0 server:supervisor serial:21 pool:listener poolserial:10 eventname:PROCESS_LOG_STDOUT len:54

This is the version of the events API, not the version of Supervisor. In Supervisor 4, ver:3.0 is still emitted.

Existing event listeners, assuming they are well-written, should be able to accept extra key-value pairs in the event header. I am thinking about the following:

The rationale for the above suggestion is:

Please let me know what you think. If we agree this solution (or similar) would work, I would merge a pull request for it. I think it could be released in a 4.0.x point release, with the changelog saying it is a bug fix for compatibility with existing event listeners.

vsajip commented 5 years ago

This seems like a reasonable approach, but if existing event listeners running under Python 3 use sys.stdin rather than sys.stdin.buffer, ISTM this approach won't work in general if the len field has the byte length (this will typically be larger than the char len if there's Unicode present, and so could lead to waiting for bytes that never arrive or reading additional bytes). IMO another way of looking at it is - sys.stdin goes with len, whereas sys.stdin.buffer goes with a byte length.

I'd wait for @plockaby to show us an example config/scripts with a failing scenario (as my simple example works fine with Unicode in the data as well as a mix of \n and \r\n, using the len: value and sys.stdin). Also, what do the third-party event listeners you (@mnaberez) maintain do? Do they use sys.stdin or sys.stdin.buffer? Do we know of other published third-party event listeners out there? The mailing lists don't seem active - is there a Google group or other forum where we can ask for feedback?

I'd contend that Supervisor isn't meant for processing arbitrary binary output from child processes via event listeners - stuff written to stdout and stderr by a Supervisor child should be logically text, not binary. I agree that one can't in general expect perhaps-third-party child programs to do the right thing, but current behaviour is to deliver 'Undecodable: ' + a repr of the undecodable binary, which allows full recovery of the binary data by the event listener at the cost of a bit of extra work. If we wanted a more generic solution, we could output 'Undecodable: ' + a base-64 encoded value rather than a Python repr.

If you accept the should-be-text-only logic I've propounded, then len: would be fine as is - a char length to be used in conjunction with sys.stdin. I would wait for more information about potentially impacted Supervisor installations before making a change in this area. Perhaps the title of this issue should be changed to Eventlistener "len" header mismatch in the presence of binary data or similar so that others who run into an issue can spot this and comment on the issue here.

plockaby commented 5 years ago

I'll have an example soon. I'll be out of town the next few days, unfortunately. Sorry for the delay.

vsajip commented 5 years ago

There's no rush.

mnaberez commented 5 years ago

This seems like a reasonable approach, but if existing event listeners running under Python 3 use sys.stdin rather than sys.stdin.buffer, ISTM this approach won't work in general if the len field has the byte length (this will typically be larger than the char len if there's Unicode present, and so could lead to waiting for bytes that never arrive or reading additional bytes). IMO another way of looking at it is - sys.stdin goes with len, whereas sys.stdin.buffer goes with a byte length.

Also, what do the third-party event listeners you (@mnaberez) maintain do? Do they use sys.stdin or sys.stdin.buffer?

I do maintain a couple of event listeners that use this feature but they are not public and not written in Python. They forward stdout messages to a logging service.

Do we know of other published third-party event listeners out there?

We have a list of third-party plugins. Not all are event listeners and not all of the event listeners will use the logs event. However, if an event listener subscribes to all events (events= in the config file) then this issue could cause any listener to break. The listener can't get out of sync with the payload size. The protocol requires the listener to acknowledge every event.

The mailing lists don't seem active - is there a Google group or other forum where we can ask for feedback?

Unfortunately, the mailing list is in limbo at the moment (issue 1241). It disappeared without notice recently. I've sent emails to its owners but they have not responded yet. There is not another forum yet.

I'd contend that Supervisor isn't meant for processing arbitrary binary output from child processes via event listeners - stuff written to stdout and stderr by a Supervisor child should be logically text, not binary.

A subprocess should be able to emit any binary garbage and this should not cause supervisord, supervisorctl, or event listeners to crash. Beyond that, I agree on this point.

If you accept the should-be-text-only logic I've propounded, then len: would be fine as is - a char length to be used in conjunction with sys.stdin. I would wait for more information about potentially impacted Supervisor installations before making a change in this area. Perhaps the title of this issue should be changed to Eventlistener "len" header mismatch in the presence of binary data

I am still thinking we should probably put len: back to bytes-length and add charlen: or similar for all these reasons, but I am certainly open to further discussion.

vsajip commented 5 years ago

I don't disagree with anything you said - backward compatibility is important but can be hard to achieve elegantly. I was just putting forward the other viewpoint, but either way, some people will trip up.

Encouraging use of childutils could be done by means of code examples in the documentation, perhaps explicitly covering the bytes and text scenarios.

A subprocess should be able to emit any binary garbage and this should not cause supervisord, supervisorctl, or event listeners to crash.

Definitely, but of course badly written third-party event listeners might. Currently, AFAIK, with arbitrary binary data, the Supervisor event listener code won't crash, just return 'Undecodable: ' + the repr of the binary. This is sub-optimal for non-Python event listeners, so we could and perhaps should switch to base64 encoding the binary data.

mnaberez commented 5 years ago

I don't disagree with anything you said - backward compatibility is important but can be hard to achieve elegantly. I was just putting forward the other viewpoint, but either way, some people will trip up.

Yeah, I think that unfortunately no matter what we do, this issue is going to trip up some people. I mainly have the opinion we should put len: back to its original meaning of bytes-length and add another header for characters-length because I think it will have the smallest impact (that's my hope, anyway).

Encouraging use of childutils could be done by means of code examples in the documentation, perhaps explicitly covering the bytes and text scenarios.

I agree we should do this and have explicit examples in Python. We can't help event listeners in other languages but it will at least draw sufficient attention to the issue.

Definitely, but of course badly written third-party event listeners might.

Yeah, I don't think there's much we can do there. Badly-written event listeners can crash for a lot of other reasons, too. I think the usual result is that supervisord will rebuffer the event, restart the event listener, and the event listener will crash again. This will repeat until someone notices.

Currently, AFAIK, with arbitrary binary data, the Supervisor event listener code won't crash, just return 'Undecodable: ' + the repr of the binary. This is sub-optimal for non-Python event listeners, so we could and perhaps should switch to base64 encoding the binary data.

I agree with you that stdout should logically be text. I only have the opinion on len: going back to bytes because of compatibility concerns. However, since we can't control what a given subprocess will output, we have to deal with the possibility of binary junk.

I think that you're right that we should use base64 (or something not Python-specific) for the binary data in the undecodable message. We should probably prefix this message with supervisord: or similar so that it's clear the message is from supervisord, since it's just mixed in the stream with the stuff from stdout.

@plockaby I'd be interested in your thoughts on the proposal as well, since you discovered this and have experience working with event listeners.

@vsajip I am probably not going to be able to work on this for a while. If we all reach agreement and you are able to work on a patch, I'd appreciate that and would be happy to merge it and release it as a point release.

plockaby commented 5 years ago

Sorry that I've been AWOL on this conversation. I'm literally putting together my example right now.

mnaberez commented 5 years ago

Sorry that I've been AWOL on this conversation. I'm literally putting together my example right now.

No problem, it hasn't been very long, and thanks!

plockaby commented 5 years ago

I've made this gist: https://gist.github.com/plockaby/10b75dfa72dc952c5bdb7000328c9b72

It has the code for my event listener and two files that I used to test it. I found that using these work:

stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8", newline="\n") stdin = sys.stdin

But that using either of these does not work:

stdin = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8") stdin = sys.stdin.buffer

I can't find any documentation on whether sys.stdin implements universal newlines by default. PEP278 says that it should but my testing says that it does not. The documentation on stdin is ambiguous in this area, I think, and my google-fu could find no definitive answers and only found people asking whether it does or not.

One thing is certain and that is that my assertion earlier that sys.stdin is a byte stream was woefully wrong and based on a poor understanding of what was happening under code written to use Python2. I haven't tried implementing an event listener for Supervisor running on Python3 using languages other than Python so I'm not sure how this would play anywhere else.

My preference, I suppose, is to see "len" stand for bytes to match the documentation and for "charlen" to indicate the number of characters. Either way someone's event listener will broke with the update to Python3 if they're doing anything outside of ASCII and that's just how it is.

vsajip commented 5 years ago

I've made this gist

Thanks. Any chance you could add the Supervisor conf file that you used for testing?

vsajip commented 5 years ago

Also, just to confirm my understanding from just reading the Gist - sys.stdin works as-is with len: (number of chars, rather than bytes)? So you wouldn't have even seen/raised this issue if you'd just used sys.stdin in the first place? Which is what one is supposed to do - treat sys.stdin as a character stream.

I realise that doesn't mean everything works fine for non-Python event listeners, where the nature of stdin might be different.

plockaby commented 5 years ago

I've added a supervisord.conf file to my gist.

You are correct that using sys.stdin seems to work in my example. I'm cautious about relying on it since the functionality of universal newlines with respect to sys.stdin, as far as I can determine, is not well documented.

I don't recall it working when I was trying to solve my production problem but that could have been because I was not fully grokking that sys.stdin already was a text stream and may have been doing additional things to it that broke it more. I can't say with any certainty what my problem initially was, only what my test code shows. My test code that I've shared with you shows that it works. But again, I'm not comfortable relying on it given the lack of documentation around one feature that I can say with certainty did break for me.

vsajip commented 5 years ago

I've added a supervisord.conf file to my gist.

Thanks - appreciated.

I hear what you say, but my take-away from it is that we should wait to see if any more errors in this area are reported, in order to gather better failure data, before making any changes.

mnaberez commented 5 years ago

I hear what you say, but my take-away from it is that we should wait to see if any more errors in this area are reported, in order to gather better failure data, before making any changes.

I don't think we should wait. The 4.x eventlistener protocol is inconsistent with all prior 3.x versions (which are still in widespread use) and inconsistent with the documentation. This is a bug. I don't think we need more failure reports as we seem to already understand the technical reasons behind the problem.

stestagg commented 5 years ago

Just to weigh in, I've just hit this problem in my project.

I would prefer keeping len: to mean bytes, and ideally it would be great if supervisord didn't change the content of the payload if it isn't valid UTF-8, but this is work-aroundable.

awgymer commented 3 years ago

I'm running Supervisor 4.2.0 and Python 3.8.0 and in the process of writing an event listener came across this issue. As I understand this issue, using childutils from the same version of Supervisor and running under the same python as the supervisor process, there shouldn't be an error?