FNNDSC / pman

A process management system written in python
MIT License

Log retrieval #197

Open jennydaman opened 2 years ago

jennydaman commented 2 years ago

Log retrieval is problematic for two reasons:

Single Stream instead of stdout/stderr

In CUBE, pfcon, and pman, there is no distinction between stdout and stderr. Either the two output streams are joined, or one is disregarded.
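The separation does exist at the Docker level. When a container has no TTY, the Docker Engine API multiplexes stdout and stderr into one byte stream, and each frame starts with an 8-byte header: byte 0 is the stream type (1 = stdout, 2 = stderr) and bytes 4-7 are the payload length as a big-endian uint32. Below is a minimal sketch of demultiplexing that format. It assumes you already have the raw multiplexed bytes, and `iter_frames` and `demux` are hypothetical helpers, not part of CUBE, pfcon, or pman. A simpler option with the Docker SDK is to call `logs()` twice, once with `stderr=False` and once with `stdout=False`, though that loses the interleaving order between the two streams.

```python
import struct
from typing import Dict, Iterator, Tuple

# Stream type byte in each frame header (Docker's non-TTY multiplexed format).
STREAM_NAMES = {0: "stdin", 1: "stdout", 2: "stderr"}
HEADER_SIZE = 8  # 1 type byte + 3 padding bytes + 4-byte big-endian length


def iter_frames(raw: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (stream_name, payload) pairs from a multiplexed log buffer."""
    offset = 0
    while offset < len(raw):
        if offset + HEADER_SIZE > len(raw):
            raise ValueError("truncated frame header")
        # ">BxxxL": unsigned byte, skip 3 pad bytes, unsigned 32-bit big-endian
        stream_type, size = struct.unpack_from(">BxxxL", raw, offset)
        offset += HEADER_SIZE
        payload = raw[offset:offset + size]
        if len(payload) < size:
            raise ValueError("truncated frame payload")
        offset += size
        yield STREAM_NAMES.get(stream_type, "unknown"), payload


def demux(raw: bytes) -> Dict[str, bytes]:
    """Split a multiplexed buffer into separate stdout and stderr bytes."""
    out: Dict[str, bytearray] = {"stdout": bytearray(), "stderr": bytearray()}
    for name, payload in iter_frames(raw):
        if name in out:  # ignore stdin or unknown stream types
            out[name] += payload
    return {name: bytes(buf) for name, buf in out.items()}
```

Keeping the two outputs as separate `bytes` values (rather than one joined string) is what would let CUBE show stdout and stderr independently.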

Buffering full logs instead of streaming

https://github.com/FNNDSC/pman/blob/87b68ae6fcb78532d572d8412ce587d3460ce9a1/pman/abstractmgr.py#L102

Internally, the AbstractManager interface does not support streaming (or chunking/pagination) of large logs. Plugin instances which produce a lot of output will cause pman to hang and time out when attempting to retrieve the logs.
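One way to avoid buffering is for the manager to return an iterator of byte chunks and let the caller forward each chunk as it arrives. This is a sketch under that assumption only: `StreamingLogsManager.iter_job_logs` is a hypothetical method, not the current AbstractManager API, and `InMemoryManager` and `copy_logs` are illustrative stand-ins.

```python
import abc
from typing import Callable, Iterator


class StreamingLogsManager(abc.ABC):
    """Hypothetical interface: logs come back as an iterator of byte chunks."""

    @abc.abstractmethod
    def iter_job_logs(self, job: object, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        """Yield the job's raw logs in chunks of at most chunk_size bytes."""


class InMemoryManager(StreamingLogsManager):
    """Toy implementation over a bytes object, only to exercise the interface."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def iter_job_logs(self, job: object, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
        for start in range(0, len(self._data), chunk_size):
            yield self._data[start:start + chunk_size]


def copy_logs(mgr: StreamingLogsManager, job: object,
              write: Callable[[bytes], object]) -> int:
    """Pass each chunk to write() as it arrives; return the number of bytes copied.

    copy_logs holds only one chunk at a time, so its own memory use does not
    grow with the size of the log.
    """
    total = 0
    for chunk in mgr.iter_job_logs(job):
        write(chunk)
        total += len(chunk)
    return total
```

For example, `copy_logs(mgr, job, f.write)` with a file opened in binary mode. On the HTTP side, Flask can stream a response body when `flask.Response` is given a generator, so chunks like these could be forwarded to the client without assembling the whole log first.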

Also, the AbstractManager interface is changed: get_job_logs returns typing.AnyStr, which more accurately describes the data type of a container's logs. The encoding is handled in pman/services.py instead, an improvement over a previous hotfix: https://github.com/FNNDSC/pman/commit/b41cefb88942aa33bd20302317b367689a9c9eca
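A minimal sketch of that split, assuming managers return either `str` or `bytes` (the two types `typing.AnyStr` is constrained to) and the services layer turns the result into text for the HTTP response. `logs_to_text` is a hypothetical name, not the actual function in pman/services.py, and `errors="replace"` is just one possible policy: invalid bytes become U+FFFD instead of raising.

```python
from typing import Union


def logs_to_text(logs: Union[str, bytes]) -> str:
    """Hypothetical services-layer helper: normalize manager output to str.

    bytes are decoded as UTF-8 in one pass, with invalid sequences replaced
    by U+FFFD instead of raising UnicodeDecodeError; str passes through.
    """
    if isinstance(logs, bytes):
        return logs.decode("utf-8", errors="replace")
    return logs
```

Decoding the complete byte string in one call, rather than chunk by chunk, also avoids splitting a multi-byte character across decode calls.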

openshiftmgr.py is not fixed because I think it should be rewritten from the ground up.

jennydaman commented 2 years ago

Moreover, output is not necessarily UTF-8 string data. Non-UTF-8 data printed to the terminal was not handled and caused this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.8/dist-packages/flask_restful/__init__.py", line 467, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flask_restful/__init__.py", line 582, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/localuser/pman/pman/resources.py", line 164, in get
    job_logs = self.compute_mgr.get_job_logs(job)
  File "/home/localuser/pman/pman/swarmmgr.py", line 59, in get_job_logs
    return ''.join([l.decode() for l in job.logs(stdout=True, stderr=True)])
  File "/home/localuser/pman/pman/swarmmgr.py", line 59, in <listcomp>
    return ''.join([l.decode() for l in job.logs(stdout=True, stderr=True)])
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 0: unexpected end of data

Now fixed: https://github.com/FNNDSC/pman/commit/b41cefb88942aa33bd20302317b367689a9c9eca
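A note on the traceback: 0xe2 is the lead byte of a three-byte UTF-8 sequence (every code point from U+2000 to U+2FFF, which includes many typographic and box-drawing symbols, starts with it), and "unexpected end of data" means the input ended partway through that sequence. Because swarmmgr.py decodes each chunk from `job.logs(...)` separately, this error can appear even when the output as a whole is valid UTF-8, if a character happens to straddle a chunk boundary. An incremental decoder from the standard library `codecs` module handles that case. The sketch below illustrates the idea; `iter_decoded` is a hypothetical helper, not code from pman.

```python
import codecs
from typing import Iterable, Iterator


def iter_decoded(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Decode a stream of byte chunks without splitting multi-byte characters.

    The incremental decoder keeps an incomplete sequence from the end of one
    chunk and completes it with the start of the next; bytes that are truly
    invalid become U+FFFD because of errors="replace".
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush: a partial sequence left at the very end is replaced, not dropped.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Something like `''.join(iter_decoded(job.logs(stdout=True, stderr=True)))` could stand in for the list comprehension on line 59 of swarmmgr.py, although that still buffers the whole result; yielding the decoded pieces instead would fit the streaming approach described above.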