galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

[WIP] Feature: Send stdout and stderr to Galaxy while job is running #345

Open gecage952 opened 8 months ago

gecage952 commented 8 months ago

Hey, this is related to this PR in the main Galaxy repo: https://github.com/galaxyproject/galaxy/pull/16975. The changes let Pulsar send both stdout and stderr to Galaxy while a job is running, so that the output can be displayed inside Galaxy before the job finishes.

The main changes add two parameters to the app.yml config: send_stdout_update, a boolean, and stdout_update_interval, a float. The first controls whether Pulsar sends stdout/stderr updates at all; the second is the interval (in seconds) between updates.

The files are sent through the files endpoint in Galaxy. To avoid sending the entire file each time, I keep a dict that tracks, for each stdout/stderr file, the position that the last update read up to. Each update then sends only the part of the file written since that position.
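To illustrate the position tracking described above, here is a minimal sketch. It is not the code from this PR: the `OutputTailer` class and its `read_new` method are hypothetical names, and the actual upload to Galaxy's files endpoint is left out.

```python
class OutputTailer:
    """Hypothetical helper: remembers how far each output file has been read."""

    def __init__(self):
        # Maps a file path to the byte offset reached by the last update.
        self._positions = {}

    def read_new(self, path):
        """Return only the bytes appended to `path` since the previous call."""
        offset = self._positions.get(path, 0)
        try:
            with open(path, "rb") as fh:
                fh.seek(offset)
                chunk = fh.read()
                # Record where this read stopped so the next call resumes here.
                self._positions[path] = fh.tell()
        except FileNotFoundError:
            # The job may not have produced any output yet.
            return b""
        return chunk
```

Calling `read_new` once per update interval for the stdout and stderr paths yields just the new chunk each time; an empty result means there is nothing to send for that interval.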

After the job finishes, I send any remaining stdout/stderr that has not been sent yet. In the final status message sent over the broker, I set the stdout and stderr fields to None instead of including the output there, so that the whole file is not sent again. In Galaxy, there are a couple of changes in the Pulsar job runner that check whether those fields are None and, if so, load the stdout and stderr from the job directory instead.
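The None check on the Galaxy side could look roughly like the sketch below. This is an assumption-laden illustration, not the actual job runner change: `resolve_output` is a hypothetical function, and the file name and job directory layout are placeholders.

```python
from pathlib import Path


def resolve_output(message_value, job_directory, filename):
    """Hypothetical fallback: prefer the value from the final status message,
    otherwise read the output already written to the job directory."""
    if message_value is not None:
        # Streaming was disabled, so the message still carries the output.
        return message_value
    path = Path(job_directory) / filename  # assumed location of the streamed file
    if not path.exists():
        return ""
    # Replace undecodable bytes rather than failing on binary tool output.
    return path.read_text(errors="replace")
```

With this shape, a message that still carries stdout behaves as before, and only a None field triggers the read from disk.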

As with the other PR, this was done mostly with the intent of not disturbing existing functionality, which is why we didn't want to use messages to send the output. Also as with the other PR, any feedback is welcome.