Barski-lab / cwl-airflow

Python package to extend Airflow functionality with CWL1.1 support
https://barski-lab.github.io/cwl-airflow
Apache License 2.0
183 stars 33 forks source link

Gather stderr from failed step #80

Closed mpolykovskiy closed 2 years ago

mpolykovskiy commented 2 years ago

@michael-kotliar this PR is intended to gather output from the failed stage for debug purposes. Any comments are welcome.

mpolykovskiy commented 2 years ago

Sample cwl echo.cwl.zip

Looks like Снимок экрана от 2022-02-09 13-07-47

Output: output.zip

michael-kotliar commented 2 years ago

@mpolykovskiy Thanks for PR! Looks interesting!

Correct me if I'm wrong, these changes will mainly influence the results being returned from the failed step, but the workflow will still be marked as failed if at least one step had failed, right?

Michael

portah commented 2 years ago

I think you are right. However, we also want to use the output to monitor the progress of a given long-running step. If you can propose an alternative for progress monitoring, it will be extremely helpful.

I have not looked up deep into code, but where the stderr would go? If log file, I believe we already put it there... maybe made some kind of a hook to post changes to an API?

michael-kotliar commented 2 years ago

Oh, I see. The problem is that we save CWL related logs in the file. This file is appended to the Airflow log after the task finishes running. So there is no way to see changes online unless you setup different logging for CWL here https://github.com/Barski-lab/cwl-airflow/blob/ca14232bbb78df242c841047318bcb67c775379f/cwl_airflow/config_templates/airflow_local_settings.py#L98

or remove it completely so the logs from CWL will be printed to the airflow logs and then check if Airflow provides any streaming solution for logs.

Also, in this case the CWL tools shouldn't catch any stdout or stderr, otherwise it will be saved to file.

This might be useful information https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html#serving-logs-from-workers

mpolykovskiy commented 2 years ago

It seems that this PR doesnt fix the root problem, so I'll close it and try arain. @michael-kotliar thank you for pointing the key location, it helped a lot.

mpolykovskiy commented 2 years ago

@michael-kotliar btw, another problem with logs is here - https://github.com/Barski-lab/cwl-airflow/blob/master/cwl_airflow/components/init/config.py#L105 when all logging classes are substituted by DEFAULT_LOGGING_CONFIG and cannot be caught by airflow.

michael-kotliar commented 2 years ago

@mpolykovskiy Yeah, you're right, that's the place where I actually replaced the default logging in order to split the main airflow logs from CWL logs into two separate files. Feel free to comment it out if needed. Also, this project might be useful for showing the logs - https://grafana.com/oss/loki/. I even saw one issue on Apache Airflow about adding support for this, but it's not yet resolved. Additionally, a while ago I played a bit with TIG (Telegraph, InfluxDB, Grafana) and created docker-compose file for that https://github.com/michael-kotliar/tig-stack. This definitely can be adjusted to receive metrics from Airflow (through statsd), however, at that time I couldn't make it get any logs from Airflow.