Status: open. Opened by wno-xyt 1 year ago.
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
AFAIK, there are no Papermill users among the active Airflow maintainers, so I would recommend checking it on your own and opening a PR with the changes; otherwise it might take an unpredictable amount of time to get implemented. Better yet, could we mark this as a good first issue?
Some useful links
FWIW, cc @bolkedebruin, who I think might have more insight into the Papermill operator and its usage. (From what I remember, the sentence "it's generally unusable" comes to mind... but I may well have misunderstood it.)
When I added the PapermillOperator, we were experimenting with it to make our data scientists more productive, i.e. able to schedule experiments faster. I think the world has mostly chosen to keep its ML experimentation elsewhere. However, the notebook idea is still very much alive with the likes of Databricks, and there have been recent updates to the Papermill repo.
It might just be that the Airflow audience typically doesn't like notebooks, and the audience that does typically doesn't go to Airflow. That might be because the PapermillOperator isn't well documented and does not have great examples. In other words, the PapermillOperator needs some love.
So I would say it's not unusable (it does support Python 3.12 now, @potiuk), just not well groomed :-).
Just a wild thought: It would be fun if we could read DBC and run that, which would look like Papermill but not exactly.
Description
The function execute_notebook in papermill accepts the named parameter log_output:
execute_notebook(nb, kernel_name, output_path=None, progress_bar=True, log_output=False, autosave_cell_every=30, **kwargs)
Unfortunately, the provider package's interface currently does not allow setting it to True. It would be great to have this option so that the logs from the executed notebook are visible in Airflow's task log. If I understand correctly how this works, setting log_output=True would simply make papermill use the configured logger (in this case Airflow's) for the notebook's output.
Use case/motivation
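A minimal sketch of the requested change, under stated assumptions. NotebookRunner is a hypothetical, much-simplified stand-in for the provider's PapermillOperator (not its real code), and fake_execute_notebook is a stub replacing papermill.execute_notebook so the example runs without papermill or a Jupyter kernel. The only real papermill API relied on is that papermill.execute_notebook accepts parameters and log_output keyword arguments. The idea that cell output goes through a logger named "papermill" follows the issue's own description and is an assumption here; ListHandler merely plays the role that Airflow's task log handling would.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

# Assumption (not verified here): with log_output=True, papermill reports
# cell output through a logger named "papermill".
PAPERMILL_LOGGER = logging.getLogger("papermill")


def fake_execute_notebook(input_path: str, output_path: str, *,
                          parameters: Optional[Dict[str, Any]] = None,
                          log_output: bool = False, **kwargs: Any) -> None:
    """Hypothetical stub for papermill.execute_notebook; pretends to run two cells."""
    for line in ("cell 1: hello", "cell 2: done"):
        if log_output:
            PAPERMILL_LOGGER.info(line)


class NotebookRunner:
    """Hypothetical, simplified stand-in for PapermillOperator (not its real code)."""

    def __init__(self, input_nb: str, output_nb: str,
                 parameters: Optional[Dict[str, Any]] = None,
                 log_output: bool = False,
                 execute_fn: Optional[Callable[..., Any]] = None) -> None:
        self.input_nb = input_nb
        self.output_nb = output_nb
        self.parameters = parameters or {}
        self.log_output = log_output  # the new, user-settable flag
        self.execute_fn = execute_fn

    def execute(self) -> None:
        execute_fn = self.execute_fn
        if execute_fn is None:
            import papermill as pm  # only needed when no stub is injected
            execute_fn = pm.execute_notebook
        # Forward the flag instead of always using papermill's default (False).
        execute_fn(self.input_nb, self.output_nb,
                   parameters=self.parameters, log_output=self.log_output)


class ListHandler(logging.Handler):
    """Collects log messages; stands in for a task log handler."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def run_and_capture(runner: NotebookRunner) -> List[str]:
    """Run the notebook and return whatever reached the "papermill" logger."""
    handler = ListHandler()
    old_level = PAPERMILL_LOGGER.level
    PAPERMILL_LOGGER.addHandler(handler)
    PAPERMILL_LOGGER.setLevel(logging.INFO)
    try:
        runner.execute()
    finally:
        PAPERMILL_LOGGER.removeHandler(handler)
        PAPERMILL_LOGGER.setLevel(old_level)
    return handler.messages


runner = NotebookRunner("in.ipynb", "out.ipynb", log_output=True,
                        execute_fn=fake_execute_notebook)
print(run_and_capture(runner))  # ['cell 1: hello', 'cell 2: done']
```

With log_output left at its default of False, the same run captures nothing, which mirrors the behaviour the issue describes for the current provider interface.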
I think it would be nice to have the logs from notebook execution visible in the Airflow task log, to be able to:
Related issues
No response
Are you willing to submit a PR?
Code of Conduct