Open kandharvishnu opened 1 month ago
Is there an issue with using an SFTPSensor
for checking file existence, prior to running SFTPToGCSOperator
?
Is there an issue with using an
SFTPSensor
for checking file existence, prior to runningSFTPToGCSOperator
?
The SFTP sensor will wait until the file is detected. This issue is to determine whether the task should fail if the file is not found.
Is there an issue with using an
SFTPSensor
for checking file existence, prior to runningSFTPToGCSOperator
?The SFTP sensor will wait until the file is detected. This issue is to determine whether the task should fail if the file is not found.
I see. In that case, I don't think it makes sense to have the task succeed if a file is not found. A 'success' would not have much meaning anymore besides that the task simply ran. The interpretation of 'success' would become ambiguous i.e. how can you tell if a file was indeed moved or not without having to further dig into the logs? In addition, it doesn't feel very 'airflow-like' since we would be overloading the operator with functionality that already exists through a sensor.
I'm not sure what your DAG looks like, but perhaps you can use a SFTPSensor
and trigger rules for your downstream tasks? There is soft_fail
for sensors too, if you want the sensor to 'skip' instead of 'fail'.
Is there an issue with using an
SFTPSensor
for checking file existence, prior to runningSFTPToGCSOperator
?The SFTP sensor will wait until the file is detected. This issue is to determine whether the task should fail if the file is not found.
I see. In that case, I don't think it makes sense to have the task succeed if a file is not found. A 'success' would not have much meaning anymore besides that the task simply ran. The interpretation of 'success' would become ambiguous i.e. how can you tell if a file was indeed moved or not without having to further dig into the logs? In addition, it doesn't feel very 'airflow-like' since we would be overloading the operator with functionality that already exists through a sensor. There is
soft_fail
for sensors too, if you want the sensor to 'skip' instead of 'fail'.
I agree with your point, but this is an optional parameter. By default, it will fail the task if the file is not found. If the user sets the fail_on_sftp_file_not_exist
to False, it would make some difference.
I'm not sure what your DAG looks like, but perhaps you can use a
SFTPSensor
and trigger rules for your downstream tasks?
The final task of my DAG is looking for a file, and the dag_run should not fail, though the file is not present. And I don't want to use the sensor to poke for a file.
Is there an issue with using an
SFTPSensor
for checking file existence, prior to runningSFTPToGCSOperator
?The SFTP sensor will wait until the file is detected. This issue is to determine whether the task should fail if the file is not found.
I see. In that case, I don't think it makes sense to have the task succeed if a file is not found. A 'success' would not have much meaning anymore besides that the task simply ran. The interpretation of 'success' would become ambiguous i.e. how can you tell if a file was indeed moved or not without having to further dig into the logs? In addition, it doesn't feel very 'airflow-like' since we would be overloading the operator with functionality that already exists through a sensor. There is
soft_fail
for sensors too, if you want the sensor to 'skip' instead of 'fail'.I agree with your point, but this is an optional parameter. By default, it will fail the task if the file is not found. If the user sets the
fail_on_sftp_file_not_exist
to False, it would make some difference.I'm not sure what your DAG looks like, but perhaps you can use a
SFTPSensor
and trigger rules for your downstream tasks?The final task of my DAG is looking for a file, and the dag_run should not fail, though the file is not present. And I don't want to use the sensor to poke for a file.
That's unfortunate you don't want to use a sensor for checking for file existence. If the sensor failed (i.e. file didn't exist after some time), you could have it soft_fail
and it would skip (instead of fail) and you could choose which downstream tasks get skipped or ran based on trigger rules.
At any rate, I think as long as you use the trigger rule all_done
for one (or more) of the downstream tasks, the dag_run will still be marked as successful. So you could have your SFTPToGCSOperator
task fail (because the file doesn't exist on SFTP), and have the trigger rule for all other downstream tasks set to all_done
, and they will still run and the dag_run will still be considered successful.
Description
Currently, the operator fails if a file is not found on the SFTP server. To make this behavior more flexible, we could introduce a new parameter,
fail_on_sftp_file_not_exist
, allowing users to customize the operator's response in such scenarios. By setting this parameter, users can choose whether the operator should raise an error when a file is missing or simply log a message and continue processing.Use case/motivation
This enhancement would give users greater control over the operator's behaviour, particularly in environments where missing files are expected or where continued processing is preferred.
Related issues
A similar issue has been opened for the
SFTPToS3Operator
.Are you willing to submit a PR?
Code of Conduct
[X] I agree to follow this project's Code of Conduct
PR