astronomer / astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
https://astro-sdk-python.rtfd.io/
Apache License 2.0

Unable to upload file over SFTP #1964

Closed: sumitkanoje closed this issue 1 year ago

sumitkanoje commented 1 year ago

Describe the bug
Uploading a file to SFTP fails with the error below:

OSError: no such path.

Version
Astro SDK: 1.6.1
Astro runtime: 8.4.0
OS: macOS

To Reproduce

Steps to reproduce the behavior:
1. Create a new Astro CLI project with astro dev init
2. Add the line astro-sdk-python[duckdb]==1.6.1 to requirements.txt
3. Start the Airflow instance with astro dev start
4. Try uploading a file using the SFTP file type:

output_file=File(
    path="sftp://user@hostname/upload/ADOPTION_CENTER_1_unquoted.csv",
    conn_id="sftp_conn",
    filetype=FileType.CSV,
),
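For reference, a minimal sketch of the kind of DAG task that hits this, using aql.export_file to write a table out to the SFTP location. The DAG id, table name, and connection ids here are placeholders, not the exact ones from my project:

from datetime import datetime

from airflow import DAG
from astro import sql as aql
from astro.constants import FileType
from astro.files import File
from astro.table import Table

with DAG(dag_id="example_sftp_export", start_date=datetime(2023, 1, 1), schedule=None, catchup=False):
    # Export a table (placeholder name/conn_id) to a CSV file on the SFTP server.
    aql.export_file(
        task_id="export_to_sftp",
        input_data=Table(name="ADOPTION_CENTER_1", conn_id="snowflake_conn"),
        output_file=File(
            path="sftp://user@hostname/upload/ADOPTION_CENTER_1_unquoted.csv",
            conn_id="sftp_conn",
            filetype=FileType.CSV,
        ),
        if_exists="replace",
    )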

How to fix / workaround: https://github.com/astronomer/astro-sdk/blob/98a7a22857f79c46d35bf6bc3df813cb47a6207f/python-sdk/src/astro/files/locations/sftp.py#L96

I tested locally and found that removing parsed_url.netloc from the line above resolves the issue. The problem is that the code looks for a folder named parsed_url.netloc, which is the hostname from the SFTP URL and may not exist on the server in most situations.
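To illustrate how the URL splits (assuming parsed_url comes from the standard library's urllib.parse.urlparse):

from urllib.parse import urlparse

parsed_url = urlparse("sftp://user@hostname/upload/ADOPTION_CENTER_1_unquoted.csv")
print(parsed_url.netloc)  # 'user@hostname' -> ends up treated as a directory on the server
print(parsed_url.path)    # '/upload/ADOPTION_CENTER_1_unquoted.csv'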

pankajastro commented 1 year ago

OK, if that is the case, can you please try with

output_file=File(
    path="sftp:/upload/ADOPTION_CENTER_1_unquoted.csv",
    conn_id="sftp_conn",
    filetype=FileType.CSV,
),

This way parsed_url.netloc would be empty.

sumitkanoje commented 1 year ago

Thanks for the reply @pankajastro. I already tried this, and it throws an error saying the hostname is mandatory for establishing the SFTP connection.

pankajastro commented 1 year ago

Thanks for testing quickly and responding. I'll investigate it.

sumitkanoje commented 1 year ago

Hi @pankajastro, I tested again and it seems to be working fine at my end, thanks for the help. The hostname error I was getting was caused by the extra / in the SFTP upload location (sftp://upload/xyz.csv instead of sftp:/upload/xyz.csv).
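For anyone hitting the same thing, a quick illustration of the difference between the two forms (assuming standard urllib.parse.urlparse behavior):

from urllib.parse import urlparse

urlparse("sftp://upload/xyz.csv")  # netloc='upload', path='/xyz.csv' -> 'upload' is parsed as the hostname
urlparse("sftp:/upload/xyz.csv")   # netloc='',       path='/upload/xyz.csv' -> host comes from the connection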

pankajastro commented 1 year ago

Can you please share the Airflow task that you are using in your DAG? I can see an example here without a host: https://github.com/astronomer/astro-sdk/blob/8d9e80719608a6f55f537bd41a992234638731fe/python-sdk/example_dags/example_load_file.py#L267-L279

pankajastro commented 1 year ago

Closing it. Please feel free to reopen if needed.

sumitkanoje commented 1 year ago

I would still suggest adding a test for the scenario where the output is an SFTP file and the input is a Snowflake table. Last time, I followed the example mentioned above, which has an extra / in the SFTP path, and ran into the hostname issue.

@pankajastro You should update the example by removing the extra / from the SFTP path.