furqanshahid85-python / Python-FTP-File-Ingestion

This module uploads files from an FTP server to S3. It opens an SFTP connection to the FTP server and uploads every file in the specified directory to the specified S3 bucket. Key features of this module: Creates a secure SSH connection with the FTP server. Handles multipart upload to S3 automatically if the file size is greater than 100 MB (configurable). Automatically handles retries for failed uploads during multipart upload. Partitions the data in S3 by current year, month, day, and hour. Tracks which files have already been processed and which still need processing.
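The year/month/day/hour partitioning mentioned above could look roughly like the following. This is a minimal sketch, not the repository's actual code: the function name `partitioned_key`, the `ingested` prefix, and the `year=/month=/day=/hour=` key layout are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def partitioned_key(filename, prefix="ingested", now=None):
    """Build an S3 object key partitioned by year, month, day and hour.

    `prefix` and the key layout are hypothetical; the real module may
    use a different naming scheme.
    """
    # Default to the current UTC time when no timestamp is supplied.
    now = now or datetime.now(timezone.utc)
    return (
        f"{prefix}/year={now:%Y}/month={now:%m}/"
        f"day={now:%d}/hour={now:%H}/{filename}"
    )
```

Passing an explicit `now` keeps the function deterministic, which makes it easy to check the key layout before uploading anything.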

Connection error #2

Open ghost opened 3 years ago

ghost commented 3 years ago

Traceback (most recent call last):
  File "/tmp/runscript.py", line 211, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-mjxcjss5/glue-sftp-ingestion.py", line 27, in <module>
  File "/glue/lib/installation/paramiko/client.py", line 349, in connect
    retry_on_signal(lambda: sock.connect(addr))
  File "/glue/lib/installation/paramiko/util.py", line 283, in retry_on_signal
    return function()
  File "/glue/lib/installation/paramiko/client.py", line 349, in <lambda>
    retry_on_signal(lambda: sock.connect(addr))
TimeoutError: [Errno 110] Connection timed out

I am running the job as an AWS Glue Python shell job. Can anyone help me resolve this issue?

furqanshahid85-python commented 3 years ago

Your connection is timing out: no connection is being established with your SFTP server. Check that your security groups and NACLs allow the required inbound and outbound traffic. Also check that the SFTP server itself accepts connections from outside.
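A quick way to separate a network problem from a paramiko or credentials problem is to test plain TCP reachability from the same environment that runs the job. The sketch below uses only the standard library; the helper name `can_reach`, the default port 22, and the 5-second timeout are assumptions, so adjust them to your server.

```python
import socket


def can_reach(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


# Example (hypothetical host name):
#     print(can_reach("sftp.example.com", 22))
```

If this returns False from inside the Glue job, the SSH layer is never reached, and the fix lies in routing, firewall, or security group settings rather than in the ingestion script.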

ghost commented 3 years ago

I am doing this as part of my college project, and I am new to AWS Glue and networking. Is there any chance you could help me with this, or point me to an existing tutorial? I am asking because I haven't found any proper resource for SFTP to S3 using AWS Glue apart from your repository and another one quite similar to it.

I did not select a VPC when creating the Glue job, so how can I configure the security groups? And how do I select a VPC while creating a Glue job? Please don't take it the wrong way that I am asking so many questions. If possible, please contact me by email: nani.veeru.9999@gmail.com

furqanshahid85-python commented 3 years ago

You should check out my articles on AWS Glue and data ingestion if you haven't already. For AWS Glue: https://towardsdatascience.com/extract-transform-load-etl-aws-glue-edd383218cfd For data ingestion: https://towardsdatascience.com/datalake-file-ingestion-from-ftp-to-aws-s3-253022ae54d4

ghost commented 3 years ago

I have gone through them, but I haven't found a fix for my issue. I will describe my problem here; see if you can get anything from it. Since this is for testing, I downloaded the Rebex Buru SFTP server, installed it on my system, created a user, and added some files to it. Now I am trying to copy these files to my S3 bucket using Glue. Initially I got some paramiko import errors, but I was able to resolve them. When I give this Glue job the parameters of the Rebex SFTP test server (https://test.rebex.net/), I am able to connect and copy the files from that server. But when I give it the details of the server I installed on my laptop, it throws the above error. Can you help me here?
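One possible explanation for the difference, assuming the laptop sits behind a home or campus router: the public test server has an internet-routable address, while a laptop usually has a private or loopback address that a job running in AWS cannot reach. The sketch below checks which kind of address is being used; the helper name `is_publicly_routable` is hypothetical and the check says nothing about firewalls or port forwarding.

```python
import ipaddress


def is_publicly_routable(address):
    """Return True if the IP address is globally routable on the internet."""
    ip = ipaddress.ip_address(address)
    # is_global is False for private (e.g. 192.168.x.x, 10.x.x.x),
    # loopback (127.x.x.x) and other reserved ranges.
    return ip.is_global


# Example with a typical home-network address:
#     is_publicly_routable("192.168.1.10")
```

If the address configured in the job is not publicly routable, the connection attempt from Glue would be expected to time out regardless of how the SFTP server itself is configured.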

furqanshahid85-python commented 3 years ago

What FTP server are you using on your laptop? How about trying FileZilla?

ghost commented 3 years ago

I will try that and update you.