crazy-max / ftpgrab

Grab your files periodically from a remote FTP or SFTP server easily
https://crazymax.dev/ftpgrab/
MIT License
492 stars, 76 forks

Empty folder leads to spinlock #33

Closed masgo closed 6 years ago

masgo commented 6 years ago

Hi, first of all thanks for this great script!

When I tried to use it, my example directory structure contained an empty folder, which led to the script hanging forever.

With debug enabled, I could see it looping. The only thing that changes is the srchash. As soon as I added an empty file to this folder, the problem was gone.

Here is an excerpt of the debug output (I censored the real names):

#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/AN%5fEMPTY%5fFOLDER//////////////////////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /AN_EMPTY_FOLDER//////////////////////
#DEBUG srcfileproc: /AN%5fEMPTY%5fFOLDER//////////////////////
#DEBUG srcfileshort: AN_EMPTY_FOLDER//////////////////////
#DEBUG srcfileshort2: AN_EMPTY_FOLDER//////////////////////
#DEBUG srchash: 0e087c88145ec4a845f2f9d575560747
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/AN_EMPTY_FOLDER//////////////////////
#DEBUG destsize: N/A
#DEBUG vregex: AN_EMPTY_FOLDER//////////////////////
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/AN%5fEMPTY%5fFOLDER////////////////////////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /AN_EMPTY_FOLDER////////////////////////
#DEBUG srcfileproc: /AN%5fEMPTY%5fFOLDER////////////////////////
#DEBUG srcfileshort: AN_EMPTY_FOLDER////////////////////////
#DEBUG srcfileshort2: AN_EMPTY_FOLDER////////////////////////
#DEBUG srchash: 123b24923553082f8beee2004ed55496
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/AN_EMPTY_FOLDER////////////////////////
#DEBUG destsize: N/A
#DEBUG vregex: AN_EMPTY_FOLDER////////////////////////
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/AN%5fEMPTY%5fFOLDER//////////////////////////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /AN_EMPTY_FOLDER//////////////////////////
#DEBUG srcfileproc: /AN%5fEMPTY%5fFOLDER//////////////////////////
#DEBUG srcfileshort: AN_EMPTY_FOLDER//////////////////////////
#DEBUG srcfileshort2: AN_EMPTY_FOLDER//////////////////////////
#DEBUG srchash: af9471cd3ecc2715fdf5c276accbae98
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/AN_EMPTY_FOLDER//////////////////////////
#DEBUG destsize: N/A
#DEBUG vregex: AN_EMPTY_FOLDER//////////////////////////
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/AN%5fEMPTY%5fFOLDER////////////////////////////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /AN_EMPTY_FOLDER////////////////////////////
#DEBUG srcfileproc: /AN%5fEMPTY%5fFOLDER////////////////////////////
#DEBUG srcfileshort: AN_EMPTY_FOLDER////////////////////////////
#DEBUG srcfileshort2: AN_EMPTY_FOLDER////////////////////////////
#DEBUG srchash: 4515adb96595251e22d416fd7c8bbbb9
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/AN_EMPTY_FOLDER////////////////////////////
#DEBUG destsize: N/A
#DEBUG vregex: AN_EMPTY_FOLDER////////////////////////////
crazy-max commented 6 years ago

Hi @masgo, thanks for your feedback, I will take a look asap

crazy-max commented 6 years ago

Can you post your config file, please? And, if possible, the directory structure of your FTP_SOURCES (output of the tree command). Thanks.

masgo commented 6 years ago

I only have FTP access to the server, so I cannot run tree. But if I run the curl command from the debug logs:

curl --silent --list-only --globoff -u xxxxx:xxxxx --ftp-ssl ftp://xxxxxxxx:21/THE_EMPTY_FOLDER/ > response.txt

Then the response file has 0 Bytes.
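
A minimal sketch of how that check could be scripted, assuming the listing has already been redirected to a file as above. The helper name listing_is_empty is hypothetical and not part of FTPGrab:

```shell
# Hypothetical helper, not part of FTPGrab.
# Succeeds when the saved listing has no content. Note that `-s` is only
# true for a file that exists and is larger than 0 bytes, so a missing
# file also counts as "empty" here.
listing_is_empty() {
  [ ! -s "$1" ]
}

# Example use after saving a listing to response.txt:
#   if listing_is_empty response.txt; then echo "folder is empty"; fi
```

The test only looks at the file size, so a listing that contains nothing but whitespace would not count as empty here.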

The name of the empty folder consists of letters, digits and the _ character.

My system is:

Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

Linux xxxxxx 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

config:

# This file is an integral part of FTPGrab.
# More info : https://ftpgrab.github.io

# General
DIR_DEST="/tmp/seedbox"
EMAIL_LOG=""
DEBUG=0

# FTP
FTP_HOST="xxxxxxxxxx"
FTP_PORT="21"
FTP_USER="xxxxxxxxxx"
FTP_PASSWORD="xxxxxxxxxx"
FTP_SOURCES="/"

# FTP security (only with curl method)
FTP_SECURE=1
FTP_CHECK_CERT=1

# Download
DL_METHOD="curl"
DL_USER=""
DL_GROUP=""
DL_CHMOD=""
DL_REGEX=""
DL_EXCLUDE_REGEX=""
DL_RETRY=3
DL_RESUME=0
DL_SHUFFLE=0
DL_HIDE_SKIPPED=1
DL_HIDE_PROGRESS=1
DL_CREATE_BASEDIR=0

# Hash
HASH_ENABLED=1
HASH_TYPE="md5"
HASH_STORAGE="sqlite3"
crazy-max commented 6 years ago

Ok thanks

crazy-max commented 6 years ago

@masgo I cannot reproduce your issue. Can you post your full log, if possible? And please follow these steps: https://ftpgrab.github.io/doc/reporting-issue/ Thanks.

crazy-max commented 6 years ago

@masgo Any feedback to help me investigate your issue?

masgo commented 6 years ago

Hi, here is some more info. I had to censor some of the log file but I kept it as pure as possible by only doing search&replace of some words.

What seems to happen is that it checks the empty folder, then the // subfolder of it, then the /// subfolder, and so on.

Environment

$ curl --version
curl 7.47.0 (x86_64-pc-linux-gnu) libcurl/7.47.0 GnuTLS/3.4.10 zlib/1.2.8 libidn/1.32 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets

$ md5sum --version
md5sum (GNU coreutils) 8.25

$ sqlite3 --version
3.11.0 2016-02-15 17:29:24 3d862f207e3adc00f78066799ac5a8c282430a5f
xxx@xxx:/opt/ftpgrab$ sudo -u ftp-sync ftpgrab ftp-archiv-sync-test.conf
FTPGrab v4.3.0 (ftp-archiv-sync-test - 2018/01/14 21:13:34)
--------------
Config: ftp-archiv-sync-test
Script PID: 15561
Log file: /var/log/ftpgrab/ftp-archiv-sync-test-20180114211334.log
FTP sources count: 1
FTP secure: 1
Download method: curl
Resume downloads: 0
Shuffle file/folder list: 0
Hash type: md5
Hash storage: sqlite3
Hash file: /opt/ftpgrab/hash/ftp-archiv-sync-test.db
--------------
#DEBUG FTP_SRC: /
#DEBUG DIR_DEST: /tmp/seedbox
#DEBUG DIR_DEST_REF: /tmp/seedbox/
Source: ftp://example.com:21/
Destination: /tmp/seedbox/
Checking connection to ftp://example.com:21/...
#DEBUG checkConnection: curl --silent --retry 1 --retry-delay 5 --globoff -u xxx:xxx --ftp-ssl ftp://example.com:21/
Successfully connected!
--------------
... it downloads some files from non-empty folders
--------------
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/XXX%5fXXX%5f2000/"
#DEBUG lineClean: XXX_XXX_2000/
#DEBUG basename: XXX_XXX_2000
#DEBUG srcfile: /XXX_XXX_2000
#DEBUG srcfileproc: /XXX%5fXXX%5f2000
#DEBUG srcfileshort: XXX_XXX_2000
#DEBUG srcfileshort2: XXX_XXX_2000
#DEBUG srchash: 764cb0115f13863c3242b0a12b6dba31
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/XXX_XXX_2000
#DEBUG destsize: N/A
#DEBUG vregex: XXX_XXX_2000
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/XXX%5fXXX%5f2000//"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /XXX_XXX_2000//
#DEBUG srcfileproc: /XXX%5fXXX%5f2000//
#DEBUG srcfileshort: XXX_XXX_2000//
#DEBUG srcfileshort2: XXX_XXX_2000//
#DEBUG srchash: df7b63d00e9ebf2e22873a911fddc142
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/XXX_XXX_2000//
#DEBUG destsize: N/A
#DEBUG vregex: XXX_XXX_2000//
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/XXX%5fXXX%5f2000////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /XXX_XXX_2000////
#DEBUG srcfileproc: /XXX%5fXXX%5f2000////
#DEBUG srcfileshort: XXX_XXX_2000////
#DEBUG srcfileshort2: XXX_XXX_2000////
#DEBUG srchash: 784ee8dfc7d01b7318910c08954eb947
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/XXX_XXX_2000////
#DEBUG destsize: N/A
#DEBUG vregex: XXX_XXX_2000////
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/XXX%5fXXX%5f2000//////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /XXX_XXX_2000//////
#DEBUG srcfileproc: /XXX%5fXXX%5f2000//////
#DEBUG srcfileshort: XXX_XXX_2000//////
#DEBUG srcfileshort2: XXX_XXX_2000//////
#DEBUG srchash: 748cf6417ce58838d263aa803b5ffa9f
#DEBUG srcsize:
#DEBUG destfile: /tmp/seedbox/XXX_XXX_2000//////
#DEBUG destsize: N/A
#DEBUG vregex: XXX_XXX_2000//////
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl "ftp://example.com:21/XXX%5fXXX%5f2000////////"
#DEBUG lineClean: /
#DEBUG basename: /
#DEBUG srcfile: /XXX_XXX_2000////////
#DEBUG srcfileproc: /XXX%5fXXX%5f2000////////
#DEBUG srcfileshort: XXX_XXX_2000////////
#DEBUG srcfileshort2: XXX_XXX_2000////////
#DEBUG srchash: 2dad19f2222fca94568169278a2fb372
^C
crazy-max commented 6 years ago

Ok thanks @masgo

crazy-max commented 6 years ago

@masgo Oh, I completely forgot about your issue 😕 I will make a new release this week to solve it! Sorry!

masgo commented 6 years ago

@crazy-max hey no problem! This is a free and open source project. I have seen (much) longer bugfix times from software where I was a paying customer. So don't worry, you are doing a great job.

crazy-max commented 6 years ago

@masgo I still haven't been able to reproduce your problem on several configurations.

Here is a simple test I ran with an FTP source pointing to /test/test_empty/, which contains:

[-] empty
 |  a.pdf

The FTPGrab result is:

FTPGrab v4.3.1 (ftpgrab - 2018/03/05 00:22:05)
--------------
Config: ftpgrab
Script PID: 20
Log file: /var/log/ftpgrab/ftpgrab-20180305002205.log
FTP sources count: 1
FTP secure: 1
Download method: curl
Resume downloads: 0
Shuffle file/folder list: 0
Hash type: md5
Hash storage: text
Hash file: /opt/ftpgrab/hash/ftpgrab.txt
--------------
#DEBUG FTP_SRC: /test/test_empty/
#DEBUG DIR_DEST: /data
#DEBUG DIR_DEST_REF: /data/
Source: ftp://10.0.0.2:21/test/test_empty/
Destination: /data/
Checking connection to ftp://10.0.0.2:21/test/test_empty/...
#DEBUG checkConnection: curl --silent --retry 1 --retry-delay 5 --globoff --ftp-ssl --insecure ftp://10.0.0.2:21/test/test_empty/
Successfully connected!
--------------
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl --insecure "ftp://10.0.0.2:21/test/test%5fempty/a.pdf/"
#DEBUG lineClean: a.pdf
#DEBUG basename: a.pdf
#DEBUG srcfile: /test/test_empty/a.pdf
#DEBUG srcfileproc: /test/test%5fempty/a.pdf
#DEBUG srcfileshort: a.pdf
#DEBUG srcfileshort2: a.pdf
#DEBUG srchash: f87a4d24fcc05251afdb56cc24a5ca5d
#DEBUG srcsize: 35360997
#DEBUG destfile: /data/a.pdf
#DEBUG destsize: N/A
#DEBUG vregex: a.pdf
Process file: a.pdf
Hash: f87a4d24fcc05251afdb56cc24a5ca5d
Size: 33.72 Mb
Status: Never downloaded...
Start download to /data/a.pdf... Please wait...
#DEBUG Download command: curl --globoff -u *****:***** --ftp-ssl --insecure "ftp://10.0.0.2:21/test/test%5fempty/a.pdf" -o "/data/a.pdf"
File successfully downloaded!
Time spent: 00:00:36
--------------
#DEBUG checkfolder: curl --silent --list-only --globoff -u *****:***** --ftp-ssl --insecure "ftp://10.0.0.2:21/test/test%5fempty/empty/"
#DEBUG lineClean: empty/
#DEBUG basename: empty
#DEBUG srcfile: /test/test_empty/empty
#DEBUG srcfileproc: /test/test%5fempty/empty
#DEBUG srcfileshort: empty
#DEBUG srcfileshort2: empty
#DEBUG srchash: a2e4822a98337283e39f7b60acf85ec9
#DEBUG srcsize:
#DEBUG destfile: /data/empty
#DEBUG destsize: N/A
#DEBUG vregex: empty
Change the ownership recursively of 'Destination' path to docker:docker
--------------
Finished...

Can you run the same test after cleaning the destination directory? What is your Linux distribution? And can you test on another computer (or with the Docker image)?

Thanks

crazy-max commented 6 years ago

Closing due to inactivity.

fletcherm commented 6 years ago

Hi crazy-max,

I may have a fix for this problem. Please see my commit here -- https://github.com/fletcherm/ftpgrab/commit/e78c8443316f1e93e6a7b4fd88b9be7cb6e72704

I was experiencing the same problem as masgo -- ftpgrab stuck listing the same empty directory over and over again, appending a / each subsequent time but getting the same result.

The linked commit breaks from the recursive function if the result from listing the directory is empty and is working well given my limited testing.
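
To illustrate the idea (this is not the code from the commit), here is a rough sketch of a recursive walk that returns early when a listing comes back empty. The names list_dir and walk are made up for this example, and list_dir fakes a tiny server instead of calling curl --silent --list-only:

```shell
# Sketch only: list_dir and walk are hypothetical names, not FTPGrab functions.
# list_dir stands in for the real directory listing and fakes a small server
# with one file and one empty folder.
list_dir() {
  case "$1" in
    /)       printf 'a.pdf\nempty/\n' ;;
    /empty/) : ;;                      # empty folder: nothing is printed
  esac
}

walk() {
  local dir files entry
  dir=$1
  files=$(list_dir "$dir")
  # Guard: an empty listing means there is nothing to recurse into.
  if [ -z "$files" ]; then
    return 0
  fi
  printf '%s\n' "$files" | while IFS= read -r entry; do
    case "$entry" in
      */) walk "$dir$entry" ;;               # folder: recurse into it
      *)  printf '%s%s\n' "$dir" "$entry" ;; # file: report its full path
    esac
  done
}

walk /   # prints /a.pdf
```

The guard sits right after the listing call, so the function stops at an empty folder before it builds any new path from the listing.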

A few caveats:

  1. I considered adding some unit tests, but with the way things are structured, that'd be non-trivial and may introduce more problems. I am hoping you have a more exhaustive system test suite that can validate this change.
  2. I only tested this with curl -- wget appears to be more chatty when listing an empty directory, and in fact shows a .listing file for empty directories. curl simply returns blank. Anyway, I'm hoping your test suite can validate this change is ok for wget as well.
  3. I have not extensively tested this yet -- and will not until I get home later today or this weekend. I will let you know how well it turns out after I can point it at a real ftp server with a good mix of empty / non-empty directories.
  4. I am not sure why you were unable to reproduce this problem. I'm on Alpine Linux 3.8 and curl version 7.61.0. It is consistently returning blank when listing an empty directory. Have you tried a structure like this?
    [-]
    [-] empty1
    [-] empty2

Hope this helps. I will let you know how my more extensive testing goes this weekend.

crazy-max commented 6 years ago

Hi @fletcherm, thanks for your input on this. I will check this out asap.

fletcherm commented 6 years ago

Thanks, crazy-max.

I did some overnight testing against a live system. The patch appears to be holding up well. I am eager to learn what your testing finds.

A couple of other caveats:

  1. My bash scripting skills are not great so I am open to alternative, better ways to test for empty results.
  2. It would be wise to trim whitespace from the _$FILES variable before testing for emptiness, in case curl or wget returns some spaces or newlines. I am hoping you know of a good way to do this.
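
One possible way to do the whitespace check, as a sketch rather than a tested patch: delete all whitespace with tr and test what is left. The helper name is_blank is hypothetical, and the listing is assumed to be passed in as a single string:

```shell
# Hypothetical helper, not part of FTPGrab.
# Succeeds when the argument is empty or contains only whitespace
# (spaces, tabs, newlines, carriage returns).
is_blank() {
  [ -z "$(printf '%s' "$1" | tr -d '[:space:]')" ]
}

# Example use with a listing held in a variable:
#   if is_blank "$listing"; then echo "nothing to process"; fi
```

Because it only relies on printf, tr and the [ test, this should behave the same under bash and a plain POSIX sh.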

Thanks again for creating this tool and investigating this problem!

crazy-max commented 6 years ago

Ok thanks, I ran some tests today and it seems to happen only if FTP_SOURCES=/ and DL_METHOD=curl.

fletcherm commented 6 years ago

Cool! I am glad you were able to reproduce it.

For what it's worth, my FTP_SOURCES has about a half dozen entries. Either way I am glad you found the problem. Thanks for the updates!

crazy-max commented 6 years ago

And thanks to you for helping me figure this out :)