embeddedmz / ftpclient-cpp

C++ client for making FTP requests
MIT License

Progress of wildcard download #38

Open Bandler opened 2 years ago

Bandler commented 2 years ago

If I use DownloadWildcard() together with a ProgressCallback, the reported progress always reflects only the file currently being downloaded, not the overall progress. So if the wildcard matches several files, there is no way to tell the overall progress.

embeddedmz commented 2 years ago

Following your question, I tried a few things to fix this, but unfortunately there is no simple solution. To calculate the progress rate accurately in the callback, we need to know the size of the folder (and, needless to say, that size would have to be managed outside DownloadWildcard, since that method is recursive; there are solutions for that part). However, FTP servers usually can't give you this information. I tried the 'Info' method and it doesn't work with folders. There may be a solution with SFTP servers (https://stackoverflow.com/questions/45614242/bash-script-get-folder-size-through-ftp): I could send the 'du' command to the server and it might reply, but that command can take a long time to execute, so it might time out. I don't know.

Another interesting resource: https://curl.se/mail/lib-2009-05/0225.html

Bandler commented 2 years ago

Thanks for your reply.

In my case, no recursion is needed, because I only target specific files in one directory, for example all *.txt files in the given download folder. Would it be possible to support this special (simpler) case? It would only require getting the list of files in the directory that match the wildcard expression and adding up their sizes.

embeddedmz commented 2 years ago

@Bandler I think we will need a new method for that. The current DownloadWildcard method uses libcurl features and is complex. I prefer not to touch it.

The best way is to fetch the list of file names that match an expression (using the asterisk and maybe '?' too), compute the total size of all these files, and pass that size to the structure you get from the first parameter of the progress callback (cast the void* to ProgressFnStruct). If that size is not zero, you know that you must use it to compute the progress rate.

When you call List() with "bOnlyNames" set to "false", this is what it returns in a std::string:

drwxrwxrwx   1 user     group           0 Apr 26  2021 download_wildcard
drwxrwxrwx   1 user     group           0 Jun  2 15:23 nom avec des espaces
-rw-rw-rw-   1 user     group         161 Jun  2 15:26 test espace.txt
drwxrwxrwx   1 user     group           0 Jun 23 09:29 upload

And with bOnlyNames set to true (some FTP servers add '.' and '..' at the beginning of both lists, but they are directories, so that's fine):

download_wildcard
nom avec des espaces
test espace.txt
upload

You can use these two lists to build a list of regular files that match an expression: first, identify the indexes of the lines in the detailed list that begin with '-'; then use these indexes to get the names of all regular files from the second (names-only) list; finally, keep only the items that match the expression (you can use std::regex or a simple hand-written matcher; just google 'c++ wildcard').
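For the filtering step, a minimal hand-written wildcard matcher supporting '*' (any sequence, including none) and '?' (exactly one character) could look like the sketch below. It is an alternative to std::regex; the name wildcardMatch is hypothetical and not part of ftpclient-cpp.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `name` matches `pattern`, where '*'
// matches any sequence of characters (including none) and '?' matches exactly
// one character. Greedy matching that backtracks to the last '*' seen.
bool wildcardMatch(const std::string& pattern, const std::string& name)
{
    std::size_t p = 0, n = 0;
    std::size_t starPos = std::string::npos; // index of the last '*' in pattern
    std::size_t matchPos = 0;                // index in name when that '*' was seen

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Record the star; first try letting it match nothing.
            starPos = p++;
            matchPos = n;
        } else if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (starPos != std::string::npos) {
            // Mismatch: let the last '*' absorb one more character and retry.
            p = starPos + 1;
            n = ++matchPos;
        } else {
            return false;
        }
    }
    // Whatever remains of the pattern must consist only of '*'.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}
```

For example, wildcardMatch("*.txt", "test espace.txt") is true, while wildcardMatch("*.txt", "upload") is false.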

Always check that indexes are not out of bounds, since the administrator of the FTP server can delete files between the two calls to List(); in that case, report an error or use a retry mechanism.
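Combining the two listings with this check might look like the sketch below. It assumes both List() results are available as std::string values and that each detailed line corresponds to exactly one name, as in the listings above (a server that prepends a line such as "total 12" to the detailed listing would need that line skipped first). The helpers splitLines and regularFileNames are hypothetical; as a conservative choice, any length mismatch between the two listings is treated as "the folder changed, retry".

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a listing into lines, dropping a trailing '\r'
// (FTP listings usually use CRLF) and skipping empty lines.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: given the detailed listing (bOnlyNames = false) and the
// names-only listing (bOnlyNames = true) of the same folder, returns the names
// of regular files (detailed lines starting with '-'). Returns std::nullopt if
// the listings are inconsistent, so the caller can report an error or retry.
std::optional<std::vector<std::string>> regularFileNames(const std::string& detailed,
                                                         const std::string& namesOnly)
{
    const std::vector<std::string> details = splitLines(detailed);
    const std::vector<std::string> names = splitLines(namesOnly);

    // A size mismatch means files were added or removed between the two calls.
    if (details.size() != names.size()) return std::nullopt;

    std::vector<std::string> result;
    for (std::size_t i = 0; i < details.size(); ++i) {
        if (details[i].front() == '-') {   // regular file
            result.push_back(names.at(i)); // at() still guards the index
        }
    }
    return result;
}
```

The resulting names can then be filtered with a wildcard matcher or std::regex.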

Since the detailed list is tricky to parse, you can iterate over the filtered items, call Info() to get the size of each file, and sum them.

In the struct ProgressFnStruct, add a field "size_t filesTotalSize". It should be initialized to zero in the struct's constructor and updated with the total file size before performing a CURL operation.
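Summing the sizes and storing the result could be sketched as follows. WildcardProgressState is a hypothetical stand-in for ProgressFnStruct that shows only the new field (a default member initializer plays the role of zero-initializing it in the constructor), and getSize is a hypothetical callback standing in for a per-file Info() call; the real Info() signature is not reproduced here.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for ProgressFnStruct, reduced to the new field.
struct WildcardProgressState {
    std::size_t filesTotalSize = 0; // 0 means "single transfer": ignore it
};

// Sums the sizes of `files` using `getSize` (hypothetical; would wrap Info()).
// Returns std::nullopt if any size cannot be obtained, e.g. because the file
// was deleted on the server in the meantime.
std::optional<std::size_t> totalSize(
    const std::vector<std::string>& files,
    const std::function<std::optional<std::size_t>(const std::string&)>& getSize)
{
    std::size_t total = 0;
    for (const std::string& f : files) {
        std::optional<std::size_t> s = getSize(f);
        if (!s) return std::nullopt;
        total += *s;
    }
    return total;
}
```

Before starting the transfers, the new method would assign the result to filesTotalSize; if totalSize returns std::nullopt, it could report an error or list the folder again.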

Iterate over the list to download the items. Do not forget to prepend the folder that contains the files: if a file is located in the root folder, you don't have to add anything, but if it's in a subdirectory, prepend the full path to that folder (the path you already used with the List() method), e.g. "myfolder/myfile.txt".
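A small sketch of building those remote paths, assuming '/' as the separator and an empty string for the root folder; remotePath is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper: builds the remote path for a file found with List().
// An empty folder means the root, so the file name is used as-is; otherwise
// the folder is prepended, adding a '/' only if it is not already there.
std::string remotePath(const std::string& folder, const std::string& fileName)
{
    if (folder.empty()) return fileName;
    if (folder.back() == '/') return folder + fileName;
    return folder + "/" + fileName;
}
```

For example, remotePath("myfolder", "myfile.txt") yields "myfolder/myfile.txt".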

Be careful with curl_easy_reset(m_pCurlSession): it clears the options you set with curl_easy_setopt(), so it must be called only once, before setting the options and calling Perform().

In the progress callback, if filesTotalSize is non-zero, use it; otherwise it's a single file, so use different logic to compute the progress rate (and be careful of divisions by zero :p). DO NOT use the same callback for this new method and the existing ones: the progress structure is currently not reset, so if you also use the existing methods (upload, download), you might compute an incorrect progress rate. That's why I'm reluctant to manage the progress bar for DownloadWildcard. It would create many issues, and the methods are not reentrant when used together (e.g. if a new method updates the total file size and then calls another method that sets it back to zero to tell the progress bar it is performing a single transfer, you get problems!). So it's better to use the new field filesTotalSize only for this new method. Otherwise, the progress callback also gives you the parameter void *pOwner, which you can set with SetProgressFnCallback and use to get the total file size.
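A sketch of such a callback, under stated assumptions: the double-based signature is assumed to match the library's progress callback type (check the header before reusing it), and for simplicity the void* is cast directly to a hypothetical MultiFileProgress state rather than to ProgressFnStruct. Because libcurl reports the byte counts of the current transfer only, the sketch also keeps a hypothetical running count of bytes from files that have already finished.

```cpp
#include <cstddef>

// Hypothetical progress state used only by the new multi-file method.
struct MultiFileProgress {
    std::size_t filesTotalSize = 0;      // total bytes of all matched files, or 0
    std::size_t bytesCompletedFiles = 0; // bytes of files already fully downloaded
    double lastPercent = 0.0;            // last computed progress, for display
};

// Assumed callback shape: (user pointer, total/now download, total/now upload).
int multiFileProgressCallback(void* ptr, double dTotalToDownload, double dNowDownloaded,
                              double /*dTotalToUpload*/, double /*dNowUploaded*/)
{
    auto* state = static_cast<MultiFileProgress*>(ptr);
    double percent = 0.0;
    if (state->filesTotalSize != 0) {
        // Multi-file transfer: add the bytes of finished files to the
        // bytes reported for the current transfer.
        percent = 100.0 * (static_cast<double>(state->bytesCompletedFiles) + dNowDownloaded)
                  / static_cast<double>(state->filesTotalSize);
    } else if (dTotalToDownload > 0.0) {
        // Single file: fall back to the per-transfer totals.
        percent = 100.0 * dNowDownloaded / dTotalToDownload;
    } // else: total still unknown; report 0 instead of dividing by zero
    state->lastPercent = percent;
    return 0; // in libcurl, a non-zero return from a progress callback aborts the transfer
}
```

After each file finishes, the download loop would add that file's size to bytesCompletedFiles before starting the next transfer.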

I noticed that the progress callback can't be disabled once it has been set. I will push a fix for that.

I hope this helps you.