bpcurse / nextcloud-userexport

PHP script to export and filter lists (users, groups and groupfolders) using different Nextcloud APIs

curl_multi can bring down Nextcloud or the DB connections #65

Open · rabser opened this issue 2 years ago

rabser commented 2 years ago

I experienced bad runs (many empty user lines) and a general DoS of the Nextcloud server when using this wonderful tool (480 registered users), as curl_multi is too aggressive. I solved it by splitting the array of users into chunks, simply wrapping the curl_multi user data requests with array_chunk on the user list read from $_SESSION. Chunking in blocks of 50 users, I managed to get all the info about them without losing Nextcloud or the backend database…
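For illustration, PHP's array_chunk splits a list into fixed-size blocks, keeping order and leaving a smaller final block (toy data):

  $users  = ['u1', 'u2', 'u3', 'u4', 'u5'];
  $chunks = array_chunk($users, 2);
  // [['u1', 'u2'], ['u3', 'u4'], ['u5']]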

If my patch is of any interest, I can share it.

bpcurse commented 2 years ago

@rabser Great, I'm happy to hear you like it :)

Sharing the patch would be much appreciated, indeed. This also sounds like a plausible explanation for another issue.

rabser commented 2 years ago

I'm sorry, but I never understood how to use pull requests (my fault), so I'll paste the patch here, as it's very simple:

In functions.php, replace the following block, starting at line 314:

  // Initialize cURL multi handle for parallel requests
  $mh = curl_multi_init();

  // Iterate through userlist
  foreach($_SESSION['userlist'] as $key => $user_id) {

with this block:

  $userids = array_chunk($_SESSION['userlist'], 50);
  $chunks = count($userids);

  for ($chunk = 0; $chunk < $chunks; $chunk++) {
        // Just to let you know what's happening...
        error_log("Working on chunk $chunk");

        // Initialize cURL multi handle for parallel requests
        $mh = curl_multi_init();

        // Iterate through userlist
        foreach ($userids[$chunk] as $key => $user_id) {

then add a closing brace after

  curl_multi_close($mh);

(line 350 in the original file), so that it reads

  curl_multi_close($mh);
  }

and closes the new for loop.
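Putting both edits together, the patched section has roughly this shape (a simplified, self-contained sketch, not the exact code from functions.php; the per-user request setup is reduced to a plain GET against a placeholder URL):

  // Sketch of the chunked curl_multi pattern described above.
  // The URL below is a placeholder; functions.php builds real
  // Nextcloud API requests instead.
  $userids = array_chunk($_SESSION['userlist'], 50);
  $chunks = count($userids);

  for ($chunk = 0; $chunk < $chunks; $chunk++) {
      error_log("Working on chunk $chunk");

      // One multi handle per chunk: at most 50 parallel transfers.
      $mh = curl_multi_init();
      $handles = [];

      foreach ($userids[$chunk] as $key => $user_id) {
          $ch = curl_init('https://cloud.example.com/user/' . rawurlencode($user_id));
          curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
          curl_multi_add_handle($mh, $ch);
          $handles[$user_id] = $ch;
      }

      // Run all transfers of this chunk to completion.
      do {
          $status = curl_multi_exec($mh, $running);
          if ($running) {
              curl_multi_select($mh);
          }
      } while ($running && $status === CURLM_OK);

      // Collect results and clean up before the next chunk starts.
      foreach ($handles as $user_id => $ch) {
          $response = curl_multi_getcontent($ch);
          // ... process $response for $user_id ...
          curl_multi_remove_handle($mh, $ch);
          curl_close($ch);
      }

      curl_multi_close($mh);
  }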

Obviously, a better solution would have a config item for the chunk size, for simpler tuning. A chunk size of 1 would be equivalent to not using curl_multi at all…
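A sketch of what that config item could look like (the name user_chunk_size matches the setting bpcurse mentions further down in this thread; exposing it as a plain variable in config.php is an assumption):

  // Hypothetical config.php entry:
  // $user_chunk_size = 50;   // 1 = sequential requests, no parallelism

  $chunk_size = (isset($user_chunk_size) && (int) $user_chunk_size >= 1)
      ? (int) $user_chunk_size
      : 50;   // fall back to the hardcoded default from the patch

  $userids = array_chunk($_SESSION['userlist'], $chunk_size);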

HTH

bpcurse commented 2 years ago

Obviously, a better solution would have a config item for the chunk size, for simpler tuning. A chunk size of 1 would be equivalent to not using curl_multi at all…

I had the same thought. A config option will be made available in the next version.

Thank you for contributing to this project :+1:. No need to apologize about the pull request; I had problems getting started with pull requests myself.

Just in case you would like to practice (it is quite easy in GitHub's web UI):

rabser commented 2 years ago

Great! I didn't know that I could simply edit a file directly to create a pull request (I manage several self-hosted GitLab servers, so I already knew files could be edited in the web UI). Next time I'll try!

bpcurse commented 2 years ago

@rabser A new version containing your chunking proposal and a debug log (both are options in config.php) has been published in the debug branch. May I ask you to test this version, as your setup is known to react to these changes?

Here is a howto:

rabser commented 2 years ago

Hi, the tests were successful. Starting with the chunk option set to false, I got many "!ERROR: Received empty user id" lines in debug.log. Raising the chunk size (e.g. 40, 50) made the errors go away, as expected.

Just a hint about the debug.log: if you plan to bring that feature to production, you should leave the choice of where debug.log is saved to the admin (e.g. in config.php). Otherwise, the web dir has to be writable by the webserver, which is normally not best practice (nor accepted by providers).
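A sketch of an admin-configurable log location along those lines (the $debug_log_path variable and the fallback behaviour are assumptions, not actual nextcloud-userexport code):

  // Hypothetical config.php entry:
  // $debug_log_path = '/var/log/nextcloud-userexport/debug.log';

  function write_debug_log(string $message): void {
      global $debug_log_path;

      // Fall back to PHP's own error log when no path is configured,
      // so the web dir never has to be writable by the webserver.
      if (empty($debug_log_path)) {
          error_log($message);
          return;
      }

      file_put_contents(
          $debug_log_path,
          date('c') . ' ' . $message . PHP_EOL,
          FILE_APPEND | LOCK_EX
      );
  }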

HTH

bpcurse commented 2 years ago

Thanks a lot for testing! Glad to hear it works in "real life".

For production, I am planning to show an error message after user data has been fetched and at least one transfer has failed, including a hint to alter the user_chunk_size config setting. It won't keep the user from working with the potentially incomplete data if necessary.

This error message could offer a "download log file" button. Log data would be kept in a $_SESSION variable until the download is actively chosen. I hope that using the same system as the user data CSV file downloads (random temp path/file names and auto deletion after download) should be secure enough? See the build_csv_file, download_file and delete_temp_folder functions in functions.php.
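A rough sketch of that flow, assuming the log lines are collected in a $_SESSION array (variable and file names here are illustrative; only the functions.php helpers named above are real):

  // During the run: collect log lines in the session, not on disk.
  $_SESSION['debug_log'][] = '!ERROR: Received empty user id';

  // On "download log file": write to a random temp path, serve it,
  // then delete it, mirroring the CSV download flow described above.
  $temp_dir = sys_get_temp_dir() . '/' . bin2hex(random_bytes(16));
  mkdir($temp_dir, 0700);
  $temp_file = $temp_dir . '/debug.log';
  file_put_contents($temp_file, implode(PHP_EOL, $_SESSION['debug_log']));

  header('Content-Type: text/plain');
  header('Content-Disposition: attachment; filename="debug.log"');
  readfile($temp_file);

  // Auto deletion after download.
  unlink($temp_file);
  rmdir($temp_dir);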

What do you think?

rabser commented 2 years ago

I'm not that used to saving data in $_SESSION, but yes, I can't see any problem with this strategy.