McDermott-Group / servers

Public repo that stores LabRAD servers and other non-measurement related code

Slow file search in saving #93

Closed bgchristensen closed 7 years ago

bgchristensen commented 7 years ago

@patzinak @roboguy222 @amopremcak

Once a folder has, say, >500 files, the process to start saving a file takes a significant amount of time (>20 s in some cases). The actual saving process and everything else is still fast.

My best guess is that in _save_data in the Experiment class, a search is done over all files to determine the next file name to be used. I didn't test which part exactly was taking the most time (I hope to do that today), but we should consider how to make this more efficient. The first option that comes to mind is to just look at the length of the directory listing (i.e., how many files exist) and append '_num2str(length_of_directory)' to the end of the file name. Do you guys have any other, more elegant, solutions?
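A minimal sketch of that count-based idea (the function name next_file_name and its arguments are placeholders for illustration, not existing code in this repo):

```python
import os

def next_file_name(directory, base, ext='csv', digits=6):
    """Derive the next file index from the number of entries in the folder
    instead of scanning every existing file name for the largest index.
    (Placeholder names; illustrative sketch only.)"""
    # Assumes the folder holds only data files named 'base_<index>.<ext>',
    # so the entry count equals the next free index.
    index = len(os.listdir(directory))
    return os.path.join(directory, '%s_%0*d.%s' % (base, digits, index, ext))
```

One caveat with counting entries: if files are ever deleted from the folder, the count can collide with an existing index, so caching the last-used index may be safer.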

Note: In order to get to such a large number of files, I did increase the number of digits at the end of the file name. I ideally wanted to save around 40,000 files, so this huge slowdown was kind of catastrophic.

roboguy222 commented 7 years ago

Switching to the new datachest should help with this, I think. Let's talk seriously about it in the next few days.

patzinak commented 7 years ago

@bgchristensen It should be better now. If you save a few thousand files in one folder and then later add a few thousand more files to the same folder, it will take some time to create the first file, but after that it works fast. See the pictures below. The first one is saving 500 files with the old code, and the second is adding 500 more files to the same folder. The improvement is obvious, and so is the problem with the very first file of a run.
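A minimal sketch of that behaviour (the IndexCache class and the '_&lt;index&gt;.&lt;ext&gt;' file name pattern are assumptions for illustration, not the actual datachest code):

```python
import os
import re

class IndexCache(object):
    """Scan the folder once for the highest existing index, then hand out
    subsequent indices from memory, so only the first save of a run pays
    the scanning cost. (Illustrative sketch only.)"""

    _pattern = re.compile(r'_(\d+)\.\w+$')

    def __init__(self, directory):
        self.directory = directory
        self._next = None                     # filled lazily on first use

    def next_index(self):
        if self._next is None:                # first call: one full scan
            indices = [int(m.group(1)) for m in
                       (self._pattern.search(name)
                        for name in os.listdir(self.directory)) if m]
            self._next = max(indices) + 1 if indices else 0
        index, self._next = self._next, self._next + 1
        return index
```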

Also, just a note: instead of generating 40,000 small files of, say, 1 kB each, it is typically better to create 40 files of roughly 1 MB each. This is because opening and reading a huge number of files is typically slower than reading a smaller number of files that contain the same data.
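For illustration only (plain h5py with made-up names, not the datachest API), many small data sets can be bundled into one file like this:

```python
import numpy as np
import h5py

# Write 1,000 small sweeps into a single HDF5 file instead of 1,000 files.
with h5py.File('combined_run.hdf5', 'w') as f:
    for i in range(1000):
        data = np.random.rand(128)            # stand-in for one small sweep
        f.create_dataset('sweep_%04d' % i, data=data)

# Reading any one sweep back only ever touches a single file on disk.
with h5py.File('combined_run.hdf5', 'r') as f:
    first_sweep = f['sweep_0000'][:]
```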

[Figures: figure_500_e — saving 500 files with the old code; figure_500_new — adding 500 more files to the same folder]