allejo opened this issue 3 years ago
Have you identified whether the slowness is in reading the files or in sorting? If it is while reading, we could try to read each file only once, both to match the magic and to read the time. Currently we do an fread, rewind, fseek, and fread; I think we could merge those into a single fread.
If it is while sorting, the best a full sort can do is n*log(n), but since we only ever show at most 100 results, we could do that selection in roughly 100*n, which for very large n can be lower.
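For the sorting half, something like std::partial_sort can keep just the newest 100 entries without ordering the whole list. A rough sketch (ReplayInfo and keepNewest are hypothetical names, not from the bzfs code):

#include <algorithm>
#include <string>
#include <vector>
#include <ctime>

// Hypothetical record for one replay file; the real code keeps similar data.
struct ReplayInfo
{
    std::string fileName;
    time_t      fileTime;
};

// Move the newest `limit` replays to the front, ordered newest first,
// without fully sorting the rest of the list.
void keepNewest(std::vector<ReplayInfo>& replays, size_t limit = 100)
{
    const size_t count = std::min(limit, replays.size());
    std::partial_sort(replays.begin(), replays.begin() + count, replays.end(),
                      [](const ReplayInfo& a, const ReplayInfo& b)
                      { return a.fileTime > b.fileTime; });
    replays.resize(count);
}

That needs roughly n*log(100) comparisons, so it grows linearly with the number of replays.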
Basic optimization rules apply. For any operation that is non-performant, you can do it less often, do it faster, or avoid doing it altogether.
If scanning 13,000 files is too costly (one might imagine so), you can trade time for memory and store the results, either in memory or in a database of sorts. It is the nature of this problem that the information in each file doesn't change, so really only new files need to be scanned. This is a form of "do it less often" optimization.
Let's suppose you had a cached replay list, replay_list.txt. While not super speedy, I can imagine that iterating over the directory (BTW, C++17 std::filesystem helps here) and only keeping those files newer than the last modification time of the cache would be more performant.
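A minimal sketch of that filter, assuming the cache records the time it was last rebuilt (newFilesSince is a made-up helper name):

#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Return only the entries modified after the cache was last updated, so the
// expensive header reads run just for new recordings.
std::vector<fs::path> newFilesSince(const fs::path& recordDir,
                                    fs::file_time_type cacheTime)
{
    std::vector<fs::path> fresh;
    for (const auto& entry : fs::directory_iterator(recordDir))
    {
        if (entry.is_regular_file() && entry.last_write_time() > cacheTime)
            fresh.push_back(entry.path());
    }
    return fresh;
}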
The challenge is one of freshness: how quickly after a new replay is created do you want it to show up in response to /replay list? That's how often you need to re-evaluate this cache, or, more precisely, how often you need to check for new files.
FWIW, you could use std::async to kick off this indexing task and to update it periodically at whatever interval you need.
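A rough sketch of that, where ReplayCache and buildReplayCache are placeholders for whatever indexing routine bzfs would actually use:

#include <chrono>
#include <future>
#include <iostream>
#include <map>
#include <string>

// Hypothetical cache type: replay file name -> duration in seconds.
using ReplayCache = std::map<std::string, float>;

// Stand-in for the expensive directory scan discussed above.
ReplayCache buildReplayCache(const std::string& recordDir)
{
    ReplayCache cache;
    // ... open each file under recordDir, read its header, fill the cache ...
    return cache;
}

int main()
{
    // Kick off the scan without blocking the main server loop.
    auto pending = std::async(std::launch::async, buildReplayCache,
                              std::string("recordings/"));

    // Elsewhere (e.g. once per game loop), poll for completion before
    // answering /replay list from the cache.
    if (pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
    {
        ReplayCache cache = pending.get();
        std::cout << "Indexed " << cache.size() << " replays" << std::endl;
    }
    return 0;
}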
How long does it take bzfs to scan through those 13,000 replay files?
On server startup, it could look up the files and cache the results. Another possible remedy is to simply store the results in a simple database and have a plug-in look up the results, since the API allows overriding default slash commands.
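If the database route were taken, a minimal sketch with SQLite could look like this (the replay_cache.sqlite file name, the replays table, and its columns are all made up for illustration; link with -lsqlite3):

#include <sqlite3.h>
#include <iostream>

int main()
{
    sqlite3* db = nullptr;
    if (sqlite3_open("replay_cache.sqlite", &db) != SQLITE_OK)
        return 1;

    // One row per replay, filled in whenever new files are indexed.
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS replays ("
        "  filename TEXT PRIMARY KEY,"
        "  duration REAL,"
        "  filetime INTEGER)",
        nullptr, nullptr, nullptr);

    // Listing the newest 100 replays becomes a single query instead of
    // thousands of open/read calls.
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db,
        "SELECT filename, duration FROM replays ORDER BY filetime DESC LIMIT 100",
        -1, &stmt, nullptr);
    while (sqlite3_step(stmt) == SQLITE_ROW)
    {
        std::cout << sqlite3_column_text(stmt, 0) << " ("
                  << sqlite3_column_double(stmt, 1) << "s)" << std::endl;
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}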
Doing some very basic/stupid profiling of where the slowdown is, I've added some calls to bz_getCurrentTime and I'm getting the following information:
sendFileList before getFileList(): 16.0285
sendFileList after getFileList(): 115.735
sendFileList after sorting: 115.735
sendFileList after sendMessage loop: 115.735
Removing the call to loadFileTime does not appear to make a difference. In retrospect, that makes sense since the file is already open at that point.
Sorting is pretty much instant, so it appears to me that the biggest slowdown is opening the files.
13,000 files will stress the OS I/O cache. 26,000 will be even worse... Starting an asynchronous cache of the required information at server start is a pretty easy fix.
A suggestion I found is to use openat instead of open, and pread instead of fseek and fread.
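As a sketch of what that would look like (the duration offset here is a made-up constant, since I haven't checked the actual header layout):

#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

// Hypothetical offset of the duration field inside the replay header.
static const off_t kDurationOffset = 8;

// Read the magic and the duration with two positioned reads and no seeks,
// resolving the name relative to the already-open directory descriptor.
bool readHeaderFields(int dirFd, const char* name, uint32_t& magic, uint32_t& duration)
{
    int fd = openat(dirFd, name, O_RDONLY);
    if (fd < 0)
        return false;

    bool ok = pread(fd, &magic, sizeof(magic), 0) == (ssize_t) sizeof(magic)
           && pread(fd, &duration, sizeof(duration), kDurationOffset) == (ssize_t) sizeof(duration);
    close(fd);
    return ok;
}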
How big of an issue would it be if you could not do /replay load #101, i.e. select an index beyond the limit of 100?
The approach I'm taking is twofold; part of it is caching getFileList so as to avoid iterating over everything when doing a pattern search.
Also, a small side question: if you do a /replay list <query> and then follow that up with a /replay play #n, there is no way that index number you're passing would actually select the n-th replay of that search query, right?
> It is the nature of this problem that the information of each file doesn't change, so really only new files need to be scanned.
Something to keep in mind is that most recordings are actively streamed to the disk. In this mode, the replay header is initially written with a recording duration of 0, and when the recording is stopped this duration is updated. So if we're able to read the recording while it's being actively written (I've not tested that), you would not want to cache information about recordings that have a duration of 0.
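So any cache update would probably want a guard along these lines (ReplayCache is the same hypothetical file-name-to-duration map sketched earlier):

#include <map>
#include <string>

using ReplayCache = std::map<std::string, float>;

// Only remember completed recordings; a duration of 0 likely means the file
// is still being streamed to disk, so re-read its header on the next pass.
void cacheIfComplete(ReplayCache& cache, const std::string& fileName, float duration)
{
    if (duration > 0.0f)
        cache[fileName] = duration;
    else
        cache.erase(fileName);
}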
I've been looking into this further the last few days. When we're listing replays, we loop through all of them, open the file, and run two fread calls. The first fread is to read the magic string to identify if it is a BZFlag replay file. Then we seek ahead a bit and read the replay duration, which we show in the list output. So far it looks like just the act of running a fread for the magic string on thousands of files takes a good chunk of time.
I made two example directories of replays: one with 3000 files of 16MB each (some tournament replays I had), and one with 15000 replay files I created with a basic program that just writes out the replay header and a world database dumped from an empty world file (502 bytes per replay).
I compiled this minimal program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <dirent.h>
#include <string>
#include <iostream>

typedef uint32_t u32;

int main() {
    std::string RecordDir = "recordings/";
    DIR* dir = opendir(RecordDir.c_str());
    if (dir == NULL)
        return 1;

    struct dirent *de;
    unsigned long filesRead = 0;
    while ((de = readdir(dir)) != NULL)
    {
        std::string name = RecordDir;
        name += de->d_name;

        FILE* file = fopen(name.c_str(), "rb");
        if (file == NULL)
            continue;

        // Read only the 4-byte magic, as the replay listing code does first
        u32 magic;
        char buffer[sizeof(magic)];
        if (fread(buffer, sizeof(magic), 1, file) != 1)
        {
            fclose(file);
            continue;
        }
        filesRead++;
        fclose(file);
    }
    closedir(dir);

    std::cout << "Total files read: " << filesRead << std::endl;
    return 0;
}
I ran the test on a Debian system with an Intel Core 2 Duo E8400 3GHz, 6GB RAM, and an 80GB SATA hard drive. With 3000 files the program took 40 seconds, and with 15000 files it took 94 seconds. Before each run I executed (as root) sync; echo 3 > /proc/sys/vm/drop_caches to flush the caches.
Things to think about:
- This same scan runs for both /replay list and /record list, so maybe we should disallow that to prevent live game servers from freezing in the case of lots of replays.
- How this interacts with sorting by time (/replay list -t).

I have not tested it, and cannot, but maybe this is faster:
#include <iostream>
#include <string>
#include <stdint.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

typedef uint32_t u32;

int main() {
    std::string RecordDir = "recordings/";
    DIR* dir = opendir(RecordDir.c_str());
    if (dir == NULL)
        return 1;

    // Open the directory once so openat() resolves entries relative to it
    int dfd = open(RecordDir.c_str(), O_RDONLY);

    struct dirent *de;
    unsigned long filesRead = 0;
    while ((de = readdir(dir)) != NULL)
    {
        int file = openat(dfd, de->d_name, O_RDONLY);
        if (file < 0)
            continue;

        u32 magic;
        char buffer[sizeof(magic)];
        if (read(file, buffer, sizeof(magic)) != (ssize_t) sizeof(magic))
        {
            close(file);
            continue;
        }
        filesRead++;
        close(file);
    }
    close(dfd);
    closedir(dir);

    std::cout << "Total files read: " << filesRead << std::endl;
    return 0;
}
I just tested that variant and it takes basically the same amount of time.
In case this is helpful to anyone, this is the output from two consecutive runs of my original test program, with the first time being with caches cleared.
blast007@debian:~$ /usr/bin/time ./filetest
Total files read: 3001
0.02user 0.13system 0:41.01elapsed 0%CPU (0avgtext+0avgdata 3488maxresident)k
102104inputs+0outputs (25major+137minor)pagefaults 0swaps
blast007@debian:~$ /usr/bin/time ./filetest
Total files read: 3001
0.01user 0.03system 0:00.04elapsed 93%CPU (0avgtext+0avgdata 3468maxresident)k
0inputs+0outputs (0major+148minor)pagefaults 0swaps
I'll also attach my 15000 generated replay files. They're enough for listing replays, but will error out if you try to actually load one since it's missing data after the world database. They even have randomized replay durations, so sorting by duration is an option for testing performance.
Do we have a target for how fast we should be able to read the headers of 15000 files? How fast is fast enough?
Using modern C++:
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <chrono>

void test(std::filesystem::path const& path)
{
    for (auto const& dir_ent : std::filesystem::directory_iterator(path))
    {
        std::ifstream f(dir_ent.path(), std::ios_base::in | std::ios::binary);
        if (!f.good())
        {
            std::cout << "problem opening " << dir_ent.path() << std::endl;
            return;
        }

        uint32_t magic;
        auto expected = sizeof(magic);
        f.read(reinterpret_cast<char*>(&magic), expected);
        auto nRead = f.gcount();
        if (nRead != static_cast<std::streamsize>(expected))
        {
            std::cout << "problem reading " << dir_ent.path()
                      << ", read " << nRead << " of " << expected << " bytes."
                      << std::endl;
        }
    }
}

int main()
{
    auto path = std::filesystem::path("recordings");

    auto start = std::chrono::steady_clock::now();
    test(path);
    auto stop = std::chrono::steady_clock::now();

    std::cout << "Elapsed: "
              << std::chrono::duration_cast<std::chrono::duration<double>>(stop - start).count()
              << " seconds"
              << std::endl;
    return 0;
}
On an iMac (3.3GHz Intel Core i5, 32GB 2667MHz DDR4, 2TB flash storage), it ran through your files in about 2 seconds:
jwmelto@Goliath bzflag % ./a.out
Elapsed: 2.23998 seconds
jwmelto@Goliath bzflag % ./a.out
Elapsed: 0.265397 seconds
Does that help?
Basically the same on my test system (with the 3001 x 16MB replays):
blast007@debian:~$ /usr/bin/time ./filetest3
Elapsed: 40.2098 seconds
0.04user 0.14system 0:40.37elapsed 0%CPU (0avgtext+0avgdata 3644maxresident)k
102456inputs+0outputs (26major+140minor)pagefaults 0swaps
For the 15000 files, here's what it's like on that Core 2 Duo system with the hard drive:
blast007@debian:~$ /usr/bin/time ./filetest3
Elapsed: 93.2741 seconds
0.13user 0.66system 1:33.44elapsed 0%CPU (0avgtext+0avgdata 3504maxresident)k
133448inputs+0outputs (26major+138minor)pagefaults 0swaps
And on a $6/month Digital Ocean droplet (1 shared CPU core, 25GB SSD, 1GB RAM):
blast007@xs:~$ /usr/bin/time ./filetest3
Elapsed: 4.91171 seconds
0.18user 0.69system 0:04.92elapsed 17%CPU (0avgtext+0avgdata 3564maxresident)k
132480inputs+0outputs (9major+145minor)pagefaults 0swaps
Looks like your files are on a network drive? That's going to be painfully slow.
No, there are no network drives involved in any of this.
So I'm kinda stumped on how to proceed with fixing this situation. On my Leagues United replay server, I have over 13,000 replays, and running /replay list is extremely inefficient and leads to server freezes.

https://github.com/BZFlag-Dev/bzflag/blob/3be72c3aeba3f5f5e118a8279fb885333dc2b5b7/src/bzfs/RecordReplay.cxx#L865-L942
What I've gathered from reading through this code is that opening 13,000 files and extracting the match date isn't the most efficient solution. Even though sendFileList will only output 100 results to the user as private messages, the server is still processing all 13,000 replays.

My thought for an immediate fix (2.4) was to change this functionality to use the filesystem "last modified" date for sorting by time, even though that can sometimes be inaccurate. (This is not the cause of the slowdown, so doing this would prove useless.)

For 2.6, I would think having a replay server do some sort of caching on boot to optimize these listings would be ideal.