giampaolo / pyftpdlib

Extremely fast and scalable Python FTP server library
MIT License

[Question]: About pre-fork model #641

Open Howar-sz opened 4 months ago

Howar-sz commented 4 months ago

Hi, I'm a pyftpdlib user, and I'm looking to enhance the performance of my FTP server implementation. I came across the pre-fork model in the tutorial (https://github.com/giampaolo/pyftpdlib/blob/master/docs/tutorial.rst#pre-fork), but I'm having difficulty grasping how worker processes acquire connections. I attempted to integrate this model into unix_daemon.py, but it didn't yield any significant performance improvements.

# 50k files, 64k size, 1 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277063424 bytes transferred in 217 seconds (14.41M/s)
real    3m42.103s
user    1m25.694s
sys     0m15.360s

# 50k files, 64k size, 2 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277128960 bytes transferred in 518 seconds (6.03M/s)
real    5m0.634s
user    1m33.642s
sys     0m18.385s

# 50k files, 64k size, 4 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277260032 bytes transferred in 1123 seconds (2.78M/s)
real    5m0.588s
user    1m42.999s
sys     0m22.693s

# 50k files, 64k size, 8 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277704304 bytes transferred in 1878 seconds (1.66M/s)
real    5m0.585s
user    1m26.395s
sys     0m19.545s
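
A note on reading these numbers: the seconds figure lftp prints grows with the parallelism (518, 1123, 1878) while `real` stays at about 5 minutes, so the per-run M/s values are probably not directly comparable. One way to compare the runs is bytes transferred divided by wall-clock time. The sketch below does that with the figures above; the helper names are made up for illustration, and it assumes lftp's "M/s" means MiB/s (which matches the 1-parallel run).

```python
# Wall-clock throughput for each run: total bytes / `real` time.
# Helper names are hypothetical; the figures are copied from the runs above.

def parse_real(text):
    """Convert a `time`-style duration such as '5m0.634s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def wall_clock_mib_per_s(total_bytes, real):
    """Return throughput in MiB/s based on wall-clock time."""
    return total_bytes / parse_real(real) / (1024 * 1024)


# parallel level -> (bytes transferred, `real` time)
RUNS = {
    1: (3277063424, "3m42.103s"),
    2: (3277128960, "5m0.634s"),
    4: (3277260032, "5m0.588s"),
    8: (3277704304, "5m0.585s"),
}

if __name__ == "__main__":
    for parallel, (total_bytes, real) in RUNS.items():
        rate = wall_clock_mib_per_s(total_bytes, real)
        print(f"-P {parallel}: {rate:.2f} MiB/s")
```

Measured this way, the single-connection run comes out at roughly 14 MiB/s and each parallel run at roughly 10.4 MiB/s, so the parallel runs are still slower, but by far less than the lftp summary lines suggest.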

Looking forward to hearing from you.

giampaolo commented 2 months ago

> I'm having difficulty grasping how worker processes acquire connections.

As far as I remember, the parent / master process "passes" every new connection to one of the workers, so this may make things slower compared to the 1 process async model. If this is true, you may have more luck changing your benchmark so that it downloads, say, 10 files of 1G each instead of 50k files of 64K each. But it's just a supposition.
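
For comparison, a common pre-fork design does not route connections through the parent at all: the parent binds the listening socket, forks, and every worker calls `accept()` on the inherited socket, so the kernel decides which worker gets each connection. Whether pyftpdlib's pre-fork mode works this way is not confirmed in this thread; the following is only a generic, POSIX-only sketch using the standard library, and `serve_prefork` / `ask_worker_pid` are hypothetical names.

```python
import os
import socket


def serve_prefork(num_workers=2, conns_per_worker=2):
    """Fork workers that all accept() on one inherited listening socket.

    Generic illustration only, not pyftpdlib code. Each worker serves a
    fixed number of connections and then exits, to keep the demo finite.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: accept directly from the shared socket.
            try:
                for _ in range(conns_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)

    listener.close()  # parent drops its copy; workers keep theirs open
    return port, pids


def ask_worker_pid(port):
    """Connect once and return the PID of the worker that answered."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        chunks = []
        while True:
            data = client.recv(64)
            if not data:
                break
            chunks.append(data)
    return int(b"".join(chunks))


if __name__ == "__main__":
    port, pids = serve_prefork(num_workers=2, conns_per_worker=2)
    for _ in range(4):
        print("served by worker", ask_worker_pid(port))
    for pid in pids:
        os.waitpid(pid, 0)
```

In this pattern the parent never touches a client connection, so a busy worker does not block the parent; an idle worker simply picks up the next connection. If the connections did go through the parent, as recalled above, that extra hop would be a per-connection cost, which matters most with many small files.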

Also, what are you using for your benchmarks? Is it a single client downloading the files serially, or are there multiple clients in parallel?

Note: I've never conducted benchmarks for the pre-fork model, so you're a pioneer in this sense. :)

giampaolo commented 2 months ago

PS: I see you're from Shenzhen. My wife is from there. :-)

Howar-sz commented 2 months ago

> As far as I remember, the parent / master process "passes" every new connection to one of the workers, so this may make things slower compared to the 1 process async model. If this is true, you may have more luck changing your benchmark so that it downloads, say, 10 files of 1G each instead of 50k files of 64K each. But it's just a supposition.

Does "passes" mean that every new FTP connection has to be allocated by the parent/master process? If a worker process is busy, will the parent process wait for it?

> Also, what are you using for your benchmarks? Is it a single client downloading the files serially, or are there multiple clients in parallel?

It's an upload test. I use lftp with the -e "mirror -R -c -P <parallel>" arguments as my benchmark tool. As far as I understand, lftp opens multiple FTP connections in parallel when the parallel argument is greater than one.