facebook / wdt

Warp speed Data Transfer (WDT) is an embeddedable library (and command line tool) aiming to transfer data between 2 systems as fast as possible over multiple TCP paths.
https://www.facebook.com/WdtOpenSource
Other
2.86k stars 391 forks source link

Waiting for PROCESS_WAIT_CMD state takes long time #139

Closed zxpan closed 7 years ago

zxpan commented 7 years ago

A question here, I had a 120MB single file, and I used WDT to transfer it between two physical servers. This took about 30 seconds for sender and receiver finished. The puzzle is that I observed the whole file already transferred to the disk, but the sender side went into entered PROCESS_WAIT_CMD state for literally 25 seconds. What the thing holds one of thread takes so long to get out this state? This behavior sound defend the purpose of fast data transfer.

The log snippets: I1214 16:47:11.470233 23157 DirectorySourceQueue.cpp:201] wdt> Using list of file info. Number of files 1 I1214 16:47:11.504415 23158 SenderThread.cpp:85] wdt> Thread[0, port: 4430] Connection took 1 attempt(s) and 0.0341567 seconds. port 4430 I1214 16:47:11.505105 23209 SenderThread.cpp:85] wdt> Thread[1, port: 4431] Connection took 1 attempt(s) and 0.0326994 seconds. port 4431 I1214 16:47:11.505519 23216 SenderThread.cpp:85] wdt> Thread[3, port: 4433] Connection took 1 attempt(s) and 0.0329971 seconds. port 4433 I1214 16:47:11.505672 23220 SenderThread.cpp:85] wdt> Thread[4, port: 4434] Connection took 1 attempt(s) and 0.0330901 seconds. port 4434 I1214 16:47:11.507572 23211 SenderThread.cpp:85] wdt> Thread[2, port: 4432] Connection took 1 attempt(s) and 0.0351163 seconds. port 4432 I1214 16:47:11.507681 23224 SenderThread.cpp:85] wdt> Thread[5, port: 4435] Connection took 1 attempt(s) and 0.0350508 seconds. port 4435 I1214 16:47:11.507918 23229 SenderThread.cpp:85] wdt> Thread[6, port: 4436] Connection took 1 attempt(s) and 0.035202 seconds. port 4436 I1214 16:47:11.508317 23230 SenderThread.cpp:85] wdt> Thread[7, port: 4437] Connection took 1 attempt(s) and 0.035554 seconds. port 4437 I1214 16:47:14.219935 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:15.220541 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:16.220765 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:17.221560 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:18.221844 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:19.222250 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:20.222795 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:21.223304 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:22.223743 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:23.224333 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:24.225175 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:25.225862 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:26.226440 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:27.226626 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:28.227392 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:29.227818 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:30.228229 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:31.228562 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:32.229580 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:33.229594 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:34.230316 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:34.685650 23224 SenderThread.cpp:712] wdt> Thread[5, port: 4435] entered PROCESS_WAIT_CMD state I1214 16:47:35.230864 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:35.686082 23224 SenderThread.cpp:712] wdt> Thread[5, port: 4435] entered PROCESS_WAIT_CMD state I1214 16:47:36.231479 23230 SenderThread.cpp:712] wdt> Thread[7, port: 4437] entered PROCESS_WAIT_CMD state I1214 16:47:36.455055 23209 SenderThread.cpp:922] wdt> Thread[1, port: 4431] Port 4431 done. Transfer status = OK. Number of blocks transferred = 1. Data Mbytes = 16. Header Kbytes = 0.334961 (0.00204444% overhead). Total bytes = 16777559. Wasted bytes due to failure = 0 (0% overhead). Encryption type = none. Total throughput = 0.640447 Mbytes/sec:

zxpan commented 7 years ago

After I turn on "-v 3" option, I see a lot going on like I1214 22:00:30.945255 35108 FileByteSource.cpp:156] wdt> Size 262144 need align 1 physicalRead 262144 offset 50331648 seepPos 58982400 offsetRemainder 0 bytesRead 8912896

ldemailly commented 7 years ago

what is the receiver side logs saying ? can you put the full logs of both side ?

can you say all the options you are using (or any option different than default) ?

it's possible that if you have a really slow device and large buffers it would take a long time to flush

if you are using disk and not flash use -option_type disk

zxpan commented 7 years ago

The options I am using right now: 1) receiver /usr/local/bin/wdt --run_as_daemon=true --minloglevel=0 --start_port 4430 -encryption_type none --transfer_id=13937 -num_ports 8 -progress_report_interval_millis=0 -overwrite=true -max_accept_retries=0 -read_timeout_millis 8000 -write_timeout_millis 8000 -v 3 2) sender time /usr/local/bin/wdt --directory pwd --follow_symlinks --destination dev5.alphonso.tv --start_port 4430 --ipv6=false -encryption_type none -progress_report_interval_millis 0 --transfer_id=13937 --num_ports 8 -odirect_reads=true -read_timeout_millis 8000 -write_timeout_millis 8000 -v 3

Right now the transfer takes about 20+ seconds. I am using flash drive.

The transfer is between Verizon FIOS (sender) to the Cloud data center (receiver).

The puzzle for me is that It seems the single file is already copied (using linux ls -ls to watch the file on receiver side), while the sender keeps doing something.

Part of sender log: I1214 22:55:55.206275 37308 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:55.230696 37312 WdtSocket.cpp:290] wdt> socket io for 262144 bytes took 3 retries I1214 22:55:55.230711 37312 Reporting.cpp:53] wdt> Local ErrorCode now 0 from 0 I1214 22:55:55.230715 37312 Reporting.cpp:53] wdt> Local ErrorCode now 0 from 0 I1214 22:55:55.230723 37312 SenderThread.cpp:219] wdt> Thread[3, port: 4433] entered SEND_BLOCKS state I1214 22:55:55.230729 37312 SenderThread.cpp:400] wdt> Thread[3, port: 4433] entered SEND_DONE_CMD state I1214 22:55:55.230736 37312 SenderThread.cpp:418] wdt> Thread[3, port: 4433] Wrote done cmd on 6 waiting for reply... I1214 22:55:55.230741 37312 SenderThread.cpp:619] wdt> Thread[3, port: 4433] entered READ_RECEIVER_CMD state I1214 22:55:55.240687 37304 WdtSocket.cpp:290] wdt> socket io for 1 bytes took 8 retries I1214 22:55:55.240701 37304 SenderThread.cpp:712] wdt> Thread[0, port: 4430] entered PROCESS_WAIT_CMD state I1214 22:55:55.240706 37304 SenderThread.cpp:716] wdt> Thread[0, port: 4430] received WAIT_CMD, port 4430 I1214 22:55:55.240710 37304 SenderThread.cpp:619] wdt> Thread[0, port: 4430] entered READ_RECEIVER_CMD state I1214 22:55:55.272325 37317 WdtSocket.cpp:290] wdt> socket io for 262144 bytes took 2 retries I1214 22:55:55.273110 37317 FileByteSource.cpp:156] wdt> Size 262144 need align 1 physicalRead 262144 offset 33554432 seepPos 49020928 offsetRemainder 0 bytesRead 15728640 I1214 22:55:55.273123 37317 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:55.404325 37308 WdtSocket.cpp:290] wdt> socket io for 262144 bytes took 2 retries I1214 22:55:55.405266 37308 FileByteSource.cpp:156] wdt> Size 262144 need align 1 physicalRead 262144 offset 50331648 seepPos 66322432 offsetRemainder 0 bytesRead 16252928 I1214 22:55:55.405277 37308 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:55.604316 37308 WdtSocket.cpp:290] wdt> socket io for 262144 bytes took 2 retries I1214 22:55:55.620164 37308 FileByteSource.cpp:156] wdt> Size 262144 need align 1 physicalRead 262144 offset 50331648 seepPos 66584576 offsetRemainder 0 bytesRead 16515072 I1214 22:55:55.620177 37308 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:55.630949 37314 WdtSocket.cpp:290] wdt> socket io for 1 bytes took 11 retries I1214 22:55:55.630962 37314 SenderThread.cpp:712] wdt> Thread[5, port: 4435] entered PROCESS_WAIT_CMD state I1214 22:55:55.630966 37314 SenderThread.cpp:716] wdt> Thread[5, port: 4435] received WAIT_CMD, port 4435 I1214 22:55:55.630970 37314 SenderThread.cpp:619] wdt> Thread[5, port: 4435] entered READ_RECEIVER_CMD state I1214 22:55:55.672333 37317 WdtSocket.cpp:290] wdt> socket io for 262144 bytes took 3 retries I1214 22:55:55.689213 37317 FileByteSource.cpp:156] wdt> Size 262144 need align 1 physicalRead 262144 offset 33554432 seepPos 49283072 offsetRemainder 0 bytesRead 15990784

The Receiver side log: I1214 22:55:56.199162 6150 FileWriter.cpp:84] wdt> Successfully written 1448 bytes to fd 30 for file AB.NBC.1481642641879-1481642699249.adblock.clip.ts I1214 22:55:56.199170 6150 ReceiverThread.cpp:66] wdt> readAtMost target 262144 I1214 22:55:56.202679 6145 ReceiverThread.cpp:78] wdt> readAtMost 1448 / 223680 from 22 I1214 22:55:56.202689 6145 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:56.202700 6145 FileWriter.cpp:84] wdt> Successfully written 1448 bytes to fd 32 for file AB.NBC.1481642641879-1481642699249.adblock.clip.ts I1214 22:55:56.202709 6145 ReceiverThread.cpp:66] wdt> readAtMost target 222232 I1214 22:55:56.204222 6145 ReceiverThread.cpp:78] wdt> readAtMost 1448 / 222232 from 22 I1214 22:55:56.204234 6145 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:56.204247 6145 FileWriter.cpp:84] wdt> Successfully written 1448 bytes to fd 32 for file AB.NBC.1481642641879-1481642699249.adblock.clip.ts I1214 22:55:56.204257 6145 ReceiverThread.cpp:66] wdt> readAtMost target 220784 I1214 22:55:56.226900 6150 ReceiverThread.cpp:78] wdt> readAtMost 1448 / 985402 from 20 I1214 22:55:56.226915 6150 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:56.226927 6150 FileWriter.cpp:84] wdt> Successfully written 1448 bytes to fd 30 for file AB.NBC.1481642641879-1481642699249.adblock.clip.ts I1214 22:55:56.226936 6150 ReceiverThread.cpp:66] wdt> readAtMost target 262144 I1214 22:55:56.227066 6150 ReceiverThread.cpp:78] wdt> readAtMost 2896 / 983954 from 20 I1214 22:55:56.227077 6150 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:56.227088 6150 FileWriter.cpp:84] wdt> Successfully written 2896 bytes to fd 30 for file AB.NBC.1481642641879-1481642699249.adblock.clip.ts I1214 22:55:56.227097 6150 ReceiverThread.cpp:66] wdt> readAtMost target 262144 I1214 22:55:56.227454 6150 ReceiverThread.cpp:78] wdt> readAtMost 2896 / 981058 from 20 I1214 22:55:56.227465 6150 Throttler.cpp:229] wdt> There is no avg rate limit I1214 22:55:56.227478 6150 FileWriter.cpp:84] wdt> Successfully written 2896 bytes to fd 30 for file AB.NBC.1481642641879-1481642699249.adblock.clip.ts I1214 22:55:56.227486 6150 ReceiverThread.cpp:66] wdt> readAtMost target 262144

zxpan commented 7 years ago

Perhaps the file is not fully written even I see the file in local disk. if so you can ignore this issue.

ldemailly commented 7 years ago

By default we fallocate the total size of each file when we open them and then fill up the blocks as they come in. You can check running iostat for instance what is happening during the transfer

Run without -v and let's see the progress reporting on both side and the end of the logs

Throughput seems very low but maybe your network is very slow / limited ?

ldemailly commented 7 years ago

going to close this but reopen if you still have issues or questions