Closed Mellvik closed 1 year ago
tcpdump - ftp dir command - DEBUG_TCPPKT disabled:
13:11:37.852169 IP (tos 0x0, ttl 64, id 2455, offset 0, flags [none], proto TCP (6), length 46)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [P.], cksum 0x9751 (correct), seq 62:68, ack 251, win 4380, length 6: FTP, length: 6
PASV
13:11:37.853829 IP (tos 0x0, ttl 64, id 162, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [.], cksum 0x4801 (correct), ack 68, win 4374, length 0
13:11:37.875566 IP (tos 0x0, ttl 64, id 163, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [.], cksum 0x47fb (correct), ack 68, win 4380, length 0
13:11:37.890760 IP (tos 0x0, ttl 64, id 164, offset 0, flags [none], proto TCP (6), length 83)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [P.], cksum 0x6c0e (correct), seq 251:294, ack 68, win 4380, length 43: FTP, length: 43
227 Entering Passive Mode (10,0,2,15,4,2)
13:11:37.893444 IP (tos 0x0, ttl 64, id 2456, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [.], cksum 0x47fb (correct), ack 294, win 4337, length 0
13:11:37.913075 IP (tos 0x0, ttl 64, id 2457, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [.], cksum 0x47d0 (correct), ack 294, win 4380, length 0
13:11:37.933820 IP (tos 0x0, ttl 64, id 2458, offset 0, flags [none], proto TCP (6), length 44)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [S], cksum 0x3361 (correct), seq 585568336, win 4380, options [mss 1460], length 0
13:11:37.935899 IP (tos 0x0, ttl 64, id 165, offset 0, flags [none], proto TCP (6), length 44)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [S.], cksum 0x2219 (correct), seq 586477122, ack 585568337, win 4380, options [mss 1460], length 0
13:11:37.937288 IP (tos 0x0, ttl 64, id 2459, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x39d6 (correct), ack 1, win 4380, length 0
13:11:37.940052 IP (tos 0x0, ttl 64, id 2460, offset 0, flags [none], proto TCP (6), length 46)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [P.], cksum 0x9b1a (correct), seq 68:74, ack 294, win 4380, length 6: FTP, length: 6
LIST
13:11:37.950671 IP (tos 0x0, ttl 64, id 166, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [.], cksum 0x47d0 (correct), ack 74, win 4374, length 0
13:11:37.965631 IP (tos 0x0, ttl 64, id 167, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [.], cksum 0x47ca (correct), ack 74, win 4380, length 0
13:11:37.968660 IP (tos 0x0, ttl 64, id 168, offset 0, flags [none], proto TCP (6), length 93)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [P.], cksum 0x9e05 (correct), seq 294:347, ack 74, win 4380, length 53: FTP, length: 53
150 Opening ASCII mode data connection for /bin/ls.
13:11:37.971910 IP (tos 0x0, ttl 64, id 2461, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [.], cksum 0x47ca (correct), ack 347, win 4327, length 0
13:11:37.993903 IP (tos 0x0, ttl 64, id 2462, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [.], cksum 0x4795 (correct), ack 347, win 4380, length 0
13:11:38.126300 IP (tos 0x0, ttl 64, id 169, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xfc4d (correct), seq 1:63, ack 1, win 4380, length 62
13:11:38.129593 IP (tos 0x0, ttl 64, id 2463, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x39d6 (correct), ack 63, win 4318, length 0
13:11:38.131030 IP (tos 0x0, ttl 64, id 170, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xf72e (correct), seq 63:130, ack 1, win 4380, length 67
13:11:38.135337 IP (tos 0x0, ttl 64, id 2464, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x39d6 (correct), ack 130, win 4251, length 0
13:11:38.136792 IP (tos 0x0, ttl 64, id 171, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x21d5 (correct), seq 130:192, ack 1, win 4380, length 62
13:11:38.138026 IP (tos 0x0, ttl 64, id 2465, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3955 (correct), ack 130, win 4380, length 0
13:11:38.143094 IP (tos 0x0, ttl 64, id 172, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa62e (correct), seq 192:256, ack 1, win 4380, length 64
13:11:38.147204 IP (tos 0x0, ttl 64, id 173, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa9ec (correct), seq 256:320, ack 1, win 4380, length 64
13:11:38.151708 IP (tos 0x0, ttl 64, id 174, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa0ac (correct), seq 320:384, ack 1, win 4380, length 64
13:11:38.156296 IP (tos 0x0, ttl 64, id 175, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xb774 (correct), seq 384:448, ack 1, win 4380, length 64
13:11:38.160828 IP (tos 0x0, ttl 64, id 176, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x1a9b (correct), seq 448:510, ack 1, win 4380, length 62
13:11:38.164266 IP (tos 0x0, ttl 64, id 2466, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3955 (correct), ack 192, win 4318, length 0
13:11:38.165644 IP (tos 0x0, ttl 64, id 177, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xb973 (correct), seq 510:577, ack 1, win 4380, length 67
13:11:38.168172 IP (tos 0x0, ttl 64, id 2467, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3955 (correct), ack 256, win 4254, length 0
13:11:38.170646 IP (tos 0x0, ttl 64, id 178, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xfc86 (correct), seq 577:638, ack 1, win 4380, length 61
13:11:38.172082 IP (tos 0x0, ttl 64, id 2468, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3955 (correct), ack 320, win 4190, length 0
13:11:38.173969 IP (tos 0x0, ttl 64, id 2469, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3897 (correct), ack 320, win 4380, length 0
13:11:38.177086 IP (tos 0x0, ttl 64, id 179, offset 0, flags [none], proto TCP (6), length 106)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x2cfe (correct), seq 638:704, ack 1, win 4380, length 66
13:11:38.181680 IP (tos 0x0, ttl 64, id 180, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x1208 (correct), seq 704:765, ack 1, win 4380, length 61
13:11:38.186304 IP (tos 0x0, ttl 64, id 181, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x06ce (correct), seq 765:826, ack 1, win 4380, length 61
13:11:38.190509 IP (tos 0x0, ttl 64, id 182, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xb21f (correct), seq 826:889, ack 1, win 4380, length 63
13:11:38.210963 IP (tos 0x0, ttl 64, id 2470, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3897 (correct), ack 384, win 4316, length 0
13:11:38.214783 IP (tos 0x0, ttl 64, id 2471, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3897 (correct), ack 448, win 4252, length 0
13:11:38.218516 IP (tos 0x0, ttl 64, id 2472, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3897 (correct), ack 510, win 4190, length 0
13:11:38.219786 IP (tos 0x0, ttl 64, id 183, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xade5 (correct), seq 889:952, ack 1, win 4380, length 63
13:11:38.253965 IP (tos 0x0, ttl 64, id 2473, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x37d9 (correct), ack 510, win 4380, length 0
13:11:38.257758 IP (tos 0x0, ttl 64, id 2474, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x37d9 (correct), ack 577, win 4313, length 0
13:11:38.261541 IP (tos 0x0, ttl 64, id 2475, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x37d9 (correct), ack 638, win 4252, length 0
13:11:38.265551 IP (tos 0x0, ttl 64, id 2476, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x37d9 (correct), ack 704, win 4186, length 0
13:11:38.267455 IP (tos 0x0, ttl 64, id 2477, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3717 (correct), ack 704, win 4380, length 0
13:11:38.268833 IP (tos 0x0, ttl 64, id 184, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x9b22 (correct), seq 952:1016, ack 1, win 4380, length 64
13:11:38.275467 IP (tos 0x0, ttl 64, id 185, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xff5f (correct), seq 1016:1078, ack 1, win 4380, length 62
13:11:38.281539 IP (tos 0x0, ttl 64, id 186, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xfc00 (correct), seq 1078:1140, ack 1, win 4380, length 62
13:11:38.285562 IP (tos 0x0, ttl 64, id 187, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x97df (correct), seq 1140:1203, ack 1, win 4380, length 63
13:11:38.290205 IP (tos 0x0, ttl 64, id 188, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xf608 (correct), seq 1203:1264, ack 1, win 4380, length 61
13:11:38.294891 IP (tos 0x0, ttl 64, id 189, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x0864 (correct), seq 1264:1326, ack 1, win 4380, length 62
13:11:38.299583 IP (tos 0x0, ttl 64, id 190, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x9cc0 (correct), seq 1326:1390, ack 1, win 4380, length 64
13:11:38.304404 IP (tos 0x0, ttl 64, id 191, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xc6f8 (correct), seq 1390:1457, ack 1, win 4380, length 67
13:11:38.305165 IP (tos 0x0, ttl 64, id 2478, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3717 (correct), ack 765, win 4319, length 0
13:11:38.308909 IP (tos 0x0, ttl 64, id 2479, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3717 (correct), ack 826, win 4258, length 0
13:11:38.308936 IP (tos 0x0, ttl 64, id 192, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xc5b5 (correct), seq 1457:1524, ack 1, win 4380, length 67
13:11:38.312810 IP (tos 0x0, ttl 64, id 2480, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3717 (correct), ack 889, win 4195, length 0
13:11:38.314671 IP (tos 0x0, ttl 64, id 2481, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x365e (correct), ack 889, win 4380, length 0
13:11:38.317033 IP (tos 0x0, ttl 64, id 193, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x18c8 (correct), seq 1524:1585, ack 1, win 4380, length 61
13:11:38.321799 IP (tos 0x0, ttl 64, id 194, offset 0, flags [none], proto TCP (6), length 105)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x4198 (correct), seq 1585:1650, ack 1, win 4380, length 65
13:11:38.326536 IP (tos 0x0, ttl 64, id 195, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xfb4a (correct), seq 1650:1711, ack 1, win 4380, length 61
13:11:38.330909 IP (tos 0x0, ttl 64, id 196, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x05ad (correct), seq 1711:1773, ack 1, win 4380, length 62
13:11:38.335800 IP (tos 0x0, ttl 64, id 197, offset 0, flags [none], proto TCP (6), length 106)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x1f85 (correct), seq 1773:1839, ack 1, win 4380, length 66
13:11:38.340575 IP (tos 0x0, ttl 64, id 198, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa8ff (correct), seq 1839:1902, ack 1, win 4380, length 63
13:11:38.350797 IP (tos 0x0, ttl 64, id 2482, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x365e (correct), ack 952, win 4317, length 0
13:11:38.354635 IP (tos 0x0, ttl 64, id 2483, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x365e (correct), ack 1016, win 4253, length 0
13:11:38.358308 IP (tos 0x0, ttl 64, id 2484, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x365e (correct), ack 1078, win 4191, length 0
13:11:38.359469 IP (tos 0x0, ttl 64, id 199, offset 0, flags [none], proto TCP (6), length 105)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x365c (correct), seq 1902:1967, ack 1, win 4380, length 65
13:11:38.360342 IP (tos 0x0, ttl 64, id 2485, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x35a1 (correct), ack 1078, win 4380, length 0
13:11:38.397073 IP (tos 0x0, ttl 64, id 2486, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x35a1 (correct), ack 1140, win 4318, length 0
13:11:38.400921 IP (tos 0x0, ttl 64, id 2487, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x35a1 (correct), ack 1203, win 4255, length 0
13:11:38.404638 IP (tos 0x0, ttl 64, id 2488, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x35a1 (correct), ack 1264, win 4194, length 0
13:11:38.406507 IP (tos 0x0, ttl 64, id 2489, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x34e7 (correct), ack 1264, win 4380, length 0
13:11:38.415327 IP (tos 0x0, ttl 64, id 200, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xac34 (correct), seq 1967:2031, ack 1, win 4380, length 64
13:11:38.422026 IP (tos 0x0, ttl 64, id 201, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x7e4f (correct), seq 2031:2094, ack 1, win 4380, length 63
13:11:38.426266 IP (tos 0x0, ttl 64, id 202, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xb342 (correct), seq 2094:2161, ack 1, win 4380, length 67
13:11:38.431068 IP (tos 0x0, ttl 64, id 203, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xadd3 (correct), seq 2161:2224, ack 1, win 4380, length 63
13:11:38.435874 IP (tos 0x0, ttl 64, id 204, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x1b9c (correct), seq 2224:2286, ack 1, win 4380, length 62
13:11:38.442795 IP (tos 0x0, ttl 64, id 2490, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x34e7 (correct), ack 1326, win 4318, length 0
13:11:38.446581 IP (tos 0x0, ttl 64, id 2491, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x34e7 (correct), ack 1390, win 4254, length 0
13:11:38.450673 IP (tos 0x0, ttl 64, id 2492, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x34e7 (correct), ack 1457, win 4187, length 0
13:11:38.452569 IP (tos 0x0, ttl 64, id 2493, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3426 (correct), ack 1457, win 4380, length 0
13:11:38.490366 IP (tos 0x0, ttl 64, id 2494, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3426 (correct), ack 1524, win 4313, length 0
13:11:38.494121 IP (tos 0x0, ttl 64, id 2495, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3426 (correct), ack 1585, win 4252, length 0
13:11:38.498135 IP (tos 0x0, ttl 64, id 2496, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3426 (correct), ack 1650, win 4187, length 0
13:11:38.500028 IP (tos 0x0, ttl 64, id 2497, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3365 (correct), ack 1650, win 4380, length 0
13:11:38.534551 IP (tos 0x0, ttl 64, id 205, offset 0, flags [none], proto TCP (6), length 105)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x36f7 (correct), seq 2286:2351, ack 1, win 4380, length 65
13:11:38.537574 IP (tos 0x0, ttl 64, id 2498, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3365 (correct), ack 1711, win 4319, length 0
13:11:38.541434 IP (tos 0x0, ttl 64, id 2499, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3365 (correct), ack 1773, win 4257, length 0
13:11:38.545503 IP (tos 0x0, ttl 64, id 2500, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3365 (correct), ack 1839, win 4191, length 0
13:11:38.547332 IP (tos 0x0, ttl 64, id 2501, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x32a8 (correct), ack 1839, win 4380, length 0
13:11:38.556202 IP (tos 0x0, ttl 64, id 206, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa8ff (correct), seq 1839:1902, ack 1, win 4380, length 63
13:11:38.575654 IP (tos 0x0, ttl 64, id 207, offset 0, flags [none], proto TCP (6), length 106)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x1f85 (correct), seq 1773:1839, ack 1, win 4380, length 66
13:11:38.584292 IP (tos 0x0, ttl 64, id 2502, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x32a8 (correct), ack 1902, win 4317, length 0
13:11:38.588216 IP (tos 0x0, ttl 64, id 2503, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x32a8 (correct), ack 1967, win 4252, length 0
13:11:38.591941 IP (tos 0x0, ttl 64, id 2504, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x32a8 (correct), ack 2031, win 4188, length 0
13:11:38.593842 IP (tos 0x0, ttl 64, id 2505, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x31e8 (correct), ack 2031, win 4380, length 0
13:11:38.593881 IP (tos 0x0, ttl 64, id 208, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x05ad (correct), seq 1711:1773, ack 1, win 4380, length 62
13:11:38.613355 IP (tos 0x0, ttl 64, id 209, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xfb4a (correct), seq 1650:1711, ack 1, win 4380, length 61
13:11:38.615821 IP (tos 0x0, ttl 64, id 210, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x8f10 (correct), seq 2351:2414, ack 1, win 4380, length 63
13:11:38.621020 IP (tos 0x0, ttl 64, id 211, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x8ce3 (correct), seq 2414:2477, ack 1, win 4380, length 63
13:11:38.629183 IP (tos 0x0, ttl 64, id 212, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xfc0c (correct), seq 2477:2538, ack 1, win 4380, length 61
13:11:38.631281 IP (tos 0x0, ttl 64, id 2506, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x31e8 (correct), ack 2094, win 4317, length 0
13:11:38.635381 IP (tos 0x0, ttl 64, id 2507, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x31e8 (correct), ack 2161, win 4250, length 0
13:11:38.635408 IP (tos 0x0, ttl 64, id 213, offset 0, flags [none], proto TCP (6), length 103)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa45a (correct), seq 2538:2601, ack 1, win 4380, length 63
13:11:38.639231 IP (tos 0x0, ttl 64, id 2508, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x31e8 (correct), ack 2224, win 4187, length 0
13:11:38.641208 IP (tos 0x0, ttl 64, id 2509, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3127 (correct), ack 2224, win 4380, length 0
13:11:38.642673 IP (tos 0x0, ttl 64, id 214, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x0513 (correct), seq 2601:2663, ack 1, win 4380, length 62
13:11:38.647164 IP (tos 0x0, ttl 64, id 215, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xcb17 (correct), seq 2663:2730, ack 1, win 4380, length 67
13:11:38.651842 IP (tos 0x0, ttl 64, id 216, offset 0, flags [none], proto TCP (6), length 107)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xacd8 (correct), seq 2730:2797, ack 1, win 4380, length 67
13:11:38.672718 IP (tos 0x0, ttl 64, id 217, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x1b9c (correct), seq 2224:2286, ack 1, win 4380, length 62
13:11:38.675164 IP (tos 0x0, ttl 64, id 218, offset 0, flags [none], proto TCP (6), length 101)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x12cb (correct), seq 2797:2858, ack 1, win 4380, length 61
13:11:38.678734 IP (tos 0x0, ttl 64, id 2510, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3127 (correct), ack 2286, win 4318, length 0
13:11:38.681641 IP (tos 0x0, ttl 64, id 219, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x0d1d (correct), seq 2858:2920, ack 1, win 4380, length 62
13:11:38.682943 IP (tos 0x0, ttl 64, id 2511, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3127 (correct), ack 2351, win 4253, length 0
13:11:38.686411 IP (tos 0x0, ttl 64, id 2512, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x3127 (correct), ack 2351, win 4253, length 0
13:11:38.687532 IP (tos 0x0, ttl 64, id 220, offset 0, flags [none], proto TCP (6), length 104)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0xa162 (correct), seq 2920:2984, ack 1, win 4380, length 64
13:11:38.692013 IP (tos 0x0, ttl 64, id 221, offset 0, flags [none], proto TCP (6), length 102)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [P.], cksum 0x159a (correct), seq 2984:3046, ack 1, win 4380, length 62
13:11:38.696309 IP (tos 0x0, ttl 64, id 222, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [F.], cksum 0x2df0 (correct), seq 3046, ack 1, win 4380, length 0
13:11:38.698475 IP (tos 0x0, ttl 64, id 223, offset 0, flags [none], proto TCP (6), length 64)
10.0.2.15.ftp > 10.0.2.16.1104: Flags [P.], cksum 0x789f (correct), seq 347:371, ack 74, win 4380, length 24: FTP, length: 24
226 Transfer Complete.
13:11:38.710651 IP (tos 0x0, ttl 64, id 2513, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2351, win 4380, length 0
13:11:38.714008 IP (tos 0x0, ttl 64, id 2514, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2351, win 4380, length 0
13:11:38.717152 IP (tos 0x0, ttl 64, id 2515, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2351, win 4380, length 0
13:11:38.720313 IP (tos 0x0, ttl 64, id 2516, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2351, win 4380, length 0
13:11:38.723725 IP (tos 0x0, ttl 64, id 2517, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2414, win 4317, length 0
13:11:38.727530 IP (tos 0x0, ttl 64, id 2518, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2477, win 4254, length 0
13:11:38.731298 IP (tos 0x0, ttl 64, id 2519, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x30a8 (correct), ack 2538, win 4193, length 0
13:11:38.733179 IP (tos 0x0, ttl 64, id 2520, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2fed (correct), ack 2538, win 4380, length 0
13:11:38.769581 IP (tos 0x0, ttl 64, id 2521, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2fed (correct), ack 2601, win 4317, length 0
13:11:38.773419 IP (tos 0x0, ttl 64, id 2522, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2fed (correct), ack 2663, win 4255, length 0
13:11:38.777449 IP (tos 0x0, ttl 64, id 2523, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2fed (correct), ack 2730, win 4188, length 0
13:11:38.779342 IP (tos 0x0, ttl 64, id 2524, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2f2d (correct), ack 2730, win 4380, length 0
13:11:38.816916 IP (tos 0x0, ttl 64, id 2525, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2f2d (correct), ack 2797, win 4313, length 0
13:11:38.820450 IP (tos 0x0, ttl 64, id 2526, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2f2d (correct), ack 2797, win 4313, length 0
13:11:38.824155 IP (tos 0x0, ttl 64, id 2527, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2f2d (correct), ack 2858, win 4252, length 0
13:11:38.825825 IP (tos 0x0, ttl 64, id 2528, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2ead (correct), ack 2858, win 4380, length 0
13:11:38.851835 IP (tos 0x0, ttl 64, id 2529, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2ead (correct), ack 2920, win 4318, length 0
13:11:38.855662 IP (tos 0x0, ttl 64, id 2530, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2ead (correct), ack 2984, win 4254, length 0
13:11:38.859375 IP (tos 0x0, ttl 64, id 2531, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2ead (correct), ack 3046, win 4192, length 0
13:11:38.861310 IP (tos 0x0, ttl 64, id 2532, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2df1 (correct), ack 3046, win 4380, length 0
13:11:38.896290 IP (tos 0x0, ttl 64, id 2533, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [.], cksum 0x2df0 (correct), ack 3047, win 4380, length 0
13:11:38.899182 IP (tos 0x0, ttl 64, id 2534, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [.], cksum 0x4795 (correct), ack 371, win 4356, length 0
13:11:38.913028 IP (tos 0x0, ttl 64, id 2535, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1104 > 10.0.2.15.ftp: Flags [.], cksum 0x477d (correct), ack 371, win 4380, length 0
13:11:38.921589 IP (tos 0x0, ttl 64, id 2536, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1106 > 10.0.2.15.1026: Flags [F.], cksum 0x2def (correct), seq 1, ack 3047, win 4380, length 0
13:11:38.923284 IP (tos 0x0, ttl 64, id 224, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1026 > 10.0.2.16.1106: Flags [.], cksum 0x2def (correct), ack 2, win 4380, length 0
ktcp
out of memory condition, followup:
Documents previously reported behaviour (in this case there are no retransmits). The client is doing a mget *
in passive mode, consistently fails in the 6th file (the first 2 transfers are directory listings).
The good news - in PORT mode the same command works fine. There is a 2 second close (FIN) delay for small files, otherwise fine. We may need to look at that, but it may be a side effect of a previous bugfix.
ftp
mput *
exhibits exactly the same symptoms - in both PORT and passive modes, and this time it's obviously the client's ktcp
that runs out of memory, now being the sending party. This is not exactly what I expected - I need to ponder about this one to match up the symptoms with the res of the experience. Anyway - when this happens, the clients process hangs (interruptible) and the server process at the other end also hangs (soft).
When doing 'prompted' mput
- i.e. ftp
asks for confirmation per file to transfer and thus introducing a second or two delay per file, the sbrk-errors can be avoided - completely or almost. In some cases, ktcp
recovers - such as in this case:
mput miniterm.1 (y/n/q):
PORT: Listening on 10.0.2.16 port 1050
---> PORT 10,0,2,16,4,26
sys_brk(13) fail: brk d8e2 over by 850 bytes
sys_brk(13) fail: brk d8e6 over by 854 bytes
sys_brk(13) fail: brk d8e8 over by 856 bytes
sys_brk(13) fail: brk d668 over by 216 bytes
sys_brk(13) fail: brk d66a over by 218 bytes
200 PORT command successful.
---> STOR miniterm.1
local: miniterm.1, remote: miniterm.1
150 Opening BINARY data connection for 'miniterm.1'.
Accepted connection from 10.0.2.15 @ 20
226 Transfer Complete.
... but not in this case:
---> PORT 10,0,2,16,4,41
sys_brk(13) fail: brk e5e0 over by 4176 bytes
sys_brk(13) fail: brk dbe2 over by 1618 bytes
sys_brk(13) fail: brk d6e2 over by 338 bytes
sys_brk(13) fail: brk d6e4 over by 340 bytes
sys_brk(13) fail: brk d5a4 over by 20 bytes
425 Can't open data connection.
mput: 37 files transferred.
ftp>
# netstat
----- Received --------- ----- Sent -------------
TCP Packets 6260 TCP Packets 5116
TCP Dropped 4 TCP Retransmits 112
TCP Bad Checksum 0 TCP Retrans Memory 0
IP Packets 7598 IP Packets 5228
IP Bad Checksum 0 IP Bad Headers 0
ICMP Packets 0 ICMP Packets 0
SLIP Packets 0 SLIP Packets 0
ETH Packets 11632 ETH Packets 5245
ARP Reqs Sent 2 ARP Replies Rcvd 3
ARP Reqs Rcvd 4031 ARP Replies Sent 15
ARP Cache Adds 2
No State RTT lport raddress rport
-----------------------------------------------------
1 ESTABLISHED 1000ms 1093 0.0.0.0 2
2 ESTABLISHED 750ms 21 10.0.2.16 1343
3 LISTEN 1000ms 21 0.0.0.0 0
4 LISTEN 1000ms 23 0.0.0.0 0
# getsockname: adr 10.0.2.15, port 1094
227 Entering Passive Mode (10,0,2,15,4,70)
Accepted connection from 10.0.2.16/1345 on fd 5
getsockname: adr 10.0.2.15, port 1095
227 Entering Passive Mode (10,0,2,15,4,71)
Accepted connection from 10.0.2.16/1346 on fd 5
getsockname: adr 10.0.2.15, port 1096
227 Entering Passive Mode (10,0,2,15,4,72)
Accepted connection from 10.0.2.16/1347 on fd 5
getsockname: adr 10.0.2.15, port 1097
227 Entering Passive Mode (10,0,2,15,4,73)
Accepted connection from 10.0.2.16/1348 on fd 5
getsockname: adr 10.0.2.15, port 1098
227 Entering Passive Mode (10,0,2,15,4,74)
Accepted connection from 10.0.2.16/1349 on fd 5
getsockname: adr 10.0.2.15, port 1099
227 Entering Passive Mode (10,0,2,15,4,75)
Accepted connection from 10.0.2.16/1350 on fd 5
getsockname: adr 10.0.2.15, port 1100
227 Entering Passive Mode (10,0,2,15,4,76)
Accepted connection from 10.0.2.16/1351 on fd 5
getsockname: adr 10.0.2.15, port 1101
227 Entering Passive Mode (10,0,2,15,4,77)
Accepted connection from 10.0.2.16/1352 on fd 5
sys_brk(13) fail: brk d5f0 over by 76 bytes
sys_brk(13) fail: brk d5f4 over by 80 bytes
ktcp: Out of memory for retrans
getsockname: adr 10.0.2.15, port 1102
sys_brk(13) fail: brk d5f0 over by 76 bytes
sys_brk(13) fail: brk d5f4 over by 80 bytes
227 Entering Passive Mode (10,0,2,15,4,78)
sys_brk(13) fail: brk e5f0 over by 4172 bytes
sys_brk(13) fail: brk dbf2 over by 1614 bytes
sys_brk(13) fail: brk d6f2 over by 334 bytes
sys_brk(13) fail: brk d6f4 over by 336 bytes
sys_brk(13) fail: brk d5b4 over by 16 bytes
# netstat
----- Received --------- ----- Sent -------------
TCP Packets 6486 TCP Packets 5288
TCP Dropped 4 TCP Retransmits 112
TCP Bad Checksum 0 TCP Retrans Memory 0
IP Packets 7825 IP Packets 5400
IP Bad Checksum 0 IP Bad Headers 0
ICMP Packets 0 ICMP Packets 0
SLIP Packets 0 SLIP Packets 0
ETH Packets 11862 ETH Packets 5417
ARP Reqs Sent 2 ARP Replies Rcvd 3
ARP Reqs Rcvd 4034 ARP Replies Sent 15
ARP Cache Adds 2
No State RTT lport raddress rport
-----------------------------------------------------
1 ESTABLISHED 1000ms 1103 0.0.0.0 2
2 LISTEN 1000ms 1102 0.0.0.0 0
3 ESTABLISHED 187ms 21 10.0.2.16 1343
4 LISTEN 1000ms 21 0.0.0.0 0
5 LISTEN 1000ms 23 0.0.0.0 0
#
ftp> prompt
Interactive mode off.
ftp> mget *
227 Entering Passive Mode (10,0,2,15,4,71)
125 List started OK.
250 List completed.
200 Command OK.
227 Entering Passive Mode (10,0,2,15,4,72)
150 Opening BINARY data connection for ash.1 (37434 bytes).
remote: ash.1, local: ash.1
226 Transfer Complete.
227 Entering Passive Mode (10,0,2,15,4,73)
150 Opening BINARY data connection for basename.1 (758 bytes).
remote: basename.1, local: basename.1
226 Transfer Complete.
227 Entering Passive Mode (10,0,2,15,4,74)
150 Opening BINARY data connection for cal.1 (568 bytes).
remote: cal.1, local: cal.1
226 Transfer Complete.
227 Entering Passive Mode (10,0,2,15,4,75)
150 Opening BINARY data connection for chgrp.1 (583 bytes).
remote: chgrp.1, local: chgrp.1
226 Transfer Complete.
227 Entering Passive Mode (10,0,2,15,4,76)
150 Opening BINARY data connection for chmod.1 (779 bytes).
remote: chmod.1, local: chmod.1
226 Transfer Complete.
227 Entering Passive Mode (10,0,2,15,4,77)
150 Opening BINARY data connection for chown.1 (480 bytes).
remote: chown.1, local: chown.1
226 Transfer Complete.
227 Entering Passive Mode (10,0,2,15,4,78)
passive mode: Connection timed out
mget: 6 files transferred.
ftp>
Hello @Mellvik,
Needless to say, there's a lot to digest in this series of posts! I will give a detailed answer to your bulleted questions from the first post later today when I have time, but thought I'd respond with some more things to think about:
Given that we control not only the TCP implementation in this case, but also the ftp client and server, it may be worth considering adding the following features to ftp, in order to get a completely working version, which could then be used to later hammer out better performance underneath with ktcp:
Your list of 4-5 possible enhancements to ktcp in the first post are interesting; we will need to prioritize them as to what might make the best improvement without introducing other problems to an otherwise well-working implementation to reference TCP implementations. In particular, the "adjustment of ACK from every packet to highest SEQ" would probably be the best idea. For now, ktcp actually does both - it ACKs every packet, which always happens to be the highest SEQ received. For this to work as you imagine, we would need to take RFC advice of only ACKing every other packet, something I need to think about how to implement, as it likely involves a timeout if an expected next packet doesn't arrive.
Thank you!
Thank you @ghaerr,
Needless to say, there's a lot to digest in this series of posts! I will give a detailed answer to your bulleted questions from the first post later today when I have time, but thought I'd respond with some more things to think about:
there is no rush to this. The goal has been to get the issues out so we can think about how to solve them and take the time to do it right, whatever that ends up being. Ideally, we should be able to come up with a way to allow ELKS-to-ELKS to transfer without errors, or at least to always allow successful completion of transfers. ELKS ktcp suffers from a number of design and resource issues, some of which may not be able to be changed. We likely need to understand the reasons why retransmits are occurring, rather than reworking the retransmit algorithm. Agreed, and I will work up some examples. There may be several different reasons kicking in in different scenarios, but the scenario that seems most frequent is a retransmit after the ACK has arrived (on the wire). So the ACK is there, it just hasn't been processed yet. Then there is the 2 second close/FIN delay I mentioned (small files), which triggers a retransmit. TBC I suspect retransmits are occurring because ELKS send / receive window handling is allowing too many packets to be sent, which through subsequent delays and limited buffer resources, is triggering retransmits, further aggravating the problem. While the ELKS send window processing could probably be improved, in this case, we're coupled with an ELKS receiver, with known algorithmic issues in receive window calculation issues that aren't easily fixed. This doesn't occur when talking to a more sophisticated receiver (also with larger buffers). The case with the DIR command - directory listing is particularly interesting because we get a lot of small back-to-back packets (approx 60b ea). In this case the window size has no effect, and the receiving end suddenly gets 4-6 packets without getting the chance to ack any of them. This is where the 'ACK the last and highest' doesn't work. If ktcp could detect the activity level and purposely delay acking while keeping an eye on the window, I think we'd see improvement. Given that we control not only the TCP implementation in this case, but also the ftp client and server, it may be worth considering adding the following features to ftp, in order to get a completely working version, which could then be used to later hammer out better performance underneath with ktcp:
A configurable delay option (in seconds) between multiple file transfers, used to (at least temporarily) overcome the out of memory problem. That's certainly doable. WE could have a client side MDELAY command and give the client the ability to command the server to delay via the newly implemented SITE command, which currently only sets the idle timeout (via 'SITE IDLE nnn'). Thank would be good for debugging and for finding reasonable defaults while improving the stack. The DIR and NLS commands are the only culprits when it comes to many small packets, and can easily be changed (their output was buffered in early versions of the server), but they work (except for some retransmit) and they're great for pushing the limits and see how ktcp behaves. A configurable setsockopt option to pass to ktcp that would introduce an inter-packet send delay to further throttle back packet sending (this is currently hard-coded to 200ms). It may be good for experimentation, like the delays in ftp, since it's easy to turn on and off. To ease experimentation further (assuming that such option could be changed on open sockets), it could be connected to FTP commands like the others. This way it may be possible to find a balance between system performance (like bogomips) and delays to eventually automate the settings - if some of them prevail after (presumable) fixes. Your list of 4-5 possible enhancements to ktcp in the first post are interesting; we will need to prioritize them as to what might make the best improvement without introducing other problems to an otherwise well-working implementation to reference TCP implementations. In particular, the "adjustment of ACK from every packet to highest SEQ" would probably be the best idea. For now, ktcp actually does both - it ACKs every packet, which always happens to be the highest SEQ received. For this to work as you imagine, we would need to take RFC advice of only ACKing every other packet, something I need to think about how to implement, as it likely involves a timeout if an expected next packet doesn't arrive.
Yes, like I mentioned above, maybe a delay to get more packets in would be good.
BTW - right now I'm running 386/20 - 386/20, the picture may be different when I swap one of them with a 286/12. To be continued.
-M
@Mellvik, more comments on your comments. You're probably ahead of me on this with all your test data, so bear with me :)
the scenario that seems most frequent is a retransmit after the ACK has arrived (on the wire). So the ACK is there, it just hasn't been processed yet.
I am guessing the reason for this most frequent problem is floppy delay. That is, the floppy drive on the receiving end has to write data, and the delay in that write ends up taking longer than the unACKed packets retransmit timers. Perhaps, despite what I said earlier, we should allow a way to specify a different initial (minimum) retransmit setting, in order to test if this is the reason. What do you think, could floppy write delay be the reason for retransmit? (Remember, that ELKS will hang on disk I/O and ktcp can't run to process packets either).
Then there is the 2 second close/FIN delay I mentioned (small files), which triggers a retransmit.
This is the hack we had to (temporarily) add in order to fix the problem of FIN seen before ACCEPT. We discard the FIN and depend on the sender retransmit timeout to resend the FIN. In order to differentiate this case from normal packet retransmits above, we should only implement a minimum retransmit delay (for floppy I/O) on non-FIN packets. This delay may have to be 3-4 seconds? What do you think?
The case with the DIR command - directory listing is particularly interesting because we get a lot of small back-to-back packets (approx 60b ea). In this case the window size has no effect, and the receiving end suddenly gets 4-6 packets without getting the chance to ack any of them.
This is where the 'ACK the last and highest' doesn't work. If ktcp could detect the activity level and purposely delay acking while keeping an eye on the window, I think we'd see improvement.
I'm not quite sure if this is the ACK problem you describe. Ktcp always ACKs the highest sequence it's seen, but remember it ACKs every packet seen, and will DROP packets that are out of sequence. We will need the DEBUG_TUNE and ^P on to see the dropped packets and test carefully to see if this is actually an ACK problem. Also, the number of received packets shouldn't matter, unless the sending of ACKs causes the sender to retransmit earlier. Overall, I can't recommend we change too much at the TCP level unless we really know what we're doing, since things work fine on reference systems.
A configurable setsockopt option to pass to ktcp that would introduce an inter-packet send delay to further throttle back packet sending (this is currently hard-coded to 200ms).
This last statement was in error: ktcp does not introduce any inter-packet delays. Instead, the 200ms I referenced is the sleep time ktcp takes before checking to see whether the outstanding send window has been reduced, before trying to send another packet when the window is full. So We don't need an option to change this.
Summary:
Yes, and suggest possible retransmit timer option for non-FIN packets. This may not be a disk I/O issue as guessed in last post, could just be that ELKS-to-ELKS needs more time before retransmits because it flat sends too much data initially. (There is an algorithm that ELKS does not implement which is called "slow start" or something like that. That could be the reason for ELKS-to-ELKS problems).
Ktcp already ACKs the highest SEQ number received. As mentioned previously, I think the only solution here would be to ACK every other packet, which an RFC suggests. I haven't looked into how hard that would be, but would suggest other options first, as this would change the basic TCP algorithm, which we already have working well with other TCPs.
Ktcp cannot retry an out of memory condition without lots of work, since it is in a somewhat synchronous state with the kernel, and the kernel would have to retry as well, etc. We can reserve a smaller/larger memory limit for retrans memory, but I'm not sure that will do much. I am guessing setting a larger retrans timer might help more. We need to make sure we know the real reason behind retransmits, discussed in more detail previously.
Again, I'm not sure duplicate ACKs is the real problem here. I still am guessing the problem is that ELKS is sending too much data too quickly, and reference TCP handles that (likely with larger buffers) and ELKS receiver cannot. Perhaps "send slow start" might be a better option. I am not sure how to implement slow start, however. We could also try to lower the send packet size (programmatically) and see whether that helps, as that is essentially how "slow start" starts. There is some literature on the business of starting up sending after a ZWP, can you verify that our problems all start after a zero window, or do they sometimes start earlier?
No - I don't think allocating another send buffer will help, in fact likely will hurt, if the problem is that too much send data outstanding (unACKed) is the reason for the retransmits. (Other reason may be disk I/O speed).
Summary:
TCP Slow Start, Congestion Avoidance,Fast Retransmit, and Fast Recovery Algorithms
Section 1 on Slow Start may be worth trying, if we can prove it is related to one of the issues you're seeing. The whole RFC is pretty interesting.
Thank you @ghaerr - a very useful discussion indeed.
[I just noticed there is another mail after this one, so there may be some overlap or clarifications I've not seen yet.]
@Mellvik https://github.com/Mellvik, more comments on your comments. You're probably ahead of me on this with all your test data, so bear with me :)
the scenario that seems most frequent is a retransmit after the ACK has arrived (on the wire). So the ACK is there, it just hasn't been processed yet.
I am guessing the reason for this most frequent problem is floppy delay. That is, the floppy drive on the receiving end has to write data, and the delay in that write ends up taking longer than the unACKed packets retransmit timers. Perhaps, despite what I said earlier, we should allow a way to specify a different initial (minimum) retransmit setting, in order to test if this is the reason. What do you think, could floppy write delay be the reason for retransmit? (Remember, that ELKS will hang on disk I/O and ktcp can't run to process packets either).
I should have mentioned that I've not been using floppies at all for this testing. That said, introducing floppy writes at the receiving end is currently disastrous if the file is bigger than small (the sending ktcp will hang). So maybe we have two separate but related problems here. With floppy writes, the reason is obvious and we need to figure out a way to handle it - keeping in mind that the delays come and go during a transfer: We're fine for (say) 10kbytes, then full stop for 4-5 seconds, then another 10k or sow etc. Unless we have extensive buffering, in which case the problem just goes away. Correction: It comes back every SYNC= seconds… The other is why does this happen when there is no floppy involved.? I need to look closer at this one and eliminate the chance that it is actually the XMS SYNCing. I'll work up scenarios with traces for both of these. Then there is the 2 second close/FIN delay I mentioned (small files), which triggers a retransmit.
This is the hack we had to (temporarily) add in order to fix the problem of FIN seen before ACCEPT. We discard the FIN and depend on the sender retransmit timeout to resend the FIN. In order to differentiate this case from normal packet retransmits above, we should only implement a minimum retransmit delay (for floppy I/O) on non-FIN packets. This delay may have to be 3-4 seconds? What do you think?
No, this one is not important - rather just annoying because it delays small file transfers in a very visible way. Being the hack (and temporary) it is, let's leave it for now. The case with the DIR command - directory listing is particularly interesting because we get a lot of small back-to-back packets (approx 60b ea). In this case the window size has no effect, and the receiving end suddenly gets 4-6 packets without getting the chance to ack any of them.
This is where the 'ACK the last and highest' doesn't work. If ktcp could detect the activity level and purposely delay acking while keeping an eye on the window, I think we'd see improvement.
I'm not quite sure if this is the ACK problem you describe. Ktcp always ACKs the highest sequence it's seen, but remember it ACKs every packet seen, and will DROP packets that are out of sequence. We will need the DEBUG_TUNE and ^P on to see the dropped packets and test carefully to see if this is actually an ACK problem. Also, the number of received packets shouldn't matter, unless the sending of ACKs causes the sender to retransmit earlier. Overall, I can't recommend we change too much at the TCP level unless we really know what we're doing, since things work fine on reference systems.
I think we're seeing very different pictures here. There are no dropped packets, nothing out of sequence, just a receiving end falling further and further behind the sender. And I'm always referring to what's on the wires, as we cannot see what's in ktcp w/o disturbing the picture. The packet trace indicates that the sender is 5 packets ahead of the receiver. So - if ktcp always acks the last received packets, the rest of them are in the NIC buffer or in the kernel. Which means delayed ACKing is worth considering (and it's within TCP specs - we did this in SATNET between the ARPAnet and Europe aeons ago). A configurable setsockopt option to pass to ktcp that would introduce an inter-packet send delay to further throttle back packet sending (this is currently hard-coded to 200ms).
This last statement was in error: ktcp does not introduce any inter-packet delays. Instead, the 200ms I referenced is the sleep time ktcp takes before checking to see whether the outstanding send window has been reduced, before trying to send another packet when the window is full. So We don't need an option to change this.
Summary:
Do we need a special retransmit timer for floppy delays? How do reference TCP systems handle this when sending to ELKS doing floppy I/O, and what might be different when ELKS is the sender? I guess this is a classic example of flow control - the challenge not so much the mechanism as the balance, small packets vs. large(r). Right now I'm tempted to say we put it on hold till I have worked out some examples with larger packets, leaving the DIR example behind for now. What I'm saying is that efficient file transfer be the priority. 2 second FIN delay is known issue; we cannot work around this until FIN/ACCEPT processing is rewritten, which is outside the issue of ELKS-to-ELKS challenges. We'll leave this one for now. Unsure about "ACK the last and highest" - need more discussion and pure test case (I haven't spent time on traces yet, sorry). I the the discussion above clarifies this. Add MDELAY option to ftp in attempt to fix multiple file transfers from running ktcp out of resources. I've already put this into the ftp client. We could do it all (both ways) in the client, but that would be making assumptions about how the close/FIN procedure works and the experiment would be less reliable. So the server change is coming too.
—M
@ghaerr,
Can retransmit delay be adjusted in ways that reduce retransmits (there are never lost packets)? Yes, and suggest possible retransmit timer option for non-FIN packets. This may not be a disk I/O issue as guessed in last post, could just be that ELKS-to-ELKS needs more time before retransmits because it flat sends too much data initially. (There is an algorithm that ELKS does not implement which is called "slow start" or something like that. That could be the reason for ELKS-to-ELKS problems).
I'm (or was) familiar with slow start, IIRC it's a 'window-regulating' mechanism for links with unpredictable reliability (e.g. SATNET that I alluded to before). IMHO it's worth thinking carefully about this one, and I'm initially sceptical because it was made for much slower links (<64kbps) and our scenario is very different. Ours is binary in a sense - . on or off, not slower or faster, at least when we're talking about disk I/O blocking network activity. IOW we have 'predictable unpredictability'. We don't know when the problem occurs, but do know its characteristics when it happens. How much effort would it take to change ACKing from 'every packet' to 'highest SEQ # received'? This one in particular would change the picture dramatically, and has IIRC been on the table several times before. Ktcp already ACKs the highest SEQ number received. As mentioned previously, I think the only solution here would be to ACK every other packet, which an RFC suggests. I haven't looked into how hard that would be, but would suggest other options first, as this would change the basic TCP algorithm, which we already have working well with other TCPs.
Like I said in the previous message, I think a delayed ack when there is much traffic, is fine, and useful for any situation as it would reduce load on both ends, regardless of capacity. In particular it would help speed up ELKS, and it's actually what the TCP spec recommends. Can ktcp handle low memory situations smarter, and (say) introduce a short delay and try again a couple of times when creating new sockets fail with sbrk errors? Ktcp cannot retry an out of memory condition without lots of work, since it is in a somewhat synchronous state with the kernel, and the kernel would have to retry as well, etc. We can reserve a smaller/larger memory limit for retrans memory, but I'm not sure that will do much. I am guessing setting a larger retrans timer might help more. We need to make sure we know the real reason behind retransmits, discussed in more detail previously.
Agreed, and let's keep in mind that the growing retransmit buffer is the big problem (it's killing the connection and ktcp), not the retransmits. Then again, say, doubling the retransmit buffer just as an experiment may be worthwhile. You can send me the codeines and I'll make up a test case. Can the # of duplicate ACKs (the window knock ack introduced a while back, or maybe it was telnet needing a character to continue ) be reduced by looking at the actual window size or something else? In a balanced situation (data-ack-data-ack etc.) every ACK packet is duplicated on the wire. In a loaded situation, every 3rd ack is duplicated. Interestingly, when doing the same transfer to a faster client, there are NO duplicate ACKs. Again, I'm not sure duplicate ACKs is the real problem here.
No, duplicate acks are mostly annoyances plus extra delays - however small. I still am guessing the problem is that ELKS is sending too much data too quickly, and reference TCP handles that (likely with larger buffers) and ELKS receiver cannot. Perhaps "send slow start" might be a better option. I am not sure how to implement slow start, however. We could also try to lower the send packet size (programmatically) and see whether that helps, as that is essentially how "slow start" starts. There is some literature on the business of starting up sending after a ZWP, can you verify that our problems all start after a zero window, or do they sometimes start earlier?
None of my tests so far have had zero window situations. That's why I mentioned above that I'll set aside the DIR case and start testing wit full size packets only. Also I don't think (but don't discount) slow start in our case because the scenario is so different from what slow start was made to handle. It may also be an idea (it has been floated before) to increase the send buffer on the NIC from one to 2 packets (by reallocating one from the receive buffer pool), thereby reducing the send-wait looping in the driver and release resources to the system. No - I don't think allocating another send buffer will help, in fact likely will hurt, if the problem is that too much send data outstanding (unACKed) is the reason for the retransmits. (Other reason may be disk I/O speed).
Maybe. The thing is - improving send efficiency goes both ways. In addition to potentially speed up sending of data, it also speeds up sending ACK packets. Also, offloading more work to the NIC instead of the system gains processing capacity. A test for another day. Summary:
Need a way to change retransmit timer for testing (non-FIN packets only). Yes! I think this is pri#1 ACK every packet/highest SEQ: need more data to conclude this is actual problem vs retransmit or ZWP recovery I think this is more than obvious from the packet traces we already have. I'd suggest adding delayed ack for testing. Need information on "slow start" and/or zero window probe send recovery Agreed, fully inderstanding it so we can choose in or out. Decrease duplicate ACKs: need more data to conclude if this is actual problem, and duplicate ACKs were required for working with reference TCPs, so not great idea to change it. Yes, worthwhile only if a simple fix can do it.
(another) summary:
So here's my action list forward:
I'm working on this as we speak, and have the following (retransmit) example:
More about this in a separate post.
-M
Followup on the enable mentioned in the previous post. Scenario: Send big (!) 70k file from ELKS A to ELKS B via ftp.
Observations :
Maybe we can start off reducing by one and change the delay algorithm to 0.25-0.50-5-10-20. Which seems like a waste too, remembering they hardly ever get lost. So none of the resent packets make any difference whatsoever. The recipient will act on the original packet and discard the rest. It's an observation, not a suggestion we drop retransmits, just keep realities in mind. So maybe 0.25-1-5-20 make even more sense. (possible algorithm: (prev_value * 10)/retry# - which starting at 0.25 would give 0.25/1.25/6.25/~30.
Here's the trace, starting with the last few packets before the floppy kicks in.
11:30:54.288760 IP (tos 0x0, ttl 64, id 224, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0xe875 (correct), seq 60001:60513, ack 1, win 4380, length 512
11:30:54.304151 IP (tos 0x0, ttl 64, id 225, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x073f (correct), seq 60513:61025, ack 1, win 4380, length 512
11:30:54.305804 IP (tos 0x0, ttl 64, id 274, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xe415 (correct), ack 60513, win 3868, length 0
11:30:54.319023 IP (tos 0x0, ttl 64, id 226, offset 0, flags [none], proto TCP (6), length 516)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x3b97 (correct), seq 61025:61501, ack 1, win 4380, length 476
11:30:54.323111 IP (tos 0x0, ttl 64, id 275, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xe415 (correct), ack 61025, win 3356, length 0
11:30:54.339207 IP (tos 0x0, ttl 64, id 276, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xe415 (correct), ack 61501, win 2880, length 0
11:30:54.346336 IP (tos 0x0, ttl 64, id 277, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xde41 (correct), ack 61501, win 4372, length 0 <---------- floppy starts about here
11:30:54.349859 IP (tos 0x0, ttl 64, id 227, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:30:54.364855 IP (tos 0x0, ttl 64, id 228, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:30:54.603942 IP (tos 0x0, ttl 64, id 229, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:30:54.633802 IP (tos 0x0, ttl 64, id 230, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:30:55.104050 IP (tos 0x0, ttl 64, id 231, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:30:55.134188 IP (tos 0x0, ttl 64, id 232, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:30:56.091695 IP (tos 0x0, ttl 64, id 233, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:30:56.112917 IP (tos 0x0, ttl 64, id 234, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:30:58.091726 IP (tos 0x0, ttl 64, id 235, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:30:58.113144 IP (tos 0x0, ttl 64, id 236, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:31:02.091797 IP (tos 0x0, ttl 64, id 237, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:31:02.113044 IP (tos 0x0, ttl 64, id 238, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:31:06.091879 IP (tos 0x0, ttl 64, id 239, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x11a7 (correct), seq 62013:62525, ack 1, win 4380, length 512
11:31:06.104287 IP (tos 0x0, ttl 64, id 240, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [R], cksum 0xda45 (correct), seq 589207601, win 4380, length 0 <----- sender side resets
11:31:06.125810 IP (tos 0x0, ttl 64, id 241, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6072 (correct), seq 61501:62013, ack 1, win 4380, length 512
11:31:06.138255 IP (tos 0x0, ttl 64, id 242, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [R], cksum 0xda45 (correct), seq 589207601, win 4380, length 0
11:31:07.639359 IP (tos 0x0, ttl 64, id 278, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xde41 (correct), ack 62013, win 3860, length 0
11:31:07.672706 IP (tos 0x0, ttl 64, id 243, offset 0, flags [none], proto TCP (6), length 516)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x6b7d (correct), seq 62525:63001, ack 1, win 4380, length 476
11:31:07.690426 IP (tos 0x0, ttl 64, id 279, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xde41 (correct), ack 62525, win 3348, length 0
11:31:07.706020 IP (tos 0x0, ttl 64, id 280, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:07.761525 IP (tos 0x0, ttl 64, id 281, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:07.804726 IP (tos 0x0, ttl 64, id 282, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:07.848256 IP (tos 0x0, ttl 64, id 283, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:07.891446 IP (tos 0x0, ttl 64, id 284, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:07.934768 IP (tos 0x0, ttl 64, id 285, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:07.978063 IP (tos 0x0, ttl 64, id 286, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:08.021272 IP (tos 0x0, ttl 64, id 287, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:08.064417 IP (tos 0x0, ttl 64, id 288, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:08.107896 IP (tos 0x0, ttl 64, id 289, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:08.151069 IP (tos 0x0, ttl 64, id 290, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:08.194418 IP (tos 0x0, ttl 64, id 291, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [.], cksum 0xda39 (correct), ack 62525, win 4380, length 0
11:31:08.540498 IP (tos 0x0, ttl 64, id 292, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [R], cksum 0xef60 (correct), seq 588236287, win 1, length 0
11:31:08.562330 IP (tos 0x0, ttl 64, id 293, offset 0, flags [none], proto TCP (6), length 64)
10.0.2.16.ftp > 10.0.2.15.1025: Flags [P.], cksum 0x03ff (correct), seq 219:243, ack 54, win 4380, length 24: FTP, length: 24
226 Transfer Complete.
11:31:08.618777 IP (tos 0x0, ttl 64, id 294, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [R], cksum 0xeb60 (correct), seq 588236287, win 1, length 0
11:31:10.035943 IP (tos 0x0, ttl 64, id 244, offset 0, flags [none], proto TCP (6), length 552)
10.0.2.15.1026 > 10.0.2.16.1024: Flags [P.], cksum 0x7857 (correct), seq 63001:63513, ack 1, win 4380, length 512
11:31:10.091977 IP (tos 0x0, ttl 64, id 295, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.16.1024 > 10.0.2.15.1026: Flags [R], cksum 0xe984 (correct), seq 588236287, win 1, length 0
11:31:10.094565 IP (tos 0x0, ttl 64, id 245, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1025 > 10.0.2.16.ftp: Flags [.], cksum 0xd2f4 (correct), ack 243, win 4356, length 0
11:31:10.120063 IP (tos 0x0, ttl 64, id 296, offset 0, flags [none], proto TCP (6), length 64)
10.0.2.16.ftp > 10.0.2.15.1025: Flags [P.], cksum 0x03ff (correct), seq 219:243, ack 54, win 4380, length 24: FTP, length: 24
226 Transfer Complete.
11:31:10.134343 IP (tos 0x0, ttl 64, id 246, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1025 > 10.0.2.16.ftp: Flags [.], cksum 0xd2f4 (correct), ack 243, win 4356, length 0
11:31:10.453435 IP (tos 0x0, ttl 64, id 247, offset 0, flags [none], proto TCP (6), length 40)
10.0.2.15.1025 > 10.0.2.16.ftp: Flags [.], cksum 0xd2dc (correct), ack 243, win 4380, length 0
11:31:29.348794 IP (tos 0x0, ttl 64, id 42625, offset 0, flags [none], proto UDP (17), length 367)
-M
Additional note about the example in the previous post and some further testing:
increasing TCP_RETRANS_MAXWAIT to (10<<4) (tcp.h
)obviously remedies the situation so that the transfer succeeds.
from the packet trace it's evident that the receiver acks every occurrence of a packet - including retransmits that have already been acked. This doesn't break anything but is as much a waste of resources as the many retransmits. The following change fixes the latter, and I have a hard time figuring out a scenario that makes this send_ack useful:
--- a/elkscmd/ktcp/tcp.c
+++ b/elkscmd/ktcp/tcp.c
@@ -508,8 +508,8 @@ void tcp_process(struct iphdr_s *iph)
cb->rcv_nxt - cb->irs, ntohl(tcph->seqnum) - cb->irs, datalen);
netstats.tcpdropcnt++;
if (cb->rcv_nxt != ntohl(tcph->seqnum) + 1)
tcp_send_ack(cb);
//if (cb->rcv_nxt != ntohl(tcph->seqnum) + 1)
//tcp_send_ack(cb);
return;
}
Also, a quick (test-)fix for the timeout works for now - reduces the # of retransmits for a typical floppy write (13 seconds) to 3:
--- a/elkscmd/ktcp/tcp_output.c +++ b/elkscmd/ktcp/tcp_output.c @@ -284,7 +284,7 @@ void tcp_reoutput(struct tcp_retrans_list_s *n)
if (datalen < n->cb->rcv_wnd) / don't record retry if not in recv window/ n->retrans_num++;
n->rto <<= 1; / double retrans timeout/
n->rto <<= 2; / quadruple retrans timeout/ if (n->rto > TCP_RETRANS_MAXWAIT) / limit retransmit timeouts to 4 seconds/ n->rto = TCP_RETRANS_MAXWAIT; n->next_retrans = Now + n->rto;
plus the change in `tcp.h`.
What the effects of these changes are in the bigger picture, remains to be seen.
Almost off topic, @ghaerr - could you add 'restart' to the net
script? I keep net stop; net start
ing continuously these days ...
Hello @Mellvik,
Since there are so many potential problems discussed here, I'm glad you've picked one to jump into and start coding: large file works great until floppy delay. This will also allow you to learn ktcp (as I did). I will just comment on this single problem for now.
I like the idea of working on one bigger problem at a time; also by fixing or minimizing that, we may find it not necessary to analyze ktcp in its "frenzied" state, at least for the time being.
I think you should try experimenting with the initial retransmit timeout as well, setting to a much higher value, say 10-20 seconds or more. That would be changed here in add_for_retrans(), where it is initially set to twice first packet's return trip time:
n->rto = cb->rtt << 1; /* set retrans timeout to twice RTT*/
if (linkprotocol == LINK_ETHER) {
if (n->rto < TCP_RETRANS_MINWAIT_ETH)
n->rto = TCP_RETRANS_MINWAIT_ETH; /* 1/4 sec min retrans timeout on ethernet*/
} else {
if (n->rto < TCP_RETRANS_MINWAIT_SLIP)
n->rto = TCP_RETRANS_MINWAIT_SLIP; /* 1/2 sec min retrans timeout on slip/cslip*/
}
The later lines set a minimum timeout, in the case the packet RTT was very quick.
I suggest for the moment we entirely ignore RTT and try an RTO of say, 10, 10, 10, 10 - that is, set to a high number and don't adjust it. Given your comments that no packets are lost, it would be interesting to see how this affects things. Another approach might be setting to a high value (20), then halving it until it gets to the minimum (20, 10, 5). If an initial or ongoing floppy delay of 20 seconds is always possible, perhaps double-halving from 20 (20, 5, 1.25). That would allow for a floppy delay but not keep the user waiting if something else went wrong.
The initial RTT value needs to be high enough that no retransmit occurs, (everything waits for the floppy, and no packets are lost).
You may want to just comment out the TCP_RETRANSMIT_MAXWAIT entirely, (or input a literal number in the code, not header file) so the algorithm is very clear.
None of my tests so far have had zero window situations.
Ok - thank you for the clarification. I had misread your traces seeing the "len 0" as a probe - I was incorrect. Thus we can probably ignore my previous suggestions.
The following change fixes the latter, and I have a hard time figuring out a scenario that makes this send_ack useful:
I don't recommend changing two things at once in ktcp. While you may be right about this "extra ACK" not being needed, it would be best to concentrate changes only in RTO, say, with no other changes, for the time being. That said, I thought this particular ACK was required in the case that when an out-of-sequence packet arrived, the sender was notified, rather than waiting for a timeout. (More discussion needed, but lets save that also for later).
Could you add 'restart' to the net script?
Sure.
Update on ELKS-ELKS ftp:
The ftp
client and server have been updated to support delays when making many connections in short intervals. Also, the new client will recognize that the other end is ELKS, and activate the default delay local as well as send the appropriate command to the server side to enable similar delays.
Unsurprisingly, this works well (and the changes are modest) and - just as unsurprisingly, the required delay is approximately 1 second. Both the server and the client adjust the delay by subtracting the transfer time of the previous transfer, thus optimizing transfers somewhat (i.e. if transferring only large (<50k) files, there will be no delay). The average per file delay when transferring all 50 files in the man/man1
directory is about 700ms.
There is probably room for tuning and optimizations. During testing, when max outstanding packets (# packets in the retransmit queue) is limited to 7 per cb, the delay can safely be reduced to 800ms. The delay value can be set individually for the client and the server - from the client, allowing for easy experimentation. It may be worthwhile (I'm experimenting with this now) reducing the length of the retransmit queue even further (1 is an interesting option). When communicating with faster systems, it's never used. When communicating with slow boxes (ELKS), it creates problems and delivers minimal benefit as fas as I can see. BTW - I've also reversed the retransmit queue (add-to-end instead of add-to-front), so that the oldest un-acked packet is retransmitted first, which probably has some, but very limited performance effect. Also, the retransmit timeouts have been adjusted as discussed above, keeping the first @ 0.25 seconds to take care of the ignored FIN case.
IOW - as far as ftp
is concerned, we have stable and predictable ELKS-to-ELKS communication: No ktcp
out-of-memory situations or running out of sockets. It's not fast but it's the best we can do for now. And the delay never affects single file transfers or multiple transfers when the files are 'big'.
Comments and ideas solicited.
-M
The ftp client and server have been updated to support delays when making many connections in short intervals. as far as ftp is concerned, we have stable and predictable ELKS-to-ELKS communication
Sounds very nice @Mellvik! Was this accomplished only by mods in ftp/ftpd, or did ktcp have to be modified also?
when max outstanding packets (# packets in the retransmit queue) is limited to 7 per cb, the delay can safely be reduced to 800ms
How was this accomplished, did you modify add_for_retrans to stop saving packets after 7? On a WAN connection, should a packet actually be dropped that requires ktcp to have a retrans packet for it, ktcp won't be able to recover, as the application can't backtrack to the data already sent to ktcp.
It may be worthwhile (I'm experimenting with this now) reducing the length of the retransmit queue even further (1 is an interesting option)
If ktcp sends a packet, and then has to retransmit it, there is no way for it to retransmit that packet should the packet be lost, of course. Is this experimentation just for LANs assuming that won't ever happen?
For LAN use only, it could possibly make sense to not store retrans packets in the name of conserving memory, but this would go against the specification and ktcp could not recover if a packet was actually lost. OTOH, if ktcp can't recover anyways, I can see the motivation. In that case, what we really have is a still-buggy retransmit recovery scenario, which may have never been fixed properly.
When communicating with slow boxes (ELKS), it creates problems and delivers minimal benefit as fas as I can see.
It seems you're saying that when retrans kicks in (when a packet was never actually lost), transfers become out of sync and don't recover? Can't we just set the initial retrans time quite a bit higher in that case? Or do we end up running out of memory allocating the retrans packets? Perhaps the sending algorithm should hold sending not just on the outstanding (byte) window, but also perhaps on the number of outstanding un-acked packets, no matter how small the packets are. Holding back sending (done normally on zero window) would be an easy way to throttle back send activity which is waiting for disk I/O anyways.
I've also reversed the retransmit queue (add-to-end instead of add-to-front), so that the oldest un-acked packet is retransmitted first, which probably has some, but very limited performance effect.
If this makes a difference, then it seems that when any retransmit occurs, ktcp finds itself sending more than one retrans packet. I can see that if the receiving end is ELKS, the order sent could make a difference, since the receiver has to drop any out-of-order packet, unlike Linux TCP implementations which will buffer them.
Also, the retransmit timeouts have been adjusted as discussed above, keeping the first @ 0.25 seconds to take care of the ignored FIN case.
It sounds like a combination of ktcp changes as well as ftp/ftpd changes were required in order for ELKS-to-ELKS transfers to become stable.
I may have more ideas once I understand more exactly what changes you've made in order to get everything working.
Hi @ghaerr,
The ftp client and server have been updated to support delays when making many connections in short intervals. as far as ftp is concerned, we have stable and predictable ELKS-to-ELKS communication
Sounds very nice @Mellvik https://github.com/Mellvik! Was this accomplished only by mods in ftp/ftpd, or did ktcp have to be modified also?
This part is only ftp/ftpd - with commands to change the delay on server and client (separately) at any time.
220 Welcome - ELKS minimal FTP server speaking. Name (10.0.2.16:ftp): 331 User OK. Password: 230 Password OK. 215 UNIX Type: L8 (Linux/ELKS). Remote system is Unix/Linux, using binary mode Server is ELKS; setting passive mode delay to 800ms 200 mget delay set to 800ms. ftp> delay Delay between MPUT files is 800ms ftp> delay 1000 Delay between MPUT files is 1000ms ftp> site mdly 1000 <— set server side delay 200 mget delay set to 1000ms. ftp> (the informational messages have been changed since)
when max outstanding packets (# packets in the retransmit queue) is limited to 7 per cb, the delay can safely be reduced to 800ms
How was this accomplished, did you modify add_for_retrans to stop saving packets after 7? On a WAN connection, should a packet actually be dropped that requires ktcp to have a retrans packet for it, ktcp won't be able to recover, as the application can't backtrack to the data already sent to ktcp.
I added a packet counter to the CB and set a MAX value in the header file that can easily be changed. The actual change is in tcpdev.c which will return a -ERESTARTSYS if the retransmit queue is at max level. It may be worthwhile (I'm experimenting with this now) reducing the length of the retransmit queue even further (1 is an interesting option)
There is no discarding packets anywhere so that's not an issue. What inspired the single packet testing is that this is what traffic from other systems (mac, Linux) looks like on the wire. There are hardly ever two or more unacked packets from their side, ping/pong (packet/ack, packet/ack). Just a test (coming up) to see if there are positive (or negative) side effects from this. When communicating with slow boxes (ELKS), it creates problems and delivers minimal benefit as fas as I can see.
It seems you're saying that when retrans kicks in (when a packet was never actually lost), transfers become out of sync and don't recover?
No, and FWIW this is a minor thing, consider it tuning, not fixing. It just doesn't make sense to retransmit packet 3 when packet 1 is the one that hasn't been acked.
Perhaps the sending algorithm should hold sending not just on the outstanding (byte) window, but also perhaps on the number of outstanding un-acked packets, no matter how small the packets are. Holding back sending (done normally on zero window) would be an easy way to throttle back send activity which is waiting for disk I/O anyways.
YEs, this is what the packet counter above is doing. I've also reversed the retransmit queue (add-to-end instead of add-to-front), so that the oldest un-acked packet is retransmitted first, which probably has some, but very limited performance effect.
If this makes a difference, then it seems that when any retransmit occurs, ktcp finds itself sending more than one retrans packet.
It does, and yes, that's why this issue popped up. Elks sends 2 or 3 (usually 2) packets back to back, then waits, then (if timeout) retransmits both (because their retransmit timeout is about the same given the timer resolution), and they appear in the wrong order. No practical problem as long as they come in pairs, but the odd cases when they don't, we get a delay. I can see that if the receiving end is ELKS, the order sent could make a difference, since the receiver has to drop any out-of-order packet, unlike Linux TCP implementations which will buffer them.
Also, the retransmit timeouts have been adjusted as discussed above, keeping the first @ 0.25 seconds to take care of the ignored FIN case.
It sounds like a combination of ktcp changes as well as ftp/ftpd changes were required in order for ELKS-to-ELKS transfers to become stable.
I may have more ideas once I understand more exactly what changes you've made in order to get everything working.
PRs are coming, the ftp/ftpd coming first. I have more work to do in ktcp testing with floppy only transfers and retransmits. I added delayed acks, but it didn't make any difference (so far).
BTW - after changing the 'direction' of the retransmit queue I'm having fun with the following (minor) choice to be made. When adding new packets to the end (instead of the front) we need to actually get to the end every time, and the question is how to get there: 1) [works]run through the list every time (the current solution): {while (n->next) n=n->next;} 2) [easy]keep an end pointer (like retrans_list, only pointing to the end instead of the front) 3) [possibly the right thing]make the list circular so retrans_list->prev always points to the last element (saves a pointer, but entails a lot of code changes because there are tests for NULL pointers all over tcp_output.c)
Unless there are compelling arguments to choose 3, 2 is the obvious choice.
—M
I'm deep into ktcp these days - with the ambition to eliminate the need for delays on ELKS-to-ELKS. Here's the thing: There is no need for delays between other hosts and ELKS - not mget, not mput, not passive, not active, not when ELKS is the client, not when ELKS is the server. Since they are faster, the need for delays in at least some of these situations should objectively be greater. So - back to the packet traces to track down the differences. (it may turn out to be as simple as packet size, we'll see).
There's lots to keep track of, so I've copied your comment to the correct issue; needed to keep things organized.
Your comment above is very interesting - it will be good to know what you find is actually contributing to the observed problems only seen with ELKS-to-ELKS communication. As you figure out what's happening, hopefully we can improve the (last?) are of ELKS networking that isn't implemented properly.
Thank you!
Thanks for cleaning up the thread, @ghaerr! I agree.
Yes, going for the telnet hang/double ack is a great idea. I'll disable the double ack and see if I can get the problem to repeat. Then take it from there.
There are many interesting things to learn i ktcp and the packet traces. Limiting the # of packets outstanding on a connection improved the situation some. Also it turns out that Linux always responds to a FIN with FIN-ACK, instead of ACK, then FIN. This speeds up socket close by bypassing CLOSE_WAIT, going direcly to LAST_ACK (which is within the standard). Adding this to ELKS delivered another noticable improvement. However, the slowness of ELKS means that at the point of receiving the FIN there is usually still data waiting to be read by the kernel, so we have to go to CLOSE_WAIT anyway before the CB may be released. So the mileage will vary.
The big 'thing' in this situation is the size of the retransmit queue, as indicated by the fact that when ktcp runs out of memory, it's (almost) always the sending side, regardless of which side is 'accept' and 'connect'. Which explains the effect of the limit on outstanding packets. The fact that it works fine when Linux is at the other end may be explained by the speed of the recipient. There is no buildup of retransmits at all. This remains to be verified, and there is still room for improvement. Including experimentation with packet sizes and the size of the retransmit queue.
--M;
- apr. 2022 kl. 19:17 skrev Gregory Haerr @.***>:
I'm deep into ktcp these days - with the ambition to eliminate the need for delays on ELKS-to-ELKS. Here's the thing: There is no need for delays between other hosts and ELKS - not mget, not mput, not passive, not active, not when ELKS is the client, not when ELKS is the server. Since they are faster, the need for delays in at least some of these situations should objectively be greater. So - back to the packet traces to track down the differences. (it may turn out to be as simple as packet size, we'll see).
There's lots to keep track of, so I've copied your comment to the correct issue; needed to keep things organized.
Your comment above is very interesting - it will be good to know what you find is actually contributing to the observed problems only seen with ELKS-to-ELKS communication. As you figure out what's happening, hopefully we can improve the (last?) are of ELKS networking that isn't implemented properly.
Thank you!
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.
Thanks you @ghaerr,
yes, I think this is the right approach since I'm sitting on the only test lab at this time. Understanding more of the gory internals of ktcp doesn't harm at all.
I've been pondering the initial timeout for retransmits, and while I like the idea, I have two considerations that I'd like your opinion on.
1) We have the case - mentioned just recently - of the purposely dropped FIN, which is depending on a rapid retransmit in order to work as planned.
2) My claim that we never lose packet, isn't 100% true. There will be cases where the old clunker stops for long enough for the NIC to get an overrun, which means all packets get discarded. However rare, when it happens we'd like to have a chance to get back on track within a reasonable timeframe.
So I was thinking it's good that the first retransmission timeout is less than 1 second - for the FIN case, less than .5 secs. Opinions?
-M
- mar. 2022 kl. 16:27 skrev Gregory Haerr @.***>:
Hello @Mellvik https://github.com/Mellvik,
Since there are so many potential problems discussed here, I'm glad you've picked one to jump into and start coding: large file works great until floppy delay. This will also allow you to learn ktcp (as I did). I will just comment on this single problem for now.
I like the idea of working on one bigger problem at a time; also by fixing or minimizing that, we may find it not necessary to analyze ktcp in its "frenzied" state, at least for the time being.
I think you should try experimenting with the initial retransmit timeout as well, setting to a much higher value, say 10-20 seconds or more. That would be changed here in add_for_retrans(), where it is initially set to twice first packet's return trip time:
n->rto = cb->rtt << 1; /* set retrans timeout to twice RTT*/ if (linkprotocol == LINK_ETHER) { if (n->rto < TCP_RETRANS_MINWAIT_ETH) n->rto = TCP_RETRANS_MINWAIT_ETH; /* 1/4 sec min retrans timeout on ethernet*/ } else { if (n->rto < TCP_RETRANS_MINWAIT_SLIP) n->rto = TCP_RETRANS_MINWAIT_SLIP; /* 1/2 sec min retrans timeout on slip/cslip*/ }
The later lines set a minimum timeout, in the case the packet RTT was very quick.
I suggest for the moment we entirely ignore RTT and try an RTO of say, 10, 10, 10, 10 - that is, set to a high number and don't adjust it. Given your comments that no packets are lost, it would be interesting to see how this affects things. Another approach might be setting to a high value (20), then halving it until it gets to the minimum (20, 10, 5). If an initial or ongoing floppy delay of 20 seconds is always possible, perhaps double-halving from 20 (20, 5, 1.25). That would allow for a floppy delay but not keep the user waiting if something else went wrong.
The initial RTT value needs to be high enough that no retransmit occurs, (everything waits for the floppy, and no packets are lost).
You may want to just comment out the TCP_RETRANSMIT_MAXWAIT entirely, (or input a literal number in the code, not header file) so the algorithm is very clear.
— Reply to this email directly, view it on GitHub https://github.com/jbruchon/elks/issues/1228#issuecomment-1069252229, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA3WGOEOZHQIFSVYLCFUJH3VAH4WRANCNFSM5QYSTF3A. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you were mentioned.
I'm opening this issue to track experiences and problems related to networking between ELKS systems.
As it turns out, a quite different challenge than what we've had so far, the key difference being that
ktcp
no longer necessarily is the underdog in terms of speed and processing capacity.The testing is obviously physical only. Sometimes 386/20-386/20, some times 386/20 - 286/12.
The difference from previous testing, in which ELKS consistently is the weak part apparent immediately when making the first connections between these systems: Some or many retransmits. All exchanges have retransmits but there is no packet loss, so timing (or selecting intervals) is an issue. In many cases the receiving end of a file transfer gets so busy receiving packets that there is little time to ack them, and the receiver is falling behind.
Case in point: When DEBUG_TCPPKT is active on the sending side, the exchange is 'in sync': The debug output delays sending sufficiently to enable the receiver to keep up. For singular transfers, this 'asynchronicity' is little more than annoying (debug messages) and a delay. As activity increases (such as back-to-back file transfers via ftp, frequent close/reopens), ktcp runs out of memory for un-acked packets, which in turn delays the closedown of sockets and new opens fail with sbrk errors. Part of the challenge is that we don't know exactly what goes on inside
ktcp
because activating the various debug settings change the equation so much the answer becomes irrelevant or at least less useful. What's on the wire (tcpdump
) is reliable but the processing sequence inside ELKS may be quite different.These are mostly resource problems, not bugs - as evidenced by the fact that things generally work fine when one end is resourceful.
Still it's interesting to pose the question about what can be improved at what cost. I may not be the best qualified to do that, but here's some loud thinking anyway:
ktcp
handle low memory situations smarter, and (say) introduce a short delay and try again a couple of times when creating new sockets fail with sbrk errors?telnet
needing a character to continue ) be reduced by looking at the actual window size or something else? In a balanced situation (data-ack-data-ack etc.) every ACK packet is duplicated on the wire. In a loaded situation, every 3rd ack is duplicated. Interestingly, when doing the same transfer to a faster client, there are NO duplicate ACKs.For reference I'm including two packet traces below, from identical transfers (an ftp dir command, actually an
ls -l
which emits one packet per line)m the first with DEBUG_TCPPKT on (thus balanced), the 2nd (separate message) without TCPPKT. Notably the 'distance' between data packet and ACK in the latter is frequently 20 packets (send + receive).sbrk
examples will be posted in a separate message.tcpdump - ftp dir command - DEBUG_TCPPKT enabled: