Rik44444 closed this issue 4 years ago.
Another detail: when just adding a download, for a brief moment it will show "1(0)" or "30(0)" peers before showing 0 and getting stuck permanently.
I just tried to see if disabling the download dialog also fixes the problem I'm having, but it doesn't. If I don't use force_dht_announce, I consistently get a delay of a few minutes (using 0 hops).
One of the downloads got unstuck after a few minutes, though the download sees only a single peer and is thus very slow. The torrent is a healthy one; it has thousands of peers (I use it regularly for testing).
@arvidn , what could make Libtorrent ignore other peers and only use one?
Here are some session stats (for this run I disabled Libtorrent logging, unfortunately):
EDIT: while I was typing this, another peer was added to the download:
EDIT2: while watching the thing download I noticed something suspicious: all the clients that were connected were either Libtorrent-based, or 0% completed, no speed, no client-identifier string. After the download stopped, the list of peers looked like this:
Also, while downloading, I started a virtual machine (bridged) with another Tribler instance and started the same download. It started normally, lots of peers, etc. The 1st Tribler instance was immediately able to see the 2nd one and started to leech from it full-speed.
EDIT3: After a few hours the peers list still looks strange:
Some more Libtorrent stats after an hour of seeding stuck:
uTP stats look strange to me :thinking:
I started a Wireshark session capturing the exchange of packets between the stuck Tribler instance and the external world. One of the uTP sessions captured:
@arvidn , is it normal for Libtorrent to regularly send to other peers Timestamp Difference of ff ff f3 db
microseconds? If it could help, I can send you the whole capture.
Another conversation. Does it look normal?
Changed anonymity to 1 hop on one of the downloads, it started to download:
The list of peers looks normal too.
So, the problem is either local to the Libtorrent session, or has something to do with NAT traversal.
Kept Tribler running overnight, all downloads finished, newly added torrents download fine. :man_shrugging:
Stats after the night:
is it normal for Libtorrent to regularly send to other peers Timestamp Difference of ff ff f3 db microseconds?
Yes, the timestamp difference is the difference between the sender's and the receiver's clocks. They aren't synchronized so the difference is mostly a non-sensical value. However, it's very useful to monitor changes over time. The assumption in uTP's congestion control is that the clocks won't drift too much, so the change in this diff represents changes in buffering along the network path.
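For intuition, the field from the capture can be decoded by hand. A minimal sketch (plain Python; it assumes only that the field is the 32-bit wrapping microsecond difference defined by BEP 29, and the helper names are mine):

```python
# Hedged sketch: how a uTP timestamp difference like "ff ff f3 db" can be read.
# The field (BEP 29) is a 32-bit unsigned microsecond value, so clock skew
# between unsynchronized hosts makes the absolute number meaningless; only
# the change between samples carries information about queuing delay.

MASK = 0xFFFFFFFF  # the field wraps at 2**32 microseconds

def ts_diff(our_clock_us: int, their_timestamp_us: int) -> int:
    """32-bit wrapping difference, as a uTP receiver would compute it."""
    return (our_clock_us - their_timestamp_us) & MASK

def delay_change(sample_prev: int, sample_next: int) -> int:
    """Signed change between two diff samples (what congestion control watches)."""
    d = (sample_next - sample_prev) & MASK
    return d - (1 << 32) if d >= (1 << 31) else d

# The value from the capture:
observed = 0xFFFFF3DB
print(observed)  # 4294964187 -- huge in absolute terms, but that's fine
# If a later packet carried 0xFFFFF7DB, buffering grew by about 1 ms:
print(delay_change(0xFFFFF3DB, 0xFFFFF7DB))  # 1024 microseconds
```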
Regarding the libtorrent stats, some of those numbers are "counters", meaning they are cumulative and monotonic. Some of them are "gauges" meaning they are instantaneous snapshots of the current state (say, the number of uTP connections in a certain state, right now). The counters are mostly useful when collecting multiple samples and comparing them over time.
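For example, turning two counter samples into a rate could look like this (a sketch; the metric names are illustrative stand-ins and may not match your libtorrent version's stats exactly):

```python
# Hedged sketch: counters are cumulative, so they only mean something as a
# difference over time; gauges are instantaneous and can be read directly.
# The metric names below are illustrative stand-ins.

def counter_rate(samples, key):
    """Per-second rate from (timestamp, {metric: value}) counter samples."""
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    return (s1[key] - s0[key]) / (t1 - t0)

samples = [
    (0.0,  {"utp.utp_packets_out": 1200, "utp.num_utp_connected": 4}),
    (10.0, {"utp.utp_packets_out": 1700, "utp.num_utp_connected": 3}),
]
# Counter: compare samples over time.
print(counter_rate(samples, "utp.utp_packets_out"))  # 50.0 packets/s
# Gauge: the latest snapshot is the answer.
print(samples[-1][1]["utp.num_utp_connected"])       # 3
```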
@arvidn , would you be able to figure out what's happening if I send you a Wireshark capture of Tribler stuck trying to download the torrents? Also, what could have caused the behaviour on the screenshots above, namely Tribler connecting only to the 0% nameless clients?
I could give it a shot. It’s all direct peer connections, not encrypted nor over a proxy, right?
I wrote this tool a while back which might be useful: https://github.com/arvidn/bittorrent-trace
@arvidn , I applied your bittorrent-trace utility to the exchange file of the stuck Tribler instance. Aside from other peers requesting metainfo for test torrents I used to run previously, there is a dir with the relevant (ubuntu) infohash.
Aside from a single exchange, all captures looked like this:
>> 1591212017.149 HANDSHAKE
>> 1591212017.149 RESERVED 0000000000180005
>> 1591212017.149 INFO-HASH ee55335f2acde309fa645fab11c04750d7e45fa1
>> 1591212017.149 PEER-ID 0920dc71b767d3bfe956e469b5a144b712079088 [. .q.g...V.i..D.....]
<< 1591212017.624 FIN
>> 1591212017.624 FIN
The single non-trivial exchange was:
In the non-trivial exchange, you send a "HAVE-ALL" message. Then you get a "lazy bitfield", where it turns out the peer is a seed as well, and the connection is closed.
There's also an 'upload_only': 1 in the handshake, suggesting you aren't interested in downloading anything. If you're not a seed, you can end up in upload-only mode in case a disk write operation fails with ENOSPACE, i.e. if you run out of space, the torrent will be switched into "upload-only" mode. So, it will keep uploading but won't download any more. In this case the torrent is not paused nor set in an error state, but there should be a file_error_alert.
I take it you're not a seed, is that right? Is it possible the torrent was started in seed-mode, and nothing has caused it to check any piece yet, so it keeps thinking it's a seed? I can't think of any other reason why a HAVE-ALL message would be sent. seed-mode would also explain the upload_only in the handshake.
@arvidn , the captured exchange was from the moment when the ubuntu torrent was already downloaded (although very slow), but the two remaining downloads were still stuck. So, upload only is fine.
What's suspicious, though, is that there are no attempts to get metainfo for the remaining two downloads from DHT. Does your protocol parser catch DHT connection attempts?
It doesn't analyze DHT packets at all. But any actual attempt to establish a TCP or uTP connection to a peer should be picked up
I've compared a healthy capture vs the stuck one. One thing is for sure: when stuck, Tribler sends no DHT requests for the torrents that are stuck in "downloading metadata" state. Instead, it sends DHT requests for some other infohashes. I tried looking up those infohashes, but DHT can't find them.
However, while looking at those infohashes I noticed something strange. There are streaks of infohash requests that all start with the same few bits. Here are the screenshots:
These were all going to another instance of Tribler running in the same local network (my seedbox). Though, there too were some DHT requests going to external servers with the same prefix, so it is not specific to local network Tribler->Tribler DHT activity. The infohashes queried look like this:
03d0f1213f07db9ae14f4933b9cb33528db94513
03d0f294815d38d14487762cb2ebea87ca9843c4
03d0f61269d972c2e27dee834431d82193815e01
03d0f2fbfeacb50e8be80d232a8e56026db8cab0
03d0f5ee78afc9fecad396af3e047758a9e878dd
03d0f1d087595241e85755a0f6932e2fe229d973
03d0f54737d9d932bde8af0f380b65dc96b3dca3
03d0f3f87902892fabecdc7b5388e02d7ee65847
@arvidn , I'm no specialist in DHTs, but to me, it looks very suspicious. It looks more like a DHT response than a DHT request.
Ah, I see from the source code! Libtorrent uses DHT obfuscation, it will only start using the real DHT address when it gets closer to the target. So this is not a bug, but a feature :grin:
Still, this leaves us with a problem: the capture contained no queries for infohashes of the stuck torrents. 03d0f
does not correlate with the beginning of any of the stuck torrents' infohashes.
@arvidn , I still can't understand why it searches for the same prefix as the sender's (our own) id?
Still, this leaves us with a problem: the capture contained no queries for infohashes of the stuck torrents. 03d0f does not correlate with the beginning of any of the stuck torrents' infohashes.
Right, assuming your capture is longer than 15 minutes (which is the default DHT announce interval) there should have been at least one DHT lookup for the info-hash.
There are a few reasons a torrent isn't announcing to the DHT. It can be detected whether it is by looking at torrent_status::announcing_to_dht
(docs)
Queuing logic may disable announcing to DHT (docs)
@arvidn , I still can't understand, why it searches for the same prefix as the sender (our own) id?
That's part of normal DHT routing table maintenance, to make sure it discovers new DHT nodes that are close to its own ID.
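To illustrate the mechanism (a hedged sketch of generic Kademlia-style bucket refreshing, not libtorrent's actual code; helper names are mine):

```python
# Hedged illustration of why routing-table refresh lookups share the node's
# own ID prefix. A Kademlia-style table keeps one bucket per shared-prefix
# length; refreshing bucket k means looking up a random ID that shares
# exactly k leading bits with our own ID.

import random

def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Number of leading bits two 160-bit node IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        d = x ^ y
        if d == 0:
            n += 8
            continue
        return n + (8 - d.bit_length())
    return n

def refresh_target(own_id: bytes, bucket: int, rng: random.Random) -> bytes:
    """Random 160-bit target sharing exactly `bucket` leading bits with own_id."""
    bits = [(own_id[i // 8] >> (7 - i % 8)) & 1 for i in range(160)]
    out = bits[:bucket] + [1 - bits[bucket]]  # flip one bit -> exactly `bucket` shared
    out += [rng.randint(0, 1) for _ in range(160 - len(out))]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(out[i:i + 8]))
        for i in range(0, 160, 8)
    )

# Using one of the captured infohashes as a stand-in for our own node ID:
own = bytes.fromhex("03d0f1213f07db9ae14f4933b9cb33528db94513")
t = refresh_target(own, 20, random.Random(7))
print(t.hex()[:5], shared_prefix_bits(own, t))  # shares the 03d0f prefix (20 bits)
```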
My capture was only 3 minutes long, which makes it useless for analysis purposes...
Queuing logic may disable announcing to DHT
We don't use the "automanage" feature in Tribler. Does the DHT-limiting queuing logic still apply in this case?
Well, I'll try to reproduce it once again (though it depends on pure chance :four_leaf_clover: ). When I do, I'll do a long packet capture.
Just reproduced it again!
To semi-reliably reproduce it, one must start Tribler without any subscribed channels (or with 1-2 channels), subscribe to the top 10-20 channels, and then, while the channels are downloading, start trying to download other highly seeded torrents by infohash. In my case, the 1st Ubuntu torrent started fine, so I removed it and tried the second one. The second got stuck, like last time.
I'll do a long capture of UDP packets to look at DHT activity.
I think dumping torrent_status
might be useful as well
I managed to reproduce it semi-reliably. When the problem arises, it affects all the torrents I try to download. In a normal session, the added torrent will find a lot of peers almost immediately (10-15 seconds max). In a bugged session, the torrent is stuck for 3-6 minutes in "waiting for metadata". After it downloads the metadata, it will connect to no more than 1-2 peers and proceed to leech from them. This can continue for 2-10 minutes. After that, the torrent almost instantly finds a lot of peers and leeches very fast, as normal. Adding other torrents during the same run results in the same scenario, no matter the torrent.
I started to print torrent_status
as suggested. Here is a sample from the moment when the torrent is still stuck waiting for metadata:
And here is the one from the moment where it already leeches from 2 peers:
Note there is "queue_position": 4 for the stuck state. We do not use torrent automanagement, but the same session runs a dozen background DHT checks for Channel torrents, and there are two dozen tiny Channel torrents already downloaded and seeding.
@arvidn , does Libtorrent use some kind of queueing even when automanagement is not used?
We do not touch the auto_managed
flag at all.
Libtorrent uses automanagement by default :face_with_head_bandage:
In [1]: import libtorrent as lt
In [2]: lt.add_torrent_params_flags_t.flag_auto_managed & lt.add_torrent_params_flags_t.default_flags
Out[2]: 32
And we never touch that. @egbertbouman , do we disable automanage explicitly?
EDIT:
I figured out we construct atp flags ourselves and pass them to async_add_torrent. If we do not add a flag for automanagement, the torrent should start without one, right?
Why the queue:4
then?
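For reference, the flag arithmetic in question can be sketched like this (the numeric values mirror the REPL output above; treat them as illustrative constants, not a stable ABI):

```python
# Hedged sketch of the auto_managed flag arithmetic. FLAG_AUTO_MANAGED == 32
# matches the Out[2] above; the other bit in default_flags here is made up.

FLAG_AUTO_MANAGED = 32          # bit 5, per the REPL output above
default_flags = 32 | 16         # illustrative: auto_managed plus some other default

def is_auto_managed(flags: int) -> bool:
    return bool(flags & FLAG_AUTO_MANAGED)

# Flags built from scratch (e.g. 0) leave auto_managed unset:
print(is_auto_managed(0))                                    # False
# Starting from the defaults keeps it set unless explicitly cleared:
print(is_auto_managed(default_flags))                        # True
print(is_auto_managed(default_flags & ~FLAG_AUTO_MANAGED))   # False
```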
I added a bunch of torrents simultaneously and printed queue_position
and flags
for them.
flags
is 0
everywhere. However, queue_position
is -1 for those already seeding and 1 and more for the others.
Either setting the auto_managed flag does not work through Python, or Libtorrent assigns a queue_position even if the torrent is not managed (and that would be very confusing).
@arvidn ?
Having read a little bit through the Libtorrent source, I can say that it indeed uses queue_position internally when ordering/prioritising connections, even when there are no auto-managed torrents. If there is some kind of internal limit for DHT sessions and/or external peer connections, or some rare condition that can result in a bunch of torrents being constantly scheduled to the front of the queue, that would explain the problem. One thing that could be related is that we download a lot of small (~ 1kb size) torrents in the same session (Channels subsystem torrents). That could trigger some kind of "endgame" mode that forces them to get priority over the recently added torrents.
yeah, queue position is always assigned, regardless of whether it's auto managed or not. it's possible that the dht_limit would affect torrents that don't have the auto managed flag set (but it seems like it shouldn't). Still though, the torrent_status you posted suggests that the DHT is still enabled for the torrent, so it might be some other issue, where it's not announced to the DHT for some reason.
I noticed that stuck torrents are announced to DHT after about 300 seconds. auto_scrape_min_interval
is exactly 300 seconds.
If a torrent is not auto-managed, how often is it announced to DHT? I mean, if I add a torrent by infohash, what's the schedule of DHT checks for it? Does adding a non-auto-managed torrent interfere in some way with DHT scrapes schedule of other non-auto-managed torrents?
The convention is to announce a torrent to the DHT every 15 minutes. The auto-managed mechanism only turns things on and off. It doesn't affect the DHT announce frequency.
It seems it has something to do with C-level async stuff. I can semi-reliably trigger it by adding 3-4 torrents in a couple of seconds just after starting Tribler (with Channels torrents restoring in the same session). One of the added torrents will be almost guaranteed to become stuck in "downloading metadata" for 5-??? minutes.
However, if I change the actual command that Tribler uses to add torrents from async_add_torrent
to the plain add_torrent
, I can't reproduce the problem.
@arvidn , could it be that the logic behind prioritize_dht
can't handle bursts of dozens of asynchronously added torrents?
One reason I can think of this would happen is that if you add the torrents while the DHT is still being bootstrapped, it's possible that the DHT announces for the torrents will fail, because the routing table is empty, or virtually empty.
There's (currently) no logic to detect whether a DHT announce fails and retry again soon. The only re-announce mechanism is the periodic one, every 15 minutes. So, if you wait 15 minutes, do the stuck torrents get unstuck?
So, if you wait 15 minutes, do the stuck torrents get unstuck?
Yes, they do. Sometimes it happens after 10 minutes, sometimes it takes more than an hour (i.e. after adding 3 torrents, one torrent can get unstuck in 15 minutes while the remaining two stay stuck for hours). Waiting for hours happens rarely, though.
Yep, it definitely only happens when using async_add_torrent
. The torrents get stuck for a few minutes, then slowly get unstuck.
When using add_torrent
directly though, sometimes a very interesting thing happens: Tribler downloads endpoint becomes completely stuck, all downloads in the "waiting for metadata" state. The console shows that the usual requests to it do not trigger anything. Then, there come HTTP timeouts... Wireshark capture shows normal uTP activity though, for existing torrents.
After 5-10 minutes, everything becomes unstuck. Tribler instantly shows all downloads as having proper metadata, just started downloading.
@arvidn , is there something that could block in add_torrent
for several minutes, though does not affect already existing (seeding) torrents?
does the call itself, to add_torrent()
block during this time? (it would be surprising if it would block for that long, it would suggest that something fishy is going on with the main network thread in libtorrent). It's only supposed to block until the main thread has received the request, added the torrent and produced a torrent_handle
that's then returned. So, the call should take longer than async_add_torrent()
.
On the libtorrent side, the exact same thing happens in both cases, the only difference is that the non-async version waits on the client side, blocking the calling thread.
If you see a difference in behavior between the two, I would think it's something triggered by timing.
Does this problem seem to be caused by torrents not announcing to the DHT? (or at least not announcing in a timely manner).
Does Tribler change any of the dht_
settings?
In order to understand more of what's going on, could you try changing the dht_announce_interval
setting to something very low, to make torrents announce to the DHT more frequently?
By default it's set to (15 * 60) seconds.
This is the function that's called when it's time to announce another torrent to the DHT.
It tries to stagger the DHT announce of all torrents evenly distributed across the 15 minutes, by setting up a timer to fire every (15m / num_torrents).
int delay = std::max(m_settings.get_int(settings_pack::dht_announce_interval)
/ std::max(int(m_torrents.size()), 1), 1);
There's a risk that if this is called when there's only 1 torrent, the timer is set up for 15 minutes, then if you add more torrents, maybe the delay won't be recomputed for another 15 minutes (I don't know that this is the case, but it seems like a risk).
The next torrent to announce to the DHT is determined by an iterator m_next_dht_torrent
, which rotates over all torrents.
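The staggering arithmetic can be replayed in a few lines (a sketch mirroring the C++ above; the constant and function names are mine):

```python
# Hedged simulation of the staggering risk described above: the re-announce
# timer delay is computed from the number of torrents at arming time, so a
# burst of torrents added right afterwards may wait out a stale interval.

DHT_ANNOUNCE_INTERVAL = 15 * 60  # seconds, libtorrent's default

def announce_delay(num_torrents: int, interval: int = DHT_ANNOUNCE_INTERVAL) -> int:
    # Mirrors: std::max(interval / std::max(int(m_torrents.size()), 1), 1)
    return max(interval // max(num_torrents, 1), 1)

# Timer armed while only 1 torrent exists: next announce in 15 minutes.
print(announce_delay(1))    # 900
# With 30 torrents it would have ticked every 30 seconds:
print(announce_delay(30))   # 30
# So torrents added just after the timer was armed for a single torrent
# could wait the full 900 s before the first of them is announced.
```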
Using add_torrent nonetheless results in stuck torrents, although with a lower probability.
For instance, I added 9 highly popular torrents. 6 of them proceeded from "waiting for metadata" to "downloading" almost immediately.
3 got stuck (let's call them A, B, C).
Then, I waited for a couple of minutes just to be sure. Then I stopped the 6 downloading torrents.
The 3 ABC stuck torrents remained stuck.
Then I waited for another couple of minutes.
Then I stopped one of the stuck torrents (A), and started it right away.
It got unstuck immediately.
The 2 remaining stuck torrents (B and C) were still stuck.
After the A finished downloading, B got unstuck and started to download.
After B finished downloading, C remained stuck for more than 40 minutes...
EDIT: C was stuck for a good reason: there are no peers. Checked with another client.
getting unstuck when you stop and start is consistent with the first DHT announce failing. stopping and starting will trigger another DHT announce right away.
Starting immediately as another torrent finishes suggests that there might be some auto-managed aspect, that it was paused and got unpaused.
the torrent_status you posted earlier suggests that the torrent has not found any peers, anywhere:
'list_seeds': 0
'list_peers': 0
'connect_candidates': 0
does the call itself, to add_torrent() block during this time?

I tried to simulate the blocking with a main-thread sleep for both adding and reporting torrents. It didn't resemble the behaviour I described above at all. So, if there was a problem, it was more akin to missing handles or something like that. A mystery, indeed! :ghost: :man_shrugging: :ghost:
Tribler does not alter any dht_
setting of Libtorrent.
To sum it up, for now, the problem seems to be that adding a bunch of torrents in a short period of time results in some torrents being stuck in "waiting for metadata" for 300-600 seconds. Stopping any of the other torrents seems not to affect the stuck torrent. While restarting the stuck torrent sometimes can make it unstuck.
Another trial: I disabled the network connection and started a bunch of torrents, then closed Tribler. Then I enabled the network connection back and opened Tribler. Out of the 7 added torrents only a single one started, the others got stuck in such a bad way, that even stopping-starting them did not help. Then I restarted Tribler again and voila, every torrent got started as normal. I tried to reproduce the same trick with network connectivity, a couple of times, but everything worked fine.
As you say, the problem is highly timing-dependent.
you can see which DHT announces are in-flight by calling torrent_handle::post_dht_stats()
. It will post a dht_stats_alert
which has active_requests
and routing_table
fields. It's not the full table, just counters for number of nodes per bucket.
The active_requests
will show both get_peer
lookups as part of announcing torrents, as well as bootstrap
when starting up, and other get_peer
lookups as part of normal routing table maintenance.
I'll try to get more DHT stats using this calls later. Meanwhile, I traced the faulty DHT exchange by hand. The pattern is as follows: Libtorrent asks a DHT peer for some non-stuck infohash (N), the peer responds with a proper DHT answer packet. After 2-15 seconds Libtorrent asks the same peer for the stuck infohash (S). This time, the peer remains silent.
This happened 8 times during the first 32 seconds of the Libtorrent session.
I waited for a few minutes, restarted the stuck infohash (S) and it was able to instantly get the necessary DHT responses from other peers (e.g. peers not including those first 9 peers that became silent).
It looks like we're hitting some kind of DHT request rate-limiting mechanism employed by other BitTorrent clients. :bricks:
I got a full capture saved, BTW.
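For what it's worth, the observed answer-then-silence pattern is exactly what a simple per-source token bucket on the remote side would produce. This is pure speculation about other clients' policy, sketched here only to make the hypothesis concrete:

```python
# Hypothetical sketch of per-source DHT rate limiting that would produce the
# observed pattern: answer the first query, silently drop a follow-up that
# arrives a few seconds later. Speculation about remote nodes, not libtorrent.

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # query silently dropped

# One query per 30 s allowed, burst of 1:
bucket = TokenBucket(rate=1 / 30, burst=1)
print(bucket.allow(0.0))   # True  -- first infohash (N) gets an answer
print(bucket.allow(5.0))   # False -- the "stuck" infohash (S) gets silence
print(bucket.allow(40.0))  # True  -- after backing off, queries succeed again
```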
@arvidn , is it possible to wait after Libtorrent starts until it gets a certain number of peers? If the problem is really about DHT flooding control, the easiest solution for us would be to just wait a few seconds until we get enough peers.
there is an alert for when DHT bootstrapping completes. It might be a good idea to wait for that before starting any torrent. Maybe the simplest solution would be to pause the session and then resume it once bootstrapping completes.
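A sketch of that waiting step (the helper takes injected callables so it runs standalone; with a real session you would pass ses.pop_alerts and something like `lambda a: isinstance(a, lt.dht_bootstrap_alert)`, assuming those names in your bindings version):

```python
# Hedged sketch: poll the alert queue until the DHT bootstrap alert shows up,
# then start (or resume) torrents. Callables are injected so the demo below
# can run without a live libtorrent session.

import time

def wait_for_dht_bootstrap(pop_alerts, is_bootstrap_alert, timeout=30.0, poll=0.1):
    """Return True once a bootstrap alert is seen, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for alert in pop_alerts():
            if is_bootstrap_alert(alert):
                return True
        time.sleep(poll)
    return False

# Demo with stand-in string alerts instead of a live session:
queue = [["listen_succeeded"], ["dht_bootstrap"]]
pop = lambda: queue.pop(0) if queue else []
print(wait_for_dht_bootstrap(pop, lambda a: a == "dht_bootstrap", timeout=2))  # True
```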
I merged the fix for the DHT flood problem.
@arvidn , thank you very much for your help! Hope you got something out of this too.
Is it time almost for RC6 or more @egbertbouman stuff coming?
We're waiting for Egbert. His stuff is important. He will probably finish today or tomorrow.
@synctext When https://github.com/Tribler/tribler/pull/5357 is merged we should be good to go. More stuff is still coming, but that will have to wait until after RC6.
hi all, I have an issue with Tribler: all my downloads regularly regress to 0 kb/sec. They pick up again after a while. I've tried to create logs for this to share with the developers, but even with the log option checked in the options, my log folder stays empty. What did I miss? I'm on Windows 10, using the latest stable version 7.4.4 from the Tribler.org website. Rik