amzn / amzn-drivers

Official AWS drivers repository for Elastic Network Adapter (ENA) and Elastic Fabric Adapter (EFA)

ENA keeps restarting #134

Closed arainero closed 4 years ago

arainero commented 4 years ago

Hello,

I have a T3.a instance that is experiencing dropped RX packets due to ENA resetting. I don't know what's causing ENA to reset constantly and I was hoping you could shed some light on the matter. I have multiple servers based off of the same AMI so I don't know why this one is having these issues. The server is heavy on UDP traffic compared to TCP if that helps. The instance ID is "i-0c068f87c4161e736".

When ENA resets, the following logs are generated in /var/log/messages and journalctl:

Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 918.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 919.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 920.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 921.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 922.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 923.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 924.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 925.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 926.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 927.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 928.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 929.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 930.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 931.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 932.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 933.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 934.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 935.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 936.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 937.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 938.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 939.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 940.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 941.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 942.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 943.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 944.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 945.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 946.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 947.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 948.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 949.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 950.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 951.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 952.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 953.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 954.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 955.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 956.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 957.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 958.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 959.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 960.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 961.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 962.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 963.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 964.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 965.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 966.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 967.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 968.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 969.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 970.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 971.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 972.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 973.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 974.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 975.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 976.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 977.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Keep alive watchdog timeout.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 2112023
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 533808489
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 4403324
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 4403326
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 2089811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 2371476
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 512952189
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 142811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 1861158
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 1698200576
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 3672075
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 3672121
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 1743809
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 1913582
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 412499776
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 53095
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 2405368
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 547235419
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 4279065
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 4279175
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 2382262
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 1953528
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 430232614
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 72039
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 2696836
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 731741920
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 2682972
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 2556581
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 701259477
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 218516
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 0 idx 0x217
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 1 idx 0x226
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 3 idx 0x284
Jul 14 10:10:22 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 10:10:22 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 10:10:22 drb kernel: ena 0000:00:05.0: Device reset completed successfully
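
For anyone who wants to summarize a dump like the one above instead of reading it line by line, here is a minimal sketch in Python. It assumes the log lines look exactly like the ones quoted here (other driver versions may word them differently), and the function names `missed_tx_by_queue` and `parse_counters` are made up for this example, not part of any ENA tooling.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted above (assumption: same wording).
MISSED_TX = re.compile(r"Found a Tx that wasn't completed on time, qid (\d+), index (\d+)")
# Matches statistics lines of the form "<iface>: <counter_name>: <integer>".
COUNTER = re.compile(r"(\w+): (\w+): (\d+)\s*$")


def missed_tx_by_queue(lines):
    """Count 'Tx not completed on time' messages per queue id."""
    counts = Counter()
    for line in lines:
        match = MISSED_TX.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def parse_counters(lines):
    """Collect the 'name: value' statistics printed when the reset is triggered."""
    stats = {}
    for line in lines:
        match = COUNTER.search(line)
        if match:
            stats[match.group(2)] = int(match.group(3))
    return stats


# A few lines copied from the log above.
sample = [
    "Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 918.",
    "Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 919.",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Keep alive watchdog timeout.",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60",
]

missed = missed_tx_by_queue(sample)
stats = parse_counters(sample)

# Report only the per-queue missed-Tx counters that are non-zero.
nonzero_missed = {k: v for k, v in stats.items() if k.endswith("_tx_missed_tx") and v > 0}
print(dict(missed), nonzero_missed, stats.get("wd_expired"))
```

Feeding the full /var/log/messages excerpt through the same two functions should make it easy to see that only queue 2 reports missed Tx completions and that the reset was triggered by the keep-alive watchdog (`wd_expired: 1`).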

modinfo gives the following:

filename:       /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        1.5.0K
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
retpoline:      Y
rhelversion:    7.6
srcversion:     1B9931F07C26733BA8D4F94
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
intree:         Y
vermagic:       3.10.0-957.21.3.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        1E:5F:1D:87:70:4B:F3:38:01:2F:A2:B0:FE:16:94:59:97:B3:31:27
sig_hashalgo:   sha256
parm:           debug:Debug level (0=none,...,16=all) (int)
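
In case it helps when comparing this server against the others built from the same AMI, below is a rough Python sketch for turning `modinfo` output into a dictionary so the driver version can be diffed across hosts. The helper name `parse_modinfo` is hypothetical, and it only assumes the plain `key: value` layout shown above.

```python
def parse_modinfo(text):
    """Parse 'key: value' lines into a dict of lists (keys such as alias repeat)."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like sig_key keep their colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key/value separator
        info.setdefault(key.strip(), []).append(value.strip())
    return info


# A few fields copied from the modinfo output above.
sample_modinfo = """\
filename:       /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        1.5.0K
rhelversion:    7.6
depends:
intree:         Y
"""

info = parse_modinfo(sample_modinfo)
print(info["version"][0], info["intree"][0])
```

On a live host the input text could come from running `modinfo ena` and capturing its output; here `intree: Y` indicates the module was built as part of the distribution kernel tree rather than installed separately.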

nafeabshara commented 4 years ago

Sorry you are hitting this issue, and thanks for sharing the instance ID and details.

We are looking into this on our side.

On Jul 14, 2020, at 9:31 AM, arainero notifications@github.com wrote:

Hello,

I have a T3.a instance that is experiencing dropped RX packets due to ENA resetting. I don't know what's causing ENA to reset constantly and I was hoping you could shed some light on the matter. I have multiple servers based off of the same AMI so I don't know why this one is having these issues. The server is heavy on UDP traffic compared to TCP if that helps. The instance ID is "i-0c068f87c4161e736".

When ENA resets the following logs are generated in /var/log/messages and jounralctl

Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 918. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 919. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 920. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 921. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 922. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 923. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 924. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 925. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 926. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 927. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 928. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 929. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 930. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 931. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 932. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 933. 
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 934. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 935. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 936. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 937. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 938. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 939. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 940. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 941. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 942. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 943. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 944. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 945. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 946. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 947. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 948. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 949. 
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 950. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 951. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 952. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 953. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 954. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 955. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 956. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 957. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 958. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 959. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 960. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 961. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 962. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 963. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 964. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 965. 
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 966.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 967.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 968.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 969.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 970.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 971.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 972.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 973.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 974.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 975.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 976.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 977.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Keep alive watchdog timeout.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 2112023
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 533808489
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 4403324
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 4403326
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 2089811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 2371476
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 512952189
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 142811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 1861158
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 1698200576
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 3672075
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 3672121
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 1743809
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 1913582
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 412499776
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 53095
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 2405368
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 547235419
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 4279065
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 4279175
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 2382262
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 1953528
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 430232614
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 72039
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 2696836
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 731741920
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 2682972
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 2556581
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 701259477
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 218516
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 0 idx 0x217
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 1 idx 0x226
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 3 idx 0x284
Jul 14 10:10:22 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 10:10:22 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 10:10:22 drb kernel: ena 0000:00:05.0: Device reset completed successfully
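
A dump like the one above is easier to read once the zero counters are filtered out. The following is a minimal Python sketch, not part of the ENA driver or any AWS tooling, that pulls the `queue_N_tx_*` / `queue_N_rx_*` counters out of kernel log text and reports the non-zero error-type counters per queue. The function name `nonzero_error_counters` and the set of counter names treated as errors are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Matches per-queue counters such as "queue_2_tx_missed_tx: 60".
STAT_RE = re.compile(r"queue_(\d+)_(tx|rx)_(\w+): (\d+)")

# Assumption: counters with these names point at a problem when non-zero.
ERROR_NAMES = {
    "missed_tx", "dma_mapping_err", "linearize_failed", "prepare_ctx_err",
    "bad_req_id", "bad_csum", "page_alloc_fail", "skb_alloc_fail",
    "bad_desc_num", "empty_rx_ring", "refil_partial",
}

def nonzero_error_counters(log_text):
    """Return {queue_id: {"tx_missed_tx": 60, ...}} for non-zero error counters."""
    result = defaultdict(dict)
    # findall works on one-entry-per-line logs and on flattened text alike.
    for qid, direction, name, value in STAT_RE.findall(log_text):
        if name in ERROR_NAMES and int(value) != 0:
            result[int(qid)][f"{direction}_{name}"] = int(value)
    return dict(result)

# Three lines shaped like the dump above (values copied from it).
sample = (
    "kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0\n"
    "kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 2405368\n"
    "kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60\n"
)
print(nonzero_error_counters(sample))
```

Run against the full dump, this kind of filter should leave only queue 2 with `tx_missed_tx: 60`, which matches the `qid 2` in the "Found a Tx that wasn't completed on time" messages.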

modinfo gives the following:

filename: /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version: 1.5.0K
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
retpoline: Y
rhelversion: 7.6
srcversion: 1B9931F07C26733BA8D4F94
alias: pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias: pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-957.21.3.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: 1E:5F:1D:87:70:4B:F3:38:01:2F:A2:B0:FE:16:94:59:97:B3:31:27
sig_hashalgo: sha256
parm: debug:Debug level (0=none,...,16=all) (int)


arainero commented 4 years ago

This happened again, and this time I noticed a bit more detail. The message that stands out to me is "The number of lost tx completions is above the threshold (248 > 128). Reset the device".

According to https://nxmnpg.lemoda.net/4/ena

"Packet was pushed to the NIC but not sent within given time limit; it may be caused by hang of the IO queue."

I want to investigate the IO queue mentioned here. Do you have any advice on how to do that, or on what to look for?
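
To make the "above the threshold" message easier to reason about, here is a minimal Python sketch of the kind of check that could produce it: count the Tx buffers that have waited for a completion longer than some timeout, and ask for a reset once that count exceeds a threshold. This is an illustration only and not the ENA driver's code. The function name `count_missed_tx`, the 5-second timeout, and the data layout are assumptions; only the numbers 248 and 128 come from the log message.

```python
def count_missed_tx(submit_times, now, timeout_s, threshold):
    """Count Tx buffers still waiting for a completion after timeout_s seconds.

    submit_times holds the submission time (in seconds) of every Tx buffer
    that has not been completed yet. Returns (missed, reset_needed).
    """
    missed = sum(1 for t in submit_times if now - t > timeout_s)
    return missed, missed > threshold

# Hypothetical state: 248 buffers submitted 6 s ago, 10 submitted 1 s ago.
pending = [0.0] * 248 + [5.0] * 10
missed, reset_needed = count_missed_tx(
    pending, now=6.0, timeout_s=5.0, threshold=128
)
print(missed, reset_needed)
```

Read this way, the message says that 248 packets on a single queue had no completion within the timeout, which is consistent with the `queue_2_tx_missed_tx: 248` counter in the log that follows.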

Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 327.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 328.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 329.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 330.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 331.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 332.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 333.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 334.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 335.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 336.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 337.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 338.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 339.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 340.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 341.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 342.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 343.
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: The number of lost tx completions is above the threshold (248 > 128). Reset the device
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 3
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 2
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 4137606
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 1014469330
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 8802215
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 8802229
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 4113070
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 4783684
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 1023117478
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 182682
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 5194918
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 4147099565
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 8925222
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 8925546
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 4938995
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 3985053
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 852301798
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 118590
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 4693359
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 1134350239
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 8894540
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 8894686
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 4656434
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 248
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 4374717
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 929391616
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 189996
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 4377595
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 1100045461
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 8944166
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 8944180
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 4348263
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 4678466
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 1007127711
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 118426
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 78
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 78
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x60
Jul 14 14:17:34 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 14:17:34 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 25 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 26 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 27 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 28 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: Device reset completed successfully
Jul 14 14:17:40 myserver.hostname postfix/pickup[7639]: 8EFAA40CB90D: uid=990 from=<netdata>
Jul 14 14:17:40 myserver.hostname postfix/cleanup[28287]: 8EFAA40CB90D: message-id=<20200714181740.8EFAA40CB90D@myserver.hostname>
Jul 14 14:17:40 myserver.hostname postfix/qmgr[2091]: 8EFAA40CB90D: from=<netdata@myserver.hostname>, size=11531, nrcpt=1 (queue active)
Jul 14 14:17:40 myserver.hostname postfix/local[28290]: 8EFAA40CB90D: to=<root@myserver.hostname>, orig_to=<root>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Jul 14 14:17:40 myserver.hostname postfix/qmgr[2091]: 8EFAA40CB90D: removed
Jul 14 14:17:50 myserver.hostname kernel: ------------[ cut here ]------------
Jul 14 14:17:50 myserver.hostname kernel: WARNING: CPU: 0 PID: 9896 at net/sched/sch_generic.c:356 dev_watchdog+0x248/0x260
Jul 14 14:17:50 myserver.hostname kernel: NETDEV WATCHDOG: eth0 (ena): transmit queue 3 timed out
Jul 14 14:17:50 myserver.hostname kernel: Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables binfmt_misc bluetooth rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
Jul 14 14:17:50 myserver.hostname kernel: CPU: 0 PID: 9896 Comm: asterisk Kdump: loaded Tainted: G               ------------ T 3.10.0-957.21.3.el7.x86_64 #1
Jul 14 14:17:50 myserver.hostname kernel: Hardware name: Amazon EC2 t3a.xlarge/, BIOS 1.0 10/16/2017
Jul 14 14:17:50 myserver.hostname kernel: Call Trace:
Jul 14 14:17:50 myserver.hostname kernel:  <IRQ>  [<ffffffff9bf63107>] dump_stack+0x19/0x1b
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b897768>] __warn+0xd8/0x100
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8977ef>] warn_slowpath_fmt+0x5f/0x80
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9be66c38>] dev_watchdog+0x248/0x260
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9be669f0>] ? dev_deactivate_queue.constprop.26+0x60/0x60
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8a80c8>] call_timer_fn+0x38/0x110
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9be669f0>] ? dev_deactivate_queue.constprop.26+0x60/0x60
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8aa52d>] run_timer_softirq+0x24d/0x300
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8a1075>] __do_softirq+0xf5/0x280
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf7932c>] call_softirq+0x1c/0x30
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b82e675>] do_softirq+0x65/0xa0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8a13f5>] irq_exit+0x105/0x110
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf7a6e8>] smp_apic_timer_interrupt+0x48/0x60
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf76df2>] apic_timer_interrupt+0x162/0x170
Jul 14 14:17:50 myserver.hostname kernel:  <EOI>  [<ffffffff9b912142>] ? __pv_queued_spin_lock_slowpath+0xf2/0x2e0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b9122ee>] ? __pv_queued_spin_lock_slowpath+0x29e/0x2e0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf5d28b>] queued_spin_lock_slowpath+0xb/0xf
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf6b760>] _raw_spin_lock+0x20/0x30
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b9e95b0>] handle_pte_fault+0x160/0xd10
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8c5950>] ? hrtimer_get_res+0x50/0x50
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b9ec27d>] handle_mm_fault+0x39d/0x9b0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf70603>] __do_page_fault+0x203/0x4f0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf709d6>] trace_do_page_fault+0x56/0x150
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf6ff62>] do_async_page_fault+0x22/0xf0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf6c798>] async_page_fault+0x28/0x30
Jul 14 14:17:50 myserver.hostname kernel: ---[ end trace 33bb31ed0dc8b342 ]---
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: Transmit time out
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 1
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 4
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 3
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 4333
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 1060807
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 6968
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 6971
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 4103
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 3265
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 693233
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 195
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 5849
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 4675602
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 10619
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 10623
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 4774
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 6127
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 1321486
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 148
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 5435
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 1210766
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 14605
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 14615
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 5080
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 11158
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 2387063
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 181
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 5420
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 1335484
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 8833
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 8834
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 5050
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 4056
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 890160
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 142
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 103
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 103
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 0 idx 0xed
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 1 idx 0x0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x13b
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 3 idx 0x123
Jul 14 14:17:50 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 14:17:50 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 25 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 26 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 27 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 28 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: Device reset completed successfully
Jul 14 14:17:50 myserver.hostname postfix/pickup[7639]: 71BE440CB90D: uid=990 from=<netdata>
Jul 14 14:17:50 myserver.hostname postfix/cleanup[28287]: 71BE440CB90D: message-id=<20200714181750.71BE440CB90D@myserver.hostname>
Jul 14 14:17:50 myserver.hostname postfix/qmgr[2091]: 71BE440CB90D: from=<netdata@myserver.hostname>, size=11765, nrcpt=1 (queue active)
Jul 14 14:17:50 myserver.hostname postfix/local[28290]: 71BE440CB90D: to=<root@myserver.hostname>, orig_to=<root>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Jul 14 14:17:50 myserver.hostname postfix/qmgr[2091]: 71BE440CB90D: removed

zorikm commented 4 years ago

Thanks @arainero for the additional info. We suspect your instance is overloaded with processing and the CPUs don't get enough cycles to handle network traffic. Both logs indicate that TX packet completions and other events from the device weren't processed in time. Do you see any dmesg messages that point to CPU stalls or lockups? What CPU utilization do you observe?
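For anyone checking the same thing, here is a minimal sketch of a filter for the kernel messages that usually accompany CPU stalls or lockups. The function name `find_cpu_stalls` is made up for illustration, and the pattern list is an assumption based on common kernel messages (RCU stall warnings, soft/hard lockup reports, hung-task warnings); it is not exhaustive.

```shell
# Print kernel log lines that usually indicate a stalled or locked-up CPU.
# Reads log text on stdin, so it works with live dmesg output or a saved file.
# The pattern list is an assumption, not a complete catalogue.
find_cpu_stalls() {
    grep -E -i 'soft lockup|hard LOCKUP|self-detected stall|detected stalls on|blocked for more than [0-9]+ seconds' || true
}
```

Typical use would be `dmesg | find_cpu_stalls` or `find_cpu_stalls < dmesg.txt`. For utilization, `mpstat -P ALL 1` (from the sysstat package) shows per-CPU figures, which can reveal a single saturated core that an all-core average hides.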

zorikm commented 4 years ago

Also, we strongly recommend updating the driver to the latest version.

arainero commented 4 years ago

@zorikm I attached the dmesg output. I don't think it's related to CPU load, since utilization doesn't really go past 50% until ENA gets reset. After the reset there is a large spike while the system plays catch-up.
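One way to check whether resets line up with load is to count reset events per minute and compare them against CPU graphs for the same period. A minimal sketch, assuming the syslog line format shown in the logs above (`Mon DD HH:MM:SS host kernel: ena ... Trigger reset is on`); the function name `ena_resets_per_minute` is hypothetical.

```shell
# Count ENA "Trigger reset is on" events per minute from syslog text on stdin.
# Output: one line per minute, "Mon DD HH:MM <count>", in order of first appearance.
ena_resets_per_minute() {
    awk '/ena .*Trigger reset is on/ {
        key = $1 " " $2 " " substr($3, 1, 5)      # month, day, HH:MM
        if (!(key in count)) order[++n] = key     # remember first-seen order
        count[key]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

For example, `ena_resets_per_minute < /var/log/messages` gives a per-minute tally that can be lined up with monitoring data.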

As for updating the driver, what's the best way to do that? I haven't done that before.

dmesg.txt

zorikm commented 4 years ago

@arainero, could you please reach out to me directly at zorik@amazon.com, and we'll guide you. Thanks

druchoo commented 4 years ago

Hi @zorikm, I'm seeing similar errors, as well as others, with v2.2.9 of the driver. This is also on a t3a instance, but I have seen it on other instance types as well.

$ modinfo ena
filename:       /lib/modules/3.10.0-1127.13.1.el7.x86_64/extra/ena.ko.xz
version:        2.2.9g
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
retpoline:      Y
rhelversion:    7.8
srcversion:     27F5567B9755BE00C8A08B5
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000051sv*sd*bc*sc*i*
depends:
vermagic:       3.10.0-1127.13.1.el7.x86_64 SMP mod_unload modversions
parm:           debug:Debug level (0=none,...,16=all) (int)
parm:           rx_queue_size:Rx queue size. The size should be a power of 2. Max value is 8K
 (int)
parm:           force_large_llq_header:Increases maximum supported header size in LLQ mode to 224 bytes, while reducing the maximum TX queue size by half.
 (int)
parm:           num_io_queues:Sets number of RX/TX queues to allocate to device. The maximum value depends on the device and number of online CPUs.

dmesg.txt ena_errors.txt
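When comparing reports across machines, it helps to pull out only the driver version from the `modinfo` output. A small sketch, assuming the `key: value` layout shown above; the function name `ena_module_version` is made up for illustration.

```shell
# Extract the value of the "version:" field from modinfo output on stdin.
# Matches the field name exactly, so "rhelversion:" and "srcversion:" are skipped.
ena_module_version() {
    awk '$1 == "version:" { print $2; exit }'
}
```

Usage would be `modinfo ena | ena_module_version`; `modinfo -F version ena` should give the same result directly. Note that `modinfo` describes the module file on disk, so after an upgrade without a reload it may differ from the running driver; `ethtool -i eth0` reports the version bound to the interface.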

Upgrading the driver and setting vm.min_free_kbytes to 128MB (2x the default) initially seemed to correct the issues.

$ sysctl vm.min_free_kbytes
vm.min_free_kbytes = 135168

However, after testing a new application that added network load, the errors are back. The all-core CPU average is ~10%, and network traffic is ~10 Mbps rx and ~15 Mbps tx.
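As a quick check on the units: `vm.min_free_kbytes` is expressed in kilobytes, so the value reported above works out to 132 MiB, slightly above the stated 128MB. A minimal sketch of the conversion; the helper name `kib_to_mib` is hypothetical.

```shell
# Convert a kilobyte count (as used by vm.min_free_kbytes) to whole MiB.
kib_to_mib() {
    echo $(( $1 / 1024 ))
}

kib_to_mib 135168   # prints 132
```

To try the same setting, `sysctl -w vm.min_free_kbytes=135168` applies it at runtime (as root). Persisting it through a file under `/etc/sysctl.d/` is the usual approach, though the right value depends on the instance's memory size.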

AWSNB commented 4 years ago

Andrew

We're triaging some failures on our side (server/ENA FW) that match this sighting. The ENA and EC2 teams are working on it at high priority and will update you on progress.

I-gor-C commented 4 years ago

@arainero The triage has finished and a fix has been implemented. Please check whether the issues are resolved.

tuapuikia commented 4 years ago

Does this issue only happen on AMD Zen-based instance types?

I-gor-C commented 4 years ago

@arainero Can you please indicate if the issue was resolved?

arainero commented 4 years ago

@arainero Can you please indicate if the issue was resolved?

Unfortunately, because of the ongoing issues, we had to migrate the problematic server off AWS before a fix was applied. I don't have a reliable way to test this now.

ubarar commented 4 years ago

@I-gor-C we switched our machines back to machines with the new ENA drivers, and haven't had this issue over the last week. I think this issue is definitely resolved.

Thanks!

AWSNB commented 4 years ago

@ubarar thanks for confirming, we'll go ahead and close the issue

wcurry commented 3 years ago

I'm having a similar issue: https://github.com/coreos/fedora-coreos-tracker/issues/665

The issue exists in all Fedora CoreOS versions between 31.20200323.2.0 and the latest FCOS 32.

31.20200323.2.0

filename:       /lib/modules/5.5.10-200.fc31.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        2.1.0K
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
srcversion:     DAAE6CFC0FC2113B5776480
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
retpoline:      Y
intree:         Y
name:           ena
vermagic:       5.5.10-200.fc31.x86_64 SMP mod_unload
sig_id:         PKCS#7
signer:         Fedora kernel signing key
sig_key:        67:90:9D:B2:92:99:F6:87:CC:07:EF:39:B6:7A:EC:9D:E7:E2:A2:60
sig_hashalgo:   sha256
signature:      7D:97:AB:FB:9C:FD:7B:70:E9:C9:3F:39:3B:9A:3A:B7:42:77:41:15:
                60:7B:1D:BD:B6:08:62:DA:64:B6:5E:F7:46:1A:2F:6D:8B:5E:80:2A:
                8F:88:5B:05:1F:AF:2C:B3:53:52:E0:8D:CB:BB:2C:D3:8E:E1:D1:DC:
                90:3C:27:CD:44:9E:7A:4B:14:1E:A9:D8:CA:72:7D:BB:F3:2B:59:85:
                B2:BB:48:83:75:45:24:28:B1:8F:EC:AA:79:E4:B9:CA:92:2F:09:4E:
                55:2D:28:11:EC:88:80:DC:D3:95:2E:BF:0F:67:59:76:5E:83:05:08:
                2E:CF:B2:FE:3E:C3:7A:3B:15:0F:67:73:14:C1:92:AF:4F:40:F1:51:
                2C:9D:D1:45:2E:F4:BC:59:50:51:B9:BC:AC:02:27:E6:2E:6F:E8:DB:
                48:EF:A8:AA:B8:28:8C:1D:B5:42:A0:73:4F:41:CC:1E:26:6F:21:93:
                50:2A:CF:B6:65:5F:35:29:3D:39:7B:6B:BC:62:0B:6D:2A:7E:7B:65:
                C4:E2:D4:CA:1D:6B:68:B7:B1:CE:94:08:60:37:D2:ED:0B:F2:FC:D1:
                BD:91:CA:30:67:39:1A:E0:64:97:BA:5A:FE:FE:4C:E3:8B:FD:56:52:
                DE:5D:A3:B8:A0:40:D7:46:07:70:4C:B7:8C:CD:CE:5C:F7:52:C2:5F:
                5F:AF:4E:FB:55:17:CF:89:C0:AA:49:38:A7:66:B2:53:74:96:7A:42:
                65:85:7F:18:95:B4:A1:87:31:88:30:57:4C:E8:C9:9D:55:12:87:07:
                35:72:BC:FD:85:C9:F4:85:B6:0A:96:F9:73:BA:F0:22:8A:EA:7B:CF:
                FB:92:B2:BA:82:98:F3:27:83:B3:D4:9F:D2:39:3C:37:90:99:A2:BD:
                43:41:A7:C7:03:76:86:EC:A6:8D:16:F9:25:14:E7:97:34:EC:E5:EE:
                00:E4:19:2A:B8:23:AD:7B:00:54:79:96:BC:00:F5:47:B2:7C:AC:CF:
                6D:26:64:FD:B3:01:15:98:DF:09:B4:F0:09:ED:87:FA:E1:90:0F:98:
                E5:F8:BE:EF:12:32:ED:AC:57:8C:CD:8F:AF:E7:AD:0A:3D:01:8F:EE:
                1D:4C:D1:62:38:59:F4:FF:B1:D3:B7:B7:1F:97:F3:A8:28:0C:A3:3B:
                CC:A5:E7:E6:FD:85:9F:7A:E5:0B:D0:E5:16:4B:D5:72:66:95:8F:7C:
                C1:B4:BA:A7:0C:01:25:39:03:B4:76:18:C6:0B:D1:B8:1B:F5:45:FA:
                5E:B9:78:3F:24:D5:BE:E6:91:59:87:FC:04:4C:3F:BB:57:A3:4B:4C:
                45:89:D2:A2:62:61:5D:A6:D2:95:DF:2A
parm:           debug:Debug level (0=none,...,16=all) (int)

akiyano commented 3 years ago

Hi @wcurry,

Thanks for your report. I'm looking into this issue. Meanwhile, could you please contact me at akiyano@amazon.com so that I can get some more details?

Thanks, Arthur

akiyano commented 3 years ago

For the record, the Fedora issue from the last two comments was handled in https://github.com/amzn/amzn-drivers/issues/147