Mellanox / mstflint

Mstflint - an open source version of MFT (Mellanox Firmware Tools)

mstfwreset fails when card is in DPU mode #830

Open SalDaniele opened 1 year ago

SalDaniele commented 1 year ago

Using mstflint compiled from source code:

# mstflint -v
mstflint, mstflint 4.25.0, built on Aug 23 2023, 19:39:15. Git SHA Hash: N/A

On a Bluefield-2 w/ BMC

Device #1:
----------

Device type:    BlueField2      
Name:           MBF2H512C-AECO_Ax
Description:    BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; integrated BMC; PCIe Gen4 x8; Secure Boot Enabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
Device:         ca:00.0         

Configurations:                                      Default         Current         Next Boot
         MEMIC_BAR_SIZE                              0               0               0               
         MEMIC_SIZE_LIMIT                            _256KB(1)       _256KB(1)       _256KB(1)       
         HOST_CHAINING_MODE                          DISABLED(0)     DISABLED(0)     DISABLED(0)     
         HOST_CHAINING_CACHE_DISABLE                 False(0)        False(0)        False(0)        
         HOST_CHAINING_DESCRIPTORS                   Array[0..7]     Array[0..7]     Array[0..7]     
         HOST_CHAINING_TOTAL_BUFFER_SIZE             Array[0..7]     Array[0..7]     Array[0..7]     
         INTERNAL_CPU_MODEL                          EMBEDDED_CPU(1) EMBEDDED_CPU(1) EMBEDDED_CPU(1) 
         INTERNAL_CPU_PAGE_SUPPLIER                  ECPF(0)         ECPF(0)         ECPF(0)         
         INTERNAL_CPU_ESWITCH_MANAGER                ECPF(0)         ECPF(0)         ECPF(0)         
         INTERNAL_CPU_IB_VPORT0                      ECPF(0)         ECPF(0)         ECPF(0)         
         INTERNAL_CPU_OFFLOAD_ENGINE                 ENABLED(0)      ENABLED(0)      ENABLED(0)  
         FLEX_PARSER_PROFILE_ENABLE                  0               0               0               
         PROG_PARSE_GRAPH                            False(0)        False(0)        False(0)        
         FLEX_IPV4_OVER_VXLAN_PORT                   0               0               0               
         ROCE_NEXT_PROTOCOL                          254             254             254             
         ESWITCH_HAIRPIN_DESCRIPTORS                 Array[0..7]     Array[0..7]     Array[0..7]     
         ESWITCH_HAIRPIN_TOT_BUFFER_SIZE             Array[0..7]     Array[0..7]     Array[0..7]     
         PF_BAR2_SIZE                                3               3               3               
         DPU_RESET_NOTIFICATION_ENABLED              ENABLED(1)      ENABLED(1)      ENABLED(1)      
         INTERNAL_CPU_RSHIM                          ENABLED(0)      ENABLED(0)      ENABLED(0)      
         PF_NUM_OF_VF_VALID                          False(0)        False(0)        False(0)        
         NON_PREFETCHABLE_PF_BAR                     False(0)        False(0)        False(0)        
         VF_VPD_ENABLE                               False(0)        False(0)        False(0)        
         PF_NUM_PF_MSIX_VALID                        False(0)        False(0)        False(0)        
         PER_PF_NUM_SF                               False(0)        False(0)        False(0)        
         STRICT_VF_MSIX_NUM                          False(0)        False(0)        False(0)        
         VF_NODNIC_ENABLE                            False(0)        False(0)        False(0)        
         NUM_PF_MSIX_VALID                           True(1)         True(1)         True(1)         
         NUM_OF_VFS                                  8               8               8               
         NUM_OF_PF                                   2               2               2               
         PF_BAR2_ENABLE                              True(1)         True(1)         True(1)         
         HIDE_PORT2_PF                               False(0)        False(0)        False(0)        
         SRIOV_EN                                    True(1)         True(1)         True(1)         
         PF_LOG_BAR_SIZE                             5               5               5               
         VF_LOG_BAR_SIZE                             0               0               0               
         NUM_PF_MSIX                                 63              63              63              
         NUM_VF_MSIX                                 11              11              11              
         INT_LOG_MAX_PAYLOAD_SIZE                    AUTOMATIC(0)    AUTOMATIC(0)    AUTOMATIC(0)    
         PCIE_CREDIT_TOKEN_TIMEOUT                   0               0               0               
         RT_PPS_ENABLED_ON_POWERUP                   False(0)        False(0)        False(0)        
         LAG_RESOURCE_ALLOCATION                     DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         PHY_COUNT_LINK_UP_DELAY                     DELAY_NONE(0)   DELAY_NONE(0)   DELAY_NONE(0)   
         ACCURATE_TX_SCHEDULER                       False(0)        False(0)        False(0)        
         PARTIAL_RESET_EN                            False(0)        False(0)        False(0)        
         RESET_WITH_HOST_ON_ERRORS                   False(0)        False(0)        False(0)        
         NVME_EMULATION_ENABLE                       False(0)        False(0)        False(0)        
         NVME_EMULATION_NUM_VF                       0               0               0               
         NVME_EMULATION_NUM_PF                       1               1               1               
         NVME_EMULATION_VENDOR_ID                    5555            5555            5555            
         NVME_EMULATION_DEVICE_ID                    24577           24577           24577           
         NVME_EMULATION_CLASS_CODE                   67586           67586           67586           
         NVME_EMULATION_REVISION_ID                  0               0               0               
         NVME_EMULATION_SUBSYSTEM_VENDOR_ID          0               0               0               
         NVME_EMULATION_SUBSYSTEM_ID                 0               0               0               
         NVME_EMULATION_NUM_MSIX                     0               0               0               
         NVME_EMULATION_MAX_QUEUE_DEPTH              0               0               0               
         PCI_SWITCH_EMULATION_NUM_PORT               0               0               0               
         VIRTIO_EMULATION_HOTPLUG_TRANS              False(0)        False(0)        False(0)        
         PCI_SWITCH_EMULATION_ENABLE                 False(0)        False(0)        False(0)        
         VIRTIO_NET_EMULATION_VF_PCI_LAYOUT          VIRTIO_1_X(0)   VIRTIO_1_X(0)   VIRTIO_1_X(0)   
         VIRTIO_NET_EMULATION_PF_PCI_LAYOUT          VIRTIO_1_X(0)   VIRTIO_1_X(0)   VIRTIO_1_X(0)   
         VIRTIO_NET_EMULATION_ENABLE                 False(0)        False(0)        False(0)        
         VIRTIO_NET_EMULATION_NUM_VF                 0               0               0               
         VIRTIO_NET_EMULATION_NUM_PF                 0               0               0               
         VIRTIO_NET_EMU_SUBSYSTEM_VENDOR_ID          6900            6900            6900            
         VIRTIO_NET_EMULATION_SUBSYSTEM_ID           4161            4161            4161            
         VIRTIO_NET_EMULATION_NUM_MSIX               2               2               2               
         VIRTIO_BLK_EMULATION_VF_PCI_LAYOUT          VIRTIO_1_X(0)   VIRTIO_1_X(0)   VIRTIO_1_X(0)   
         VIRTIO_BLK_EMULATION_PF_PCI_LAYOUT          VIRTIO_1_X(0)   VIRTIO_1_X(0)   VIRTIO_1_X(0)   
         VIRTIO_BLK_EMULATION_ENABLE                 False(0)        False(0)        False(0)        
         VIRTIO_BLK_EMULATION_NUM_VF                 0               0               0               
         VIRTIO_BLK_EMULATION_NUM_PF                 0               0               0               
         VIRTIO_BLK_EMU_SUBSYSTEM_VENDOR_ID          6900            6900            6900            
         VIRTIO_BLK_EMULATION_SUBSYSTEM_ID           4162            4162            4162            
         VIRTIO_BLK_EMULATION_NUM_MSIX               2               2               2               
         PCI_DOWNSTREAM_PORT_OWNER                   Array[0..15]    Array[0..15]    Array[0..15]    
         CQE_COMPRESSION                             BALANCED(0)     BALANCED(0)     BALANCED(0)     
         IP_OVER_VXLAN_EN                            False(0)        False(0)        False(0)        
         MKEY_BY_NAME                                False(0)        False(0)        False(0)        
         PRIO_TAG_REQUIRED_EN                        False(0)        False(0)        False(0)        
         UCTX_EN                                     True(1)         True(1)         True(1)         
         REAL_TIME_CLOCK_ENABLE                      False(0)        False(0)        False(0)        
         RDMA_SELECTIVE_REPEAT_EN                    False(0)        False(0)        False(0)        
         PCI_ATOMIC_MODE                             PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0) PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0) PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
         TUNNEL_ECN_COPY_DISABLE                     False(0)        False(0)        False(0)        
         LRO_LOG_TIMEOUT0                            6               6               6               
         LRO_LOG_TIMEOUT1                            7               7               7               
         LRO_LOG_TIMEOUT2                            8               8               8               
         LRO_LOG_TIMEOUT3                            13              13              13              
         LOG_TX_PSN_WINDOW                           7               7               7               
         VF_MIGRATION_MODE                           DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         LOG_MAX_OUTSTANDING_WQE                     7               7               7               
         ROCE_ADAPTIVE_ROUTING_EN                    False(0)        False(0)        False(0)        
         TUNNEL_IP_PROTO_ENTROPY_DISABLE             False(0)        False(0)        False(0)        
         USER_PROGRAMMABLE_CC                        False(0)        False(0)        False(0)        
         PCC_INT_NP_RTT_DSCP                         26              26              26              
         PCC_INT_NP_RTT_DSCP_EN                      False(0)        False(0)        False(0)        
         PCC_INT_NP_RTT_DATA_MODE                    RTT_V0(64)      RTT_V0(64)      RTT_V0(64)      
         PCC_INT_EN                                  False(0)        False(0)        False(0)        
         PCC_INT_SYSTEM_RTT                          0               0               0               
         MULTI_PCI_RESOURCE_SHARING                  DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         ICM_CACHE_MODE                              DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         HAIRPIN_DATA_BUFFER_LOCK                    False(0)        False(0)        False(0)        
         TLS_OPTIMIZE                                False(0)        False(0)        False(0)        
         TX_SCHEDULER_BURST                          0               0               0               
         ZERO_TOUCH_TUNING_ENABLE                    False(0)        False(0)        False(0)        
         ROCE_CC_LEGACY_DCQCN                        False(0)        False(0)        False(0)        
         LOG_MAX_QUEUE                               17              17              17              
*        CRYPTO_POLICY                               UNRESTRICTED(1) DEVICE_DEFAULT(0) UNRESTRICTED(1) 
         LOG_DCR_HASH_TABLE_SIZE                     11              11              11              
         MAX_PACKET_LIFETIME                         0               0               0               
         DCR_LIFO_SIZE                               16384           16384           16384           
         ROCE_CC_PRIO_MASK_P1                        255             255             255             
         ROCE_CC_PRIO_MASK_P2                        255             255             255             
         CLAMP_TGT_RATE_AFTER_TIME_INC_P1            True(1)         True(1)         True(1)         
         CLAMP_TGT_RATE_P1                           False(0)        False(0)        False(0)        
         RPG_TIME_RESET_P1                           300             300             300             
         RPG_BYTE_RESET_P1                           32767           32767           32767           
         RPG_THRESHOLD_P1                            1               1               1               
         RPG_MAX_RATE_P1                             0               0               0               
         RPG_AI_RATE_P1                              5               5               5               
         RPG_HAI_RATE_P1                             50              50              50              
         RPG_GD_P1                                   11              11              11              
         RPG_MIN_DEC_FAC_P1                          50              50              50              
         RPG_MIN_RATE_P1                             1               1               1               
         RATE_TO_SET_ON_FIRST_CNP_P1                 0               0               0               
         DCE_TCP_G_P1                                1019            1019            1019            
         DCE_TCP_RTT_P1                              1               1               1               
         RATE_REDUCE_MONITOR_PERIOD_P1               4               4               4               
         INITIAL_ALPHA_VALUE_P1                      1023            1023            1023            
         MIN_TIME_BETWEEN_CNPS_P1                    4               4               4               
         CNP_802P_PRIO_P1                            6               6               6               
         CNP_DSCP_P1                                 48              48              48              
         CLAMP_TGT_RATE_AFTER_TIME_INC_P2            True(1)         True(1)         True(1)         
         CLAMP_TGT_RATE_P2                           False(0)        False(0)        False(0)        
         RPG_TIME_RESET_P2                           300             300             300             
         RPG_BYTE_RESET_P2                           32767           32767           32767           
         RPG_THRESHOLD_P2                            1               1               1               
         RPG_MAX_RATE_P2                             0               0               0               
         RPG_AI_RATE_P2                              5               5               5               
         RPG_HAI_RATE_P2                             50              50              50              
         RPG_GD_P2                                   11              11              11              
         RPG_MIN_DEC_FAC_P2                          50              50              50              
         RPG_MIN_RATE_P2                             1               1               1               
         RATE_TO_SET_ON_FIRST_CNP_P2                 0               0               0               
         DCE_TCP_G_P2                                1019            1019            1019            
         DCE_TCP_RTT_P2                              1               1               1               
         RATE_REDUCE_MONITOR_PERIOD_P2               4               4               4               
         INITIAL_ALPHA_VALUE_P2                      1023            1023            1023            
         MIN_TIME_BETWEEN_CNPS_P2                    4               4               4               
         CNP_802P_PRIO_P2                            6               6               6               
         CNP_DSCP_P2                                 48              48              48              
         LLDP_NB_DCBX_P1                             False(0)        False(0)        False(0)        
         LLDP_NB_RX_MODE_P1                          OFF(0)          OFF(0)          OFF(0)          
         LLDP_NB_TX_MODE_P1                          OFF(0)          OFF(0)          OFF(0)          
         LLDP_NB_DCBX_P2                             False(0)        False(0)        False(0)        
         LLDP_NB_RX_MODE_P2                          OFF(0)          OFF(0)          OFF(0)          
         LLDP_NB_TX_MODE_P2                          OFF(0)          OFF(0)          OFF(0)          
         DCBX_IEEE_P1                                True(1)         True(1)         True(1)         
         DCBX_CEE_P1                                 True(1)         True(1)         True(1)         
         DCBX_WILLING_P1                             True(1)         True(1)         True(1)         
         DCBX_IEEE_P2                                True(1)         True(1)         True(1)         
         DCBX_CEE_P2                                 True(1)         True(1)         True(1)         
         DCBX_WILLING_P2                             True(1)         True(1)         True(1)         
         KEEP_ETH_LINK_UP_P1                         True(1)         True(1)         True(1)         
         KEEP_IB_LINK_UP_P1                          False(0)        False(0)        False(0)        
         KEEP_LINK_UP_ON_BOOT_P1                     False(0)        False(0)        False(0)        
         KEEP_LINK_UP_ON_STANDBY_P1                  False(0)        False(0)        False(0)        
         DO_NOT_CLEAR_PORT_STATS_P1                  False(0)        False(0)        False(0)        
         AUTO_POWER_SAVE_LINK_DOWN_P1                False(0)        False(0)        False(0)        
         KEEP_ETH_LINK_UP_P2                         True(1)         True(1)         True(1)         
         KEEP_IB_LINK_UP_P2                          False(0)        False(0)        False(0)        
         KEEP_LINK_UP_ON_BOOT_P2                     False(0)        False(0)        False(0)        
         KEEP_LINK_UP_ON_STANDBY_P2                  False(0)        False(0)        False(0)        
         DO_NOT_CLEAR_PORT_STATS_P2                  False(0)        False(0)        False(0)        
         AUTO_POWER_SAVE_LINK_DOWN_P2                False(0)        False(0)        False(0)        
         NUM_OF_VL_P1                                _4_VLs(3)       _4_VLs(3)       _4_VLs(3)       
         NUM_OF_TC_P1                                _8_TCs(0)       _8_TCs(0)       _8_TCs(0)       
         NUM_OF_PFC_P1                               8               8               8               
         VL15_BUFFER_SIZE_P1                         0               0               0               
         QOS_TRUST_STATE_P1                          TRUST_PCP(1)    TRUST_PCP(1)    TRUST_PCP(1)    
         NUM_OF_VL_P2                                _4_VLs(3)       _4_VLs(3)       _4_VLs(3)       
         NUM_OF_TC_P2                                _8_TCs(0)       _8_TCs(0)       _8_TCs(0)       
         NUM_OF_PFC_P2                               8               8               8               
         VL15_BUFFER_SIZE_P2                         0               0               0               
         QOS_TRUST_STATE_P2                          TRUST_PCP(1)    TRUST_PCP(1)    TRUST_PCP(1)    
         DUP_MAC_ACTION_P1                           LAST_CFG(0)     LAST_CFG(0)     LAST_CFG(0)     
         MPFS_MC_LOOPBACK_DISABLE_P1                 False(0)        False(0)        False(0)        
         MPFS_UC_LOOPBACK_DISABLE_P1                 False(0)        False(0)        False(0)        
         UNKNOWN_UPLINK_MAC_FLOOD_P1                 False(0)        False(0)        False(0)        
         SRIOV_IB_ROUTING_MODE_P1                    LID(1)          LID(1)          LID(1)          
         IB_ROUTING_MODE_P1                          LID(1)          LID(1)          LID(1)          
         DUP_MAC_ACTION_P2                           LAST_CFG(0)     LAST_CFG(0)     LAST_CFG(0)     
         MPFS_MC_LOOPBACK_DISABLE_P2                 False(0)        False(0)        False(0)        
         MPFS_UC_LOOPBACK_DISABLE_P2                 False(0)        False(0)        False(0)        
         UNKNOWN_UPLINK_MAC_FLOOD_P2                 False(0)        False(0)        False(0)        
         SRIOV_IB_ROUTING_MODE_P2                    LID(1)          LID(1)          LID(1)          
         IB_ROUTING_MODE_P2                          LID(1)          LID(1)          LID(1)          
         PHY_AUTO_NEG_P1                             DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         PHY_RATE_MASK_OVERRIDE_P1                   False(0)        False(0)        False(0)        
         PHY_FEC_OVERRIDE_P1                         DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         PHY_AUTO_NEG_P2                             DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         PHY_RATE_MASK_OVERRIDE_P2                   False(0)        False(0)        False(0)        
         PHY_FEC_OVERRIDE_P2                         DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
*        PF_TOTAL_SF                                 0               32              0               
*        PF_SF_BAR_SIZE                              0               8               0               
         PF_NUM_PF_MSIX                              63              63              63              
*        ROCE_CONTROL                                ROCE_ENABLE(2)  DEVICE_DEFAULT(0) ROCE_ENABLE(2)  
         PCI_WR_ORDERING                             per_mkey(0)     per_mkey(0)     per_mkey(0)     
         MULTI_PORT_VHCA_EN                          False(0)        False(0)        False(0)        
         PORT_OWNER                                  True(1)         True(1)         True(1)         
         ALLOW_RD_COUNTERS                           True(1)         True(1)         True(1)         
         RENEG_ON_CHANGE                             True(1)         True(1)         True(1)         
         TRACER_ENABLE                               True(1)         True(1)         True(1)         
         IP_VER                                      IPv4(0)         IPv4(0)         IPv4(0)         
         BOOT_UNDI_NETWORK_WAIT                      0               0               0               
         UEFI_HII_EN                                 True(1)         True(1)         True(1)         
         BOOT_DBG_LOG                                False(0)        False(0)        False(0)        
         UEFI_LOGS                                   DISABLED(0)     DISABLED(0)     DISABLED(0)     
         BOOT_VLAN                                   1               1               1               
         LEGACY_BOOT_PROTOCOL                        PXE(1)          PXE(1)          PXE(1)          
         BOOT_RETRY_CNT                              NONE(0)         NONE(0)         NONE(0)         
         BOOT_INTERRUPT_DIS                          False(0)        False(0)        False(0)        
         BOOT_LACP_DIS                               True(1)         True(1)         True(1)         
         BOOT_VLAN_EN                                False(0)        False(0)        False(0)        
         BOOT_PKEY                                   0               0               0               
         P2P_ORDERING_MODE                           DEVICE_DEFAULT(0) DEVICE_DEFAULT(0) DEVICE_DEFAULT(0)
         EXP_ROM_VIRTIO_NET_PXE_ENABLE               True(1)         True(1)         True(1)         
         EXP_ROM_VIRTIO_NET_UEFI_ARM_ENABLE          True(1)         True(1)         True(1)         
         EXP_ROM_VIRTIO_NET_UEFI_x86_ENABLE          True(1)         True(1)         True(1)         
         EXP_ROM_VIRTIO_BLK_UEFI_ARM_ENABLE          True(1)         True(1)         True(1)         
         EXP_ROM_VIRTIO_BLK_UEFI_x86_ENABLE          True(1)         True(1)         True(1)         
         EXP_ROM_NVME_UEFI_x86_ENABLE                True(1)         True(1)         True(1)         
         ATS_ENABLED                                 False(0)        False(0)        False(0)        
         DYNAMIC_VF_MSIX_TABLE                       False(0)        False(0)        False(0)        
         EXP_ROM_UEFI_ARM_ENABLE                     True(1)         True(1)         True(1)         
         EXP_ROM_UEFI_x86_ENABLE                     True(1)         True(1)         True(1)         
         EXP_ROM_PXE_ENABLE                          True(1)         True(1)         True(1)         
         ADVANCED_PCI_SETTINGS                       False(0)        False(0)        False(0)        
         SAFE_MODE_THRESHOLD                         10              10              10              
         SAFE_MODE_ENABLE                            True(1)         True(1)         True(1)      

I am trying to update the firmware to the latest version. After running mstflint -d -i <.bin> burn, this is the state of the BlueField:

mstflint -d ca:00.0 q                                                                                                        
Image type:            FS4                                                                                                                                      
FW Version:            24.38.1002                                                                                                                               
FW Version(Running):   24.33.1048
FW Release Date:       3.8.2023
Product Version:       24.33.1048
Rom Info:              type=UEFI Virtio net version=21.2.10 cpu=AMD64
                       type=UEFI Virtio blk version=22.2.10 cpu=AMD64
                       type=UEFI version=14.26.17 cpu=AMD64,AARCH64
                       type=PXE version=3.6.502 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             e8ebd30300cee2e6        16
Base MAC:              e8ebd3cee2e6            16
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000724
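
A minimal sketch of how the "burned but not yet running" state above could be detected from a script. It assumes the query output keeps the "FW Version:" and "FW Version(Running):" labels shown in this thread, and that the "(Running)" line only appears while a newer image is waiting for a reset (which is what the two queries in this report suggest). The function name fw_pending is hypothetical.

```shell
# Reads `mstflint -d <device> q` output on stdin.
# Returns 0 if the running firmware differs from the burned image, 1 otherwise.
# Assumption: label names are exactly as printed in this thread.
fw_pending() {
    awk -F': *' '
        /^FW Version\(Running\):/ { running = $2 }
        /^FW Version:/            { burned  = $2 }
        END {
            # Strip the trailing padding that the tool prints after values.
            gsub(/[ \t\r]+$/, "", running)
            gsub(/[ \t\r]+$/, "", burned)
            exit !(running != "" && running != burned)
        }
    '
}
```

Usage would be along the lines of: mstflint -d ca:00.0 q | fw_pending && echo "reset still required".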

mstfwreset fails with the following error:

# mstfwreset -d ca:00.0 r

-E- Synchronization by driver is not supported in the current state of this device.

If I disable sync and run it again, it hangs waiting for other hosts and then times out:

# mstfwreset -d ca:00.0 r --sync 0

Minimal reset level for device, ca:00.0:

3: Driver restart and PCI reset
Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw             -Done
Waiting for mstfwreset to run on all other hosts, press 'ctrl+c' to abort
Failed
-E- fsm sync timed out.

I can skip the fsm sync, but then the fw reset fails without a specific error message:

# mstfwreset -d ca:00.0 r --sync 0 --skip_fsm_sync                                                                             

Minimal reset level for device, ca:00.0:                                                                                                                        

3: Driver restart and PCI reset                                                                                                                                 
Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset          
Continue with reset?[y/N] y                                                                                                                                     
-I- Sending Reset Command To Fw             -Done                                                                                                               
-I- Stopping Driver                         -Done                                                                                                               
-I- Resetting PCI                           -Done                                                                                                               
-I- Starting Driver                         -Done                                                                                                               
-E- Firmware reset failed, retry operation or reboot machine.

Note that I tried rebooting the host machine at this point, but the fw update had still not been applied after the reboot.

The only way I have found to apply the updated firmware is to switch the device to "NIC mode", after which fwreset is able to successfully apply the pending configurations, as well as switch to the updated fw version.

# mstconfig -y -d ca:00.0 s INTERNAL_CPU_MODEL=EMBEDDED_CPU INTERNAL_CPU_PAGE_SUPPLIER=EXT_HOST_PF INTERNAL_CPU_ESWITCH_MANAGER=EXT_HOST_PF INTERNAL_CPU_IB_VPORT0=EXT_HOST_PF INTERNAL_CPU_OFFLOAD_ENGINE=DISABLED

Device #1:                                                                                                                                                      
----------

Device type:    BlueField2
Name:           MBF2H512C-AECO_Ax
Description:    BlueField-2 P-Series DPU 25GbE Dual-Port SFP56; integrated BMC; PCIe Gen4 x8; Secure Boot Enabled; Crypto Enabled; 16GB on-board DDR; 1GbE OOB management; FHHL
Device:         ca:00.0

Configurations:                                      Next Boot       New
         INTERNAL_CPU_MODEL                          EMBEDDED_CPU(1) EMBEDDED_CPU(1)
         INTERNAL_CPU_PAGE_SUPPLIER                  ECPF(0)         EXT_HOST_PF(1)
         INTERNAL_CPU_ESWITCH_MANAGER                ECPF(0)         EXT_HOST_PF(1)
         INTERNAL_CPU_IB_VPORT0                      ECPF(0)         EXT_HOST_PF(1)
         INTERNAL_CPU_OFFLOAD_ENGINE                 ENABLED(0)      DISABLED(1)

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

# mstfwreset -d ca:00.0 r

Minimal reset level for device, ca:00.0:

3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw             -Done
-I- Stopping Driver                         -Done
-I- Resetting PCI                           -Done
-I- Starting Driver                         -Done
-I- FW was loaded successfully.

# mstflint -d ca:00.0 q
Image type:            FS4
FW Version:            24.38.1002
FW Release Date:       3.8.2023
Product Version:       24.38.1002
Rom Info:              type=UEFI Virtio net version=21.4.10 cpu=AMD64,AARCH64
                       type=UEFI Virtio blk version=22.4.10 cpu=AMD64,AARCH64
                       type=UEFI version=14.31.20 cpu=AMD64,AARCH64
                       type=PXE version=3.7.201 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             e8ebd30300cee2e6        16
Base MAC:              e8ebd3cee2e6            16
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000724
Security Attributes:   secure-fw
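
The workaround described in this report (switch to NIC mode, reset, re-query) can be sketched as a small script. This is only an illustration under assumptions: it reuses the exact commands and parameter values from this report, the names DEV, DRY_RUN, run and nic_mode_then_reset are hypothetical, and it does not switch the card back to DPU mode afterwards. By default it only prints the commands.

```shell
# DEV defaults to the PCI address used in this report.
# DRY_RUN=1 (the default) prints the commands instead of running them.
DEV="${DEV:-ca:00.0}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

nic_mode_then_reset() {
    # 1. Set the NIC-mode values from the report (takes effect on next boot/reset).
    run mstconfig -y -d "$DEV" s \
        INTERNAL_CPU_MODEL=EMBEDDED_CPU \
        INTERNAL_CPU_PAGE_SUPPLIER=EXT_HOST_PF \
        INTERNAL_CPU_ESWITCH_MANAGER=EXT_HOST_PF \
        INTERNAL_CPU_IB_VPORT0=EXT_HOST_PF \
        INTERNAL_CPU_OFFLOAD_ENGINE=DISABLED &&
    # 2. Reset the firmware; this prompts for confirmation when run for real.
    run mstfwreset -d "$DEV" r &&
    # 3. Re-query to check which firmware version is now reported.
    run mstflint -d "$DEV" q
}
```

Calling nic_mode_then_reset with DRY_RUN=1 lists the three commands; DRY_RUN=0 would execute them as root on the host.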

thillux commented 1 year ago

We also experienced such behavior with the current master_devel branch of the upcoming 4.26 release. Do you plan to fix this with 4.26?

We use mstflint together with a mainline kernel 6.5 at the moment.

thillux commented 1 year ago

Update: with kernel 6.6 and a more recent rdma-core version, we were able to trigger a reset successfully from the BF2, but ran into the 60s timeout. dmesg showed that the reset worked.

ogalbxela commented 6 months ago

Had a conversation with the owner from our side. Direction was: please use mstflint-4.28 (just released) and the latest available driver to flash the latest published firmware. Please also query the device with "mstfwreset" (mstfwreset -d DEVICE q). It will list "sync"-capabilities for you.

Something like:

mstfwreset -d 81:00.0 q
 
<some output omitted>

Reset-sync (relevant only for reset-level 3):
0: Tool is the owner                                             -Not supported 
1: Driver is the owner                                           -Supported     (default)

For "sync 0", the tool is the owner of the reset flow, and the reset command should be issued from both the host and the ARM side.
For "sync 1", the driver is the owner.
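
As a rough sketch, the sync capability could be checked from a script before attempting the reset. This assumes the query output keeps the "Reset-sync" section layout shown above, with one "N: ... -Supported" or "-Not supported" line per mode; the function name sync_supported is hypothetical.

```shell
# Reads `mstfwreset -d <device> q` output on stdin; $1 is the sync mode (0 or 1).
# Returns 0 if that mode is listed as supported, 1 otherwise.
# Assumption: section header and line format are as printed in this thread.
sync_supported() {
    awk -v mode="$1" '
        /^Reset-sync/                  { in_sync = 1; next }
        in_sync && /^[^0-9]/           { in_sync = 0 }   # next section starts
        in_sync && $1 == (mode ":")    { found = ($0 ~ /-Supported/) }
        END { exit !found }
    '
}
```

For example: mstfwreset -d 81:00.0 q | sync_supported 1 || echo "driver-owned sync not available". Note that a later comment in this thread reports the reset failing even when sync 1 is listed as supported, so this check is necessary but may not be sufficient.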

abhiramnarayana commented 1 week ago

Hi @ogalbxela ,

We also hit the same error, "Synchronization by driver is not supported in the current state of this device.", during mstfwreset. The device we are using is shown below:

~]$ sudo lshw -class network -businfo | grep BlueField-2
pci@0000:17:00.0  ens2f0np0  network  MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
pci@0000:17:00.1  ens2f1np1  network  MT42822 BlueField-2 integrated ConnectX-6 Dx network controller

As suggested in the thread above, we updated the firmware to the latest version (24.42.1000) and also took the latest mstflint, v4.29, but we still face the issue during mstfwreset. (v4.28 gives the same error as v4.29.)

~]$ ethtool -i ens2f0np0
driver: mlx5_core
version: 5.14.0-427.42.1.el9_4.x86_64
firmware-version: 24.42.1000 (MT_0000000765)
expansion-rom-version:
bus-info: 0000:17:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

~]$ sudo mstflint --version
mstflint, mstflint 4.29.0, Git SHA Hash: 37af981

~]$ sudo mstfwreset -d 17:00.0 q

Reset-levels:
0: Driver, PCI link, network link will remain up ("live-Patch")  -Not Supported
1: Only ARM side will not remain up ("Immediate reset").         -Not Supported
3: Driver restart and PCI reset                                  -Supported     (default)
4: Warm Reboot                                                   -Supported

Reset-types (relevant only for reset-levels 1,3,4):
0: Full chip reset                                               -Supported     (default)
1: Phy-less reset (keep network port active during reset)        -Not Supported
2: NIC only reset (for SoC devices)                              -Not Supported
3: ARM only reset                                                -Not Supported
4: ARM OS shut down                                              -Not Supported

Reset-sync (relevant only for reset-level 3):
0: Tool is the owner                                             -Not supported
1: Driver is the owner                                           -Supported     (default)

Reset-reason: Warm reset
Timestamp (number of clock cycles) since last cold reset: 1308350112

Note that Reset-sync 1 (driver is the owner) is reported as supported, but the reset still does not work:

~]$ sudo mstfwreset --device 0000:17:00.0 --level 3 -y r

The reset level for device, 0000:17:00.0 is:

3: Driver restart and PCI reset
Please be aware that resetting the Bluefield may take several minutes. Exiting the process in the middle of the waiting period will not halt the reset. The ARM side will be restarted, and it will be unavailable for a while.
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw             --E- The BF reset flow encountered a failure due to a reset state error of negotiation dis-acknowledgment.