Open-CAS / open-cas-linux

Open CAS Linux
https://open-cas.com
BSD 3-Clause "New" or "Revised" License
216 stars 82 forks source link

Smoke tests failing on "incremental_load/01" #979

Closed rafalste closed 2 years ago

rafalste commented 2 years ago

Logs

Output from Jenkins:

--- Starting test mode bvt ---

  1. basic/01                      Verify Open CAS CLI using --version                                                  [OK]    0[s] (00:00:00)
  2. basic/02                      Adding valid & invalid core devices to caches                                        [OK]    5[s] (00:00:05)
  3. basic/04                      Test files on core devices with different filesystems after stopping cache           [OK]   19[s] (00:00:19)
  4. basic/07                      Validate spelling of help in CLI                                                     [OK]    0[s] (00:00:00)
  5. basic/09                      Validate spelling in manpage                                                         [OK]    0[s] (00:00:00)
  6. basic/10                      Verify cache start in different modes                                                [OK]  102[s] (00:01:42)
  7. basic/12                      WB data integrity ext3/ext4/xfs after clean shutdown, no flush                       [OK]   22[s] (00:00:22)
  8. basic/13                      WO data integrity ext3/ext4/xfs after clean shutdown, no flush                       [OK]   22[s] (00:00:22)
  9. incremental_load/01           Try add 3 cores to pool, try load cache and check if cache is running           FATAL: command execution failed
java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
    at hudson.remoting.Command.readFrom(Command.java:142)
    at hudson.remoting.Command.readFrom(Command.java:128)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused: java.io.IOException: Backing channel 'OCL-NEW-BVT-SM27' is disconnected.
    at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
    at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:286)
    at com.sun.proxy.$Proxy88.isAlive(Unknown Source)
    at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1211)
    at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1203)
    at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:803)
    at hudson.model.Build$BuildExecution.build(Build.java:197)
    at hudson.model.Build$BuildExecution.doRun(Build.java:163)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:513)
    at hudson.model.Run.execute(Run.java:1906)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:97)
    at hudson.model.Executor.run(Executor.java:429)
FATAL: Unable to delete script file /tmp/jenkins2566217139270257114.sh
java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
    at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
    at hudson.remoting.Command.readFrom(Command.java:142)
    at hudson.remoting.Command.readFrom(Command.java:128)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)
Caused: java.io.IOException: Unexpected termination of the channel
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:75)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@50159153:OCL-NEW-BVT-SM27": Remote call on OCL-NEW-BVT-SM27 failed. The channel is closing down or has closed down
    at hudson.remoting.Channel.call(Channel.java:994)
    at hudson.FilePath.act(FilePath.java:1159)
    at hudson.FilePath.act(FilePath.java:1148)
    at hudson.FilePath.delete(FilePath.java:1671)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:163)
    at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:803)
    at hudson.model.Build$BuildExecution.build(Build.java:197)
    at hudson.model.Build$BuildExecution.doRun(Build.java:163)
    at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:513)
    at hudson.model.Run.execute(Run.java:1906)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:97)
    at hudson.model.Executor.run(Executor.java:429)
Build step 'Execute shell' marked build as failure

Manual test run # ./incremental_load/01 -c /dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN -d /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN -i:

--- Open CAS API library loaded ---
--- Open CAS options library loaded ---
--- Using home config file ---
--- Your configuration ---
Cache device: /dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN
Core device: /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN
Mount point: /mnt/cas
Temporary directory : /var/tmp/.cas
./incremental_load/..//cas_config: line 103: /dev/sdc: Permission denied
Many tests use program called vdbench and it looks like it's not installed here.[WARNING]
Many tests use program called fio and it looks like it's not installed here.[WARNING]
--- Open CAS configuration loaded correctly ---
*** Starting test ./incremental_load/01 - Try add 3 cores to pool, try load cache and check if cache is running ***
2021-10-27_02:15:37  Running dd if=/dev/zero of=/dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN bs=4k count=10 [OK]
2021-10-27_02:15:37  Running dd if=/dev/zero of=/dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN bs=4k count=10 [OK]
2021-10-27_02:15:37  Running parted -s /dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN mktable gpt [OK]
parted -s -a optimal /dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN mkpart primary 2048s 4098048s [OK]
2021-10-27_02:15:37  Running udevadm settle [OK]
2021-10-27_02:15:37  Running dd if=/dev/zero of=/dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN-part1 bs=1M count=1 oflag=direct [OK]
2021-10-27_02:15:37  Running udevadm settle [OK]
2021-10-27_02:15:37  Running parted -s /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN mktable gpt [OK]
parted -s -a optimal /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN mkpart primary 2048s 8194048s [OK]
parted -s -a optimal /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN mkpart primary 8196096s 16388096s [OK]
parted -s -a optimal /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN mkpart primary 16390144s 24582144s [OK]
2021-10-27_02:15:37  Running udevadm settle [OK]
2021-10-27_02:15:38  Running dd if=/dev/zero of=/dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part1 bs=1M count=1 oflag=direct [OK]
2021-10-27_02:15:38  Running dd if=/dev/zero of=/dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part2 bs=1M count=1 oflag=direct [OK]
2021-10-27_02:15:38  Running dd if=/dev/zero of=/dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part3 bs=1M count=1 oflag=direct [OK]
2021-10-27_02:15:38  Running udevadm settle [OK]
2021-10-27_02:15:38  Running casadm --start-cache --cache-device /dev/disk/by-id/nvme-INTEL_SSDPE2MD400G4_PHFT6390018D400GGN-part1 --cache-id 1 [OK]
2021-10-27_02:15:39  Running casadm --add-core --cache-id 1 --core-device /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part1 [OK]
2021-10-27_02:15:39  Running casadm --add-core --cache-id 1 --core-device /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part2 [OK]
2021-10-27_02:15:39  Running casadm --add-core --cache-id 1 --core-device /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part3 [OK]
2021-10-27_02:15:39  Running casadm --list-caches -o csv | grep /dev/nvme0n1p1 | cut -d',' -f4 | grep Running &> /dev/null [OK]
2021-10-27_02:15:39  Running casadm --list-caches -o csv | grep /dev/cas1-1 | cut -d',' -f4 | grep Active &> /dev/null [OK]
2021-10-27_02:15:39  Running casadm --list-caches -o csv | grep /dev/cas1-2 | cut -d',' -f4 | grep Active &> /dev/null [OK]
2021-10-27_02:15:39  Running casadm --list-caches -o csv | grep /dev/cas1-3 | cut -d',' -f4 | grep Active &> /dev/null [OK]
2021-10-27_02:15:39  Running casadm --stop-cache --cache-id 1 [OK]
2021-10-27_02:15:40  Running casadm --script --add-core --try-add --cache-id 1 --core-id 1 --core-device /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part1 [OK]
2021-10-27_02:15:40  Running casadm --list-caches -o csv | grep /dev/sdc1 | cut -d',' -f4 | grep Detached &> /dev/null [OK]
2021-10-27_02:15:40  Running casadm --script --add-core --try-add --cache-id 1 --core-id 2 --core-device /dev/disk/by-id/ata-INTEL_SSDSC2BF240A5_CVTS418300KQ240JGN-part2

Sometimes it is stuck here, other times two lines before, when adding first core partition. Sometimes it even passes all the way to the end.

Output of # abrt-cli list --since 1635293033 after rebooting:

id 9241d0785d74ee8a819f00815d5c8ff4e5ff71fa
reason:         kernel BUG at /root/jenkins/workspace/OpenCAS/Nightly/modules/cas_cache/src/ocf/utils/utils_async_lock.c:155!
time:           śro, 27 paź 2021, 15:49:26
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/vmcore-127.0.0.1-2020-02-01-01:20:37
Reported:       cannot be reported

id 2a3611b08ada42c7f59c3b92405bf9c24f2dffea
reason:         kernel BUG at /root/jenkins/workspace/OpenCAS/BVT/PR/modules/cas_cache/src/ocf/utils/utils_refcnt.c:19!
time:           śro, 27 paź 2021, 12:19:59
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/vmcore-127.0.0.1-2021-09-23-04:42:36
Reported:       cannot be reported

id c589795979c87cf0ecdb5efad316a1f5c742b596
reason:         mce: [Hardware Error]: Machine check events logged
time:           śro, 27 paź 2021, 09:39:16
cmdline:        BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.el7.x86_64 root=LABEL=CENTOS76 ro vconsole.font=latarcyrheb-sun16 vconsole.keymap=us nomodeset net.ifnames=0 biosdevname=0 rhgb quiet console=tty0 console=ttyS0,115200n8 ignore_loglevel printk.time=1 selinux=0 crashkernel=auto
uid:            0
Directory:      /var/spool/abrt/oops-2021-10-27-09:39:16-10862-0
Reported:       cannot be reported

id 39c845c21be9d5c86157b85cd05060bb22bbcfcf
time:           śro, 27 paź 2021, 02:05:28
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/vmcore-127.0.0.1-2021-10-27-04:24:06

Your Environment

Deixx commented 2 years ago

Bug in PR