Open sjtuross opened 1 year ago
Can you run the collector in debug mode (with --debug on the CLI)? It seems to be failing to JSON encode your config file when logging.
Everything in your collector config file looks pretty basic, except your device name.
/dev/disks/t10.ATA_____ST8000AS00022D1NA17Z_________________________________Z840R2T2
but I don't see any special characters there, so it should be fine too. Just in case, could you try quoting it in the config file?
Thank you for your reply. I tried quoting the device identifier, but it still crashed. This time I also got another error (occasionally) at the bottom.
I searched for this error, epollwait on fd 4 failed with 38, and found this golang issue: https://github.com/golang/go/issues/24980. Do you think it's related?
Also see https://groups.google.com/g/golang-nuts/c/R1vvk2pZW7w?pli=1 about the other error.
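As a side note on the number in that message: the Go runtime prints the raw errno here, and on Linux x86-64 errno 38 is ENOSYS ("function not implemented"). That would be consistent with the kernel not implementing a system call the runtime's network poller relies on, though that reading of the ESXi behavior is an assumption. A small sketch to decode the value on whatever system it is compiled for (errnoText is a name made up for this example):

```go
package main

import (
	"fmt"
	"syscall"
)

// errnoText returns the operating system's message for a raw errno value.
func errnoText(n int) string {
	return syscall.Errno(n).Error()
}

func main() {
	// 38 is the number from "epollwait on fd 4 failed with 38".
	// The numeric value of ENOSYS differs between operating systems,
	// so the comparison below is only meaningful on the target platform.
	fmt.Printf("errno 38: %s (is ENOSYS: %v)\n",
		errnoText(38), syscall.Errno(38) == syscall.ENOSYS)
}
```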
version: 1
host:
  id: "ESXi2"
devices:
  - device: '/dev/disks/t10.ATA_____ST8000AS00022D1NA17Z_________________________________Z840R2T2'
    type: 'sat'
api:
  endpoint: 'https://scrutiny.rossconsulting.cn'
commands:
  metrics_smartctl_bin: './smartctl'
[root@esxi2:/vmfs/volumes/5f79c6f5-a7338bdc-85f3-6cb3114d162c/TEMP/smartmontools] /vmfs/volumes/MX1T/TEMP/smartmontools/scrutiny-collector-metrics-linux-amd64 run --debug --config ./collector.yaml
2022/11/01 02:02:37 No configuration file found at /opt/scrutiny/config/collector.yaml. Using Defaults.
___ ___ ____ __ __ ____ ____ _ _ _ _
/ __) / __)( _ \( )( )(_ _)(_ _)( \( )( \/ )
\__ \( (__ ) / )(__)( )( _)(_ ) ( \ /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
AnalogJ/scrutiny/metrics linux.amd64-0.5.0
2022/11/01 02:02:37 Loading configuration file: /vmfs/volumes/MX1T/TEMP/smartmontools/collector.yaml
runtime: epollwait on fd 4 failed with 38
fatal error: runtime: netpoll failed
runtime stack:
runtime.throw({0x8a7b24?, 0xc000061928?})
/opt/hostedtoolcache/go/1.18.3/x64/src/runtime/panic.go:992 +0x71
runtime.netpoll(0x50ce312969043?)
/opt/hostedtoolcache/go/1.18.3/x64/src/runtime/netpoll_epoll.go:130 +0x34e
runtime.sysmon()
/opt/hostedtoolcache/go/1.18.3/x64/src/runtime/proc.go:5131 +0x2d5
runtime.mstart1()
/opt/hostedtoolcache/go/1.18.3/x64/src/runtime/proc.go:1418 +0x93
runtime.mstart0()
/opt/hostedtoolcache/go/1.18.3/x64/src/runtime/proc.go:1376 +0x79
runtime.mstart()
/opt/hostedtoolcache/go/1.18.3/x64/src/runtime/asm_amd64.s:367 +0x5
goroutine 1 [runnable]:
gopkg.in/yaml%2ev2.yaml_parser_scan_plain_scalar(0xc00017e300, 0xc000195240)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/scannerc.go:2570 +0x140c
gopkg.in/yaml%2ev2.yaml_parser_fetch_plain_scalar(0xc00017e300)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/scannerc.go:1435 +0x7d
gopkg.in/yaml%2ev2.yaml_parser_fetch_next_token(0xc00017e300)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/scannerc.go:813 +0x65e
gopkg.in/yaml%2ev2.yaml_parser_fetch_more_tokens(0xc00017e300)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/scannerc.go:642 +0x19b
gopkg.in/yaml%2ev2.peek_token(...)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/parserc.go:47
gopkg.in/yaml%2ev2.yaml_parser_parse_document_start(0xc00017e300, 0xc00017e510, 0x1)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/parserc.go:200 +0x55
gopkg.in/yaml%2ev2.yaml_parser_state_machine(0x1dfdae7108?, 0x1000000000030?)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/parserc.go:101 +0x54
gopkg.in/yaml%2ev2.yaml_parser_parse(0x1dfdae7108?, 0x600?)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/parserc.go:72 +0x8c
gopkg.in/yaml%2ev2.(*parser).peek(0xc00017e300)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/decode.go:105 +0x30
gopkg.in/yaml%2ev2.(*parser).parse(0xc00017e300)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/decode.go:143 +0x45
gopkg.in/yaml%2ev2.unmarshal({0xc0001c2000, 0x100, 0x600}, {0x807b00?, 0xc00000e120}, 0x0)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/yaml.go:142 +0x305
gopkg.in/yaml%2ev2.Unmarshal(...)
/home/runner/work/scrutiny/scrutiny/vendor/gopkg.in/yaml.v2/yaml.go:81
github.com/spf13/viper.(*Viper).unmarshalReader(0xc000020b40, {0x9517c0, 0xc00000e118}, 0xc0001bc0f0)
/home/runner/work/scrutiny/scrutiny/vendor/github.com/spf13/viper/viper.go:1490 +0x57c
github.com/spf13/viper.(*Viper).MergeConfig(0xc00002a6c0?, {0x9517c0, 0xc00000e118})
/home/runner/work/scrutiny/scrutiny/vendor/github.com/spf13/viper/viper.go:1389 +0x45
github.com/analogj/scrutiny/collector/pkg/config.(*configuration).ReadConfig(0xc000070840, {0x35e4c488e92?, 0x6?})
/home/runner/work/scrutiny/scrutiny/collector/pkg/config/config.go:88 +0x125
main.main.func2(0xc0001a83c0?)
/home/runner/work/scrutiny/scrutiny/collector/cmd/collector-metrics/collector-metrics.go:97 +0xaa
github.com/urfave/cli/v2.(*Command).Run(0xc000021200, 0xc00002ca40)
/home/runner/work/scrutiny/scrutiny/vendor/github.com/urfave/cli/v2/command.go:164 +0x5bb
github.com/urfave/cli/v2.(*App).RunContext(0xc0001ac000, {0x9543b0?, 0xc000022098}, {0xc00001e0a0, 0x5, 0x5})
/home/runner/work/scrutiny/scrutiny/vendor/github.com/urfave/cli/v2/app.go:306 +0xbc5
github.com/urfave/cli/v2.(*App).Run(...)
/home/runner/work/scrutiny/scrutiny/vendor/github.com/urfave/cli/v2/app.go:215
main.main()
/home/runner/work/scrutiny/scrutiny/collector/cmd/collector-metrics/collector-metrics.go:182 +0x7d9
[root@esxi2:/vmfs/volumes/5f79c6f5-a7338bdc-85f3-6cb3114d162c/TEMP/smartmontools] /vmfs/volumes/MX1T/TEMP/smartmontools/scrutiny-collector-metrics-linux-amd64 run --debug --config ./collector.yaml
2022/11/01 02:02:36 No configuration file found at /opt/scrutiny/config/collector.yaml. Using Defaults.
___ ___ ____ __ __ ____ ____ _ _ _ _
/ __) / __)( _ \( )( )(_ _)(_ _)( \( )( \/ )
\__ \( (__ ) / )(__)( )( _)(_ ) ( \ /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
AnalogJ/scrutiny/metrics linux.amd64-0.5.0
2022/11/01 02:02:36 Loading configuration file: /vmfs/volumes/MX1T/TEMP/smartmontools/collector.yaml
DEBU[0000] json: unsupported type: map[interface {}]interface {} type=metrics
INFO[0000] Verifying required tools type=metrics
INFO[0000] Executing command: ./smartctl --scan --json type=metrics
ERRO[0000] Error scanning for devices: fork/exec ./smartctl: no space left on device type=metrics
2022/11/01 02:02:36 ERROR: fork/exec ./smartctl: no space left on device
Hey @sjtuross, the error fork/exec ./smartctl: no space left on device is incredibly suspicious.
Can you confirm that the disk you're using has space available, and is writable?
I am certain there is space left on the device and that it's writable by ESXi. All VMs can read and write fine. Probably the Go app can't run properly on ESXi, which is not real Linux.
This could be related to: http://woshub.com/vmware-esxi-no-space-left-device/
I did a bit more reading on this. It could also be related to a missing swap volume or exhausted inodes.
I'm going to close this issue for now. Feel free to comment or open a new issue if you think there's something that I can fix in the Scrutiny codebase to support ESXi.
@AnalogJ I realized today that the no space left error is probably because smartctl can't find any devices. As you can see below, there are no devices listed in the output. Is there a way to disable the scan and only use the devices specified in collector.yaml?
[root@esxi:/vmfs/volumes/5f79c6f5-a7338bdc-85f3-6cb3114d162c/TEMP/smartmontools] ./smartctl --scan --json
{
"json_format_version": [
1,
0
],
"smartctl": {
"version": [
7,
4
],
"pre_release": true,
"svn_revision": "5414",
"platform_info": "x86_64-linux-7.0.3",
"build_info": "(CircleCI)",
"argv": [
"smartctl",
"--scan",
"--json"
],
"exit_status": 0
}
}
Huh, interesting.
You can customize the flags sent to the scan command using https://github.com/AnalogJ/scrutiny/blob/master/example.collector.yaml#L81
Removing scan completely would be a lot of work. However, if we can get it working on ESXi with some additional flags, I'd be happy to update our docs and/or add an ESXi troubleshooting guide.
Would you be willing to do some testing of the smartctl --scan command with additional flags?
Yes, I'm willing to try. What additional flags do you suggest?
Here's a smartctl reference - https://linux.die.net/man/8/smartctl
and here's a couple of people using smartctl + an internal esxi tool to retrieve smart information:
It seems you/we may be able to write a wrapper around esxcli storage core device list that returns results in a similar format to smartctl --scan --json.
I found a way to format the output of esxcli storage core device list, so it could be easier to parse in a scan wrapper.
The csv formatter allows selecting specific fields. In the example below, DevfsPath is the device identifier. Filtering on Vendor=ATA excludes USB and iSCSI devices.
[root@esxi:/vmfs/volumes/5f79c6f5-a7338bdc-85f3-6cb3114d162c/TEMP/smartmontools] esxcli --formatter=csv --format-param=fields="DevfsPath,Vendor,Model" storage core device list
DevfsPath,Vendor,Model,
/vmfs/devices/disks/naa.6589cfc000000f7b3137fe00cd6d09ca,FreeNAS ,iSCSI Disk ,
/vmfs/devices/disks/naa.6589cfc000000ff1bc98d51b11b6fdfa,TrueNAS ,iSCSI Disk ,
/vmfs/devices/disks/mpx.vmhba32:C0:T0:L0,SanDisk ,Cruzer Blade ,
/vmfs/devices/disks/naa.6589cfc000000b6136c98e7be4a2025f,FreeNAS ,iSCSI Disk ,
/vmfs/devices/disks/naa.6589cfc000000bf70912866cff56e19a,TrueNAS ,iSCSI Disk ,
/vmfs/devices/disks/t10.ATA_____WDC_WD180EMFZ2D11AFXA0___________________3WHDKL1J____________,ATA ,WDC WD180EMFZ-11,
[root@esxi:/vmfs/volumes/5f79c6f5-a7338bdc-85f3-6cb3114d162c/TEMP/smartmontools]
It's also possible to format the output as JSON, but --format-param is not supported with that formatter.
esxcli --debug --formatter=json storage core device list
[
{
"AttachedFilters": [],
"DIXEnabled": false,
"DIXGuardType": "NO GUARD SUPPORT",
"DevfsPath": "/vmfs/devices/disks/naa.6589cfc000000f7b3137fe00cd6d09ca",
"Device": "naa.6589cfc000000f7b3137fe00cd6d09ca",
"DeviceMaxQueueDepth": 128,
"DeviceType": "Direct-Access ",
"DisplayName": "FreeNAS iSCSI Disk (naa.6589cfc000000f7b3137fe00cd6d09ca)",
"DriveType": "unknown",
"EmulatedDIXDIFEnabled": false,
"HasSettableDisplayName": true,
"IsBootDevice": false,
"IsBootUSBDevice": false,
"IsLocal": false,
"IsLocalSASDevice": false,
"IsOffline": false,
"IsPerenniallyReserved": false,
"IsPseudo": false,
"IsRDMCapable": true,
"IsRemovable": false,
"IsSAS": false,
"IsSSD": true,
"IsSharedClusterwide": true,
"IsUSB": false,
"IsVVOLPE": false,
"Model": "iSCSI Disk ",
"MultipathPlugin": "NMP",
"NoofoutstandingIOswithcompetingworlds": 32,
"NumberofPhysicalDrives": "unknown",
"OtherUIDs": [
"vml.010000000061633166366264386263343230303600695343534920"
],
"PIActivated": false,
"PIProtectionMask": "NO PROTECTION",
"PIType": 0,
"ProtectionEnabled": false,
"QueueFullSampleSize": 0,
"QueueFullThreshold": 0,
"RAIDLevel": "unknown",
"Revision": "0123",
"SCSILevel": 7,
"Size": 1048576,
"Status": "on",
"SupportedGuardTypes": [
"NO GUARD SUPPORT"
],
"ThinProvisioningStatus": "yes",
"VAAIStatus": "supported",
"Vendor": "FreeNAS "
},
{
"AttachedFilters": [],
"DIXEnabled": false,
"DIXGuardType": "NO GUARD SUPPORT",
"DevfsPath": "/vmfs/devices/disks/naa.6589cfc000000ff1bc98d51b11b6fdfa",
"Device": "naa.6589cfc000000ff1bc98d51b11b6fdfa",
"DeviceMaxQueueDepth": 128,
"DeviceType": "Direct-Access ",
"DisplayName": "TrueNAS iSCSI Disk (naa.6589cfc000000ff1bc98d51b11b6fdfa)",
"DriveType": "unknown",
"EmulatedDIXDIFEnabled": false,
"HasSettableDisplayName": true,
"IsBootDevice": false,
"IsBootUSBDevice": false,
"IsLocal": false,
"IsLocalSASDevice": false,
"IsOffline": false,
"IsPerenniallyReserved": false,
"IsPseudo": false,
"IsRDMCapable": true,
"IsRemovable": false,
"IsSAS": false,
"IsSSD": true,
"IsSharedClusterwide": true,
"IsUSB": false,
"IsVVOLPE": false,
"Model": "iSCSI Disk ",
"MultipathPlugin": "NMP",
"NoofoutstandingIOswithcompetingworlds": 32,
"NumberofPhysicalDrives": "unknown",
"OtherUIDs": [
"vml.010000000038303631356630613434663230313200695343534920"
],
"PIActivated": false,
"PIProtectionMask": "NO PROTECTION",
"PIType": 0,
"ProtectionEnabled": false,
"QueueFullSampleSize": 0,
"QueueFullThreshold": 0,
"RAIDLevel": "unknown",
"Revision": "0123",
"SCSILevel": 7,
"Size": 204800,
"Status": "on",
"SupportedGuardTypes": [
"NO GUARD SUPPORT"
],
"ThinProvisioningStatus": "yes",
"VAAIStatus": "supported",
"Vendor": "TrueNAS "
},
{
"AttachedFilters": [],
"DIXEnabled": false,
"DIXGuardType": "NO GUARD SUPPORT",
"DevfsPath": "/vmfs/devices/disks/mpx.vmhba32:C0:T0:L0",
"Device": "mpx.vmhba32:C0:T0:L0",
"DeviceMaxQueueDepth": 1,
"DeviceType": "Direct-Access ",
"DisplayName": "Local USB Direct-Access (mpx.vmhba32:C0:T0:L0)",
"DriveType": "unknown",
"EmulatedDIXDIFEnabled": false,
"HasSettableDisplayName": false,
"IsBootDevice": true,
"IsBootUSBDevice": true,
"IsLocal": true,
"IsLocalSASDevice": false,
"IsOffline": false,
"IsPerenniallyReserved": false,
"IsPseudo": false,
"IsRDMCapable": false,
"IsRemovable": true,
"IsSAS": false,
"IsSSD": false,
"IsSharedClusterwide": false,
"IsUSB": true,
"IsVVOLPE": false,
"Model": "Cruzer Blade ",
"MultipathPlugin": "NMP",
"NoofoutstandingIOswithcompetingworlds": 1,
"NumberofPhysicalDrives": "unknown",
"OtherUIDs": [
"vml.010000000032303034343331373431303535333430324345464372757a6572"
],
"PIActivated": false,
"PIProtectionMask": "NO PROTECTION",
"PIType": 0,
"ProtectionEnabled": false,
"QueueFullSampleSize": 0,
"QueueFullThreshold": 0,
"RAIDLevel": "unknown",
"Revision": "0103",
"SCSILevel": 2,
"Size": 15267,
"Status": "on",
"SupportedGuardTypes": [
"NO GUARD SUPPORT"
],
"ThinProvisioningStatus": "unknown",
"VAAIStatus": "unsupported",
"Vendor": "SanDisk "
},
{
"AttachedFilters": [],
"DIXEnabled": false,
"DIXGuardType": "NO GUARD SUPPORT",
"DevfsPath": "/vmfs/devices/disks/naa.6589cfc000000b6136c98e7be4a2025f",
"Device": "naa.6589cfc000000b6136c98e7be4a2025f",
"DeviceMaxQueueDepth": 128,
"DeviceType": "Direct-Access ",
"DisplayName": "FreeNAS iSCSI Disk (naa.6589cfc000000b6136c98e7be4a2025f)",
"DriveType": "unknown",
"EmulatedDIXDIFEnabled": false,
"HasSettableDisplayName": true,
"IsBootDevice": false,
"IsBootUSBDevice": false,
"IsLocal": false,
"IsLocalSASDevice": false,
"IsOffline": false,
"IsPerenniallyReserved": false,
"IsPseudo": false,
"IsRDMCapable": true,
"IsRemovable": false,
"IsSAS": false,
"IsSSD": true,
"IsSharedClusterwide": true,
"IsUSB": false,
"IsVVOLPE": false,
"Model": "iSCSI Disk ",
"MultipathPlugin": "NMP",
"NoofoutstandingIOswithcompetingworlds": 32,
"NumberofPhysicalDrives": "unknown",
"OtherUIDs": [
"vml.010000000030303063323966653034646230310000695343534920"
],
"PIActivated": false,
"PIProtectionMask": "NO PROTECTION",
"PIType": 0,
"ProtectionEnabled": false,
"QueueFullSampleSize": 0,
"QueueFullThreshold": 0,
"RAIDLevel": "unknown",
"Revision": "0123",
"SCSILevel": 7,
"Size": 4194304,
"Status": "on",
"SupportedGuardTypes": [
"NO GUARD SUPPORT"
],
"ThinProvisioningStatus": "yes",
"VAAIStatus": "supported",
"Vendor": "FreeNAS "
},
{
"AttachedFilters": [],
"DIXEnabled": false,
"DIXGuardType": "NO GUARD SUPPORT",
"DevfsPath": "/vmfs/devices/disks/naa.6589cfc000000bf70912866cff56e19a",
"Device": "naa.6589cfc000000bf70912866cff56e19a",
"DeviceMaxQueueDepth": 128,
"DeviceType": "Direct-Access ",
"DisplayName": "TrueNAS iSCSI Disk (naa.6589cfc000000bf70912866cff56e19a)",
"DriveType": "unknown",
"EmulatedDIXDIFEnabled": false,
"HasSettableDisplayName": true,
"IsBootDevice": false,
"IsBootUSBDevice": false,
"IsLocal": false,
"IsLocalSASDevice": false,
"IsOffline": false,
"IsPerenniallyReserved": false,
"IsPseudo": false,
"IsRDMCapable": true,
"IsRemovable": false,
"IsSAS": false,
"IsSSD": true,
"IsSharedClusterwide": true,
"IsUSB": false,
"IsVVOLPE": false,
"Model": "iSCSI Disk ",
"MultipathPlugin": "NMP",
"NoofoutstandingIOswithcompetingworlds": 32,
"NumberofPhysicalDrives": "unknown",
"OtherUIDs": [
"vml.010000000038303631356630613434663230313100695343534920"
],
"PIActivated": false,
"PIProtectionMask": "NO PROTECTION",
"PIType": 0,
"ProtectionEnabled": false,
"QueueFullSampleSize": 0,
"QueueFullThreshold": 0,
"RAIDLevel": "unknown",
"Revision": "0123",
"SCSILevel": 7,
"Size": 3145728,
"Status": "on",
"SupportedGuardTypes": [
"NO GUARD SUPPORT"
],
"ThinProvisioningStatus": "yes",
"VAAIStatus": "supported",
"Vendor": "TrueNAS "
},
{
"AttachedFilters": [],
"DIXEnabled": false,
"DIXGuardType": "NO GUARD SUPPORT",
"DevfsPath": "/vmfs/devices/disks/t10.ATA_____WDC_WD180EMFZ2D11AFXA0___________________3WHDKL1J____________",
"Device": "t10.ATA_____WDC_WD180EMFZ2D11AFXA0___________________3WHDKL1J____________",
"DeviceMaxQueueDepth": 31,
"DeviceType": "Direct-Access ",
"DisplayName": "Local ATA Disk (t10.ATA_____WDC_WD180EMFZ2D11AFXA0___________________3WHDKL1J____________)",
"DriveType": "unknown",
"EmulatedDIXDIFEnabled": false,
"HasSettableDisplayName": true,
"IsBootDevice": false,
"IsBootUSBDevice": false,
"IsLocal": true,
"IsLocalSASDevice": false,
"IsOffline": false,
"IsPerenniallyReserved": false,
"IsPseudo": false,
"IsRDMCapable": false,
"IsRemovable": false,
"IsSAS": false,
"IsSSD": false,
"IsSharedClusterwide": false,
"IsUSB": false,
"IsVVOLPE": false,
"Model": "WDC WD180EMFZ-11",
"MultipathPlugin": "HPP",
"NoofoutstandingIOswithcompetingworlds": 31,
"NumberofPhysicalDrives": "unknown",
"OtherUIDs": [
"vml.0100000000335748444b4c314a202020202020202020202020574443205744"
],
"PIActivated": false,
"PIProtectionMask": "NO PROTECTION",
"PIType": 0,
"ProtectionEnabled": false,
"QueueFullSampleSize": 0,
"QueueFullThreshold": 0,
"RAIDLevel": "unknown",
"Revision": "0A81",
"SCSILevel": 5,
"Size": 17166336,
"Status": "on",
"SupportedGuardTypes": [
"NO GUARD SUPPORT"
],
"ThinProvisioningStatus": "unknown",
"VAAIStatus": "unsupported",
"Vendor": "ATA "
}
]
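With the JSON formatter, a wrapper would not need to rely on the padded Vendor string: in the output above, the iSCSI disks have IsLocal set to false and the USB stick has IsUSB set to true, so those two flags are enough to single out the local ATA disk. A small sketch of that filter, assuming those fields keep this meaning on other hosts (esxDevice and localDisks are names made up for the example):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// esxDevice holds the subset of fields used here from
// esxcli --formatter=json storage core device list.
type esxDevice struct {
	DevfsPath string
	IsLocal   bool
	IsUSB     bool
}

// localDisks returns the DevfsPath of every device that is local and not USB.
func localDisks(raw []byte) ([]string, error) {
	var devs []esxDevice
	if err := json.Unmarshal(raw, &devs); err != nil {
		return nil, err
	}
	var paths []string
	for _, d := range devs {
		if d.IsLocal && !d.IsUSB {
			paths = append(paths, d.DevfsPath)
		}
	}
	return paths, nil
}

func main() {
	// Trimmed-down version of the esxcli output above.
	sample := []byte(`[
	  {"DevfsPath": "/vmfs/devices/disks/naa.6589cfc000000f7b3137fe00cd6d09ca", "IsLocal": false, "IsUSB": false},
	  {"DevfsPath": "/vmfs/devices/disks/mpx.vmhba32:C0:T0:L0", "IsLocal": true, "IsUSB": true},
	  {"DevfsPath": "/vmfs/devices/disks/t10.ATA_____WDC_WD180EMFZ2D11AFXA0___________________3WHDKL1J____________", "IsLocal": true, "IsUSB": false}
	]`)
	paths, err := localDisks(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```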
Reference:
Any update on this?
If you're willing, this could be a good example of a custom collector - https://github.com/AnalogJ/scrutiny/tree/240178d742a5fe84b5b61952897a855f9425b790/collector/cmd
I had the same problem and tried the following.
Edit collector.yaml:
commands:
  metrics_smartctl_bin: 'uname' # change to `uname` for testing
  metrics_scan_args: '-a --json' # --json required by scrutiny
Result:
[root@esxi:/tmp] scrutiny-collector run --config collector.yaml --debug
2024/02/14 08:33:34 No configuration file found at /opt/scrutiny/config/collector.yaml. Using Defaults.
___ ___ ____ __ __ ____ ____ _ _ _ _
/ __) / __)( _ \( )( )(_ _)(_ _)( \( )( \/ )
\__ \( (__ ) / )(__)( )( _)(_ ) ( \ /
(___/ \___)(_)\_)(______) (__) (____)(_)\_) (__)
AnalogJ/scrutiny/metrics linux.amd64-0.7.2
2024/02/14 08:33:34 Loading configuration file: /tmp/collector.yaml
DEBU[0000] {
"api": {
"endpoint": "http://*****:8080"
},
"commands": {
"metrics_info_args": "--info --json",
"metrics_scan_args": "-a --json",
"metrics_smart_args": "--xall --json",
"metrics_smartctl_bin": "uname"
},
"devices": [
/* Non-critical devices content is omitted */
{
"device": "/vmfs/devices/disks/t10.ATA_____INTEL***__",
"type": "sat"
},
],
"host": {
"id": "*******"
},
"log": {
"file": "",
"level": "DEBUG"
},
"version": 1
}<nil> type=metrics
INFO[0000] Verifying required tools type=metrics
INFO[0000] Executing command: uname -a --json type=metrics
ERRO[0000] Error scanning for devices: fork/exec /bin/uname: no space left on device type=metrics
2024/02/14 08:33:34 ERROR: fork/exec /bin/uname: no space left on device
[root@esxi:/tmp] uname
VMkernel
So uname, when launched through the collector, also fails with ERROR: fork/exec /bin/uname: no space left on device, even though it runs fine directly from the shell.
Although it seems a bit strange, let's take a look at the code:
detectedDeviceConnJson, err := d.Shell.Command(d.Logger, d.Config.GetString("commands.metrics_smartctl_bin"), args, "", os.Environ())
if err != nil {
	d.Logger.Errorf("Error scanning for devices: %v", err)
	return nil, err
}
Execution stops at this early return, so the following code never runs (if it did, it would report that the output is not valid JSON):
var detectedDeviceConns models.Scan
err = json.Unmarshal([]byte(detectedDeviceConnJson), &detectedDeviceConns)
if err != nil {
	d.Logger.Errorf("Error decoding detected devices: %v", err)
	return nil, err
}
My guess is that ESXi's security features don't allow fork/exec operations. (To test this guess with a minimal reproduction, we could write a Go program that simply executes a command and observe its behavior on ESXi.)
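A minimal reproduction along those lines could look like the sketch below. It only uses the standard library and runs uname -a, the same command tried above. The runCommand helper is a name chosen for this example. Building it as a static Linux binary (for instance with GOOS=linux GOARCH=amd64 CGO_ENABLED=0) is an assumption about what matches the collector release.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runCommand executes name with args and returns its combined
// stdout and stderr output, plus any error from starting or running it.
func runCommand(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

func main() {
	out, err := runCommand("uname", "-a")
	if err != nil {
		// On ESXi, the interesting question is whether this prints the same
		// "fork/exec ...: no space left on device" error as the collector.
		fmt.Fprintln(os.Stderr, "exec failed:", err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

If this tiny program fails in the same way, the problem is in how Go binaries start child processes on ESXi rather than in Scrutiny or smartctl.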
Environment: ESXi 8.x
Describe the bug
I got smartctl from smartmontools-linux-x86_64-static-7.4-r5414.tar.gz, which is the latest CI build as of now from https://builds.smartmontools.org, and put scrutiny-collector-metrics-linux-amd64 and collector.yaml in the same folder. I did some quick checks (below) and all looked good.
This is collector.yaml
However, when I run ./scrutiny-collector-metrics-linux-amd64 run --config ./collector.yaml, it crashed with the log below.
BTW, this is the ESXi kernel version.
[root@esxi2:/vmfs/volumes/5f79c6f5-a7338bdc-85f3-6cb3114d162c/TEMP/smartmontools] uname -a
VMkernel esxi2 7.0.3 #1 SMP Release build-20328353 Aug 22 2022 19:41:06 x86_64 x86_64 x86_64 ESXi
Expected behavior
The collector runs on ESXi 7.
Log Files