elastic / gosigar

Gathers system and per process statistics
Apache License 2.0
105 stars 76 forks source link

Handle ERROR_INVALID_PARAMETER error for GetVolumeInformationW win32 api #164

Closed narph closed 3 years ago

narph commented 3 years ago

similar to https://github.com/elastic/gosigar/pull/159 GetVolumeInformation will fail for external drives like CD-ROM or other type with error codes as ERROR_NOT_READY. ERROR_INVALID_FUNCTION, ERROR_INVALID_PARAMETER and could be others.

The PR will report error in the logs but ignore it and move further. We consider retrieving filesystem type best efforts. Since we introduced this calculation we are getting too many issues related to external drives, so we need to decide on one of the following behaviors:

@jsoriano have we dealt with anything similar before? @fearful-symmetry which option would you go for?

elasticmachine commented 3 years ago

:green_heart: Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

#### Build stats * Start Time: 2021-11-02T12:50:38.843+0000 * Duration: 8 min 12 sec * Commit: 7cf3d6c2129924dcd694e423400a4f46324b91f3 #### Test stats :test_tube: | Test | Results | | ------------ | :-----------------------------: | | Failed | 0 | | Passed | 261 | | Skipped | 7 | | Total | 268 |

jsoriano commented 3 years ago

I think the error in this case can be completely ignored (third option). I would prefer that than to report it here but ignore it in beats. Ignoring it in beats can lead to other problems, as actual errors in other platforms being ignored by mistake. So I would prefer to make the platform-specific decision here.

Not reporting the filesystem for drives without a filesystem is probably fine in these cases, I guess this can even happen with a hard disk without format or partition table?

There would be a couple of things to think about:

narph commented 3 years ago

@jsoriano , thanks for the quick feedback

Same here, third option gives us less headache, as we would have to single out this specific error from the rest and ignore it on the metricset side, ex:

if !strings.Contains(err.Error(), "GetVolumeInformationW") {
            return nil, err
        }

not a big fan of string comparison.

What value to report. No value or empty value sgtm, but maybe we want to put something like unknown or unavailable?

I like the idea and I would go for unavailable since this is the main issue.

Do we have the complete list of known errors? This way we can ignore the errors that may happen when the filesystem is not available, but we report other issues. Though this may add complexity and provide little value.

This is how I fixed the last scenario https://github.com/elastic/gosigar/blob/master/sys/windows/syscall_windows.go#L356, but now we have a new error type ERROR_INVALID_PARAMETER which is most likely caused by the external drives. These are system error codes and they are very broad: each one can occur in one of many hundreds of locations in the system, also the list is large.

jsoriano commented 3 years ago

What value to report. No value or empty value sgtm, but maybe we want to put something like unknown or unavailable?

I like the idea and I would go for unavailable since this is the main issue.

Btw, I was checking what is done in Linux for this case, and I found that the behaviour is different. On Linux sigar only collects mounted filesystems, so all of them have a filesystem type. While in Windows it collects all drives, mounted or not.