elastic / elastic-package

elastic-package - Command line tool for developing Elastic Integrations
Other
50 stars 116 forks source link

Validate fields skipped in type assertion #147

Open mtojek opened 4 years ago

mtojek commented 4 years ago

Follow-up on: https://github.com/elastic/elastic-package/pull/143#discussion_r511698446 https://github.com/elastic/elastic-package/pull/143#issuecomment-712994535

Currently we skip validation for generic fields present in every (most?) document collected by Elastic Agents but undefined in fields.yml files (not part of integrations). Sample fields: agent.*, elastic_agent.*.

The goal is to introduce validation for the special fields as well.

mtojek commented 4 years ago

See: https://beats-ci.elastic.co/blue/organizations/jenkins/Beats%2Felastic-package/detail/PR-143/35/tests/

one or more errors found in documents stored in metrics-apache.status-ep data stream: [0] field "cloud.availability_zone" is not defined
[1] field "cloud.instance.id" is not defined
[2] field "cloud.instance.name" is not defined
[3] field "cloud.machine.type" is not defined
[4] field "cloud.project.id" is not defined
[5] field "cloud.provider" is not defined
[6] field "event.duration" is not defined
[7] field "event.module" is not defined
[8] field "host.architecture" is not defined
[9] field "host.containerized" is not defined
[10] field "host.hostname" is not defined
[11] field "host.ip" is not defined
[12] field "host.mac" is not defined
[13] field "host.name" is not defined
[14] field "host.os.codename" is not defined
[15] field "host.os.family" is not defined
[16] field "host.os.kernel" is not defined
[17] field "host.os.name" is not defined
[18] field "host.os.platform" is not defined
[19] field "host.os.version" is not defined
[20] field "service.address" is not defined
one or more errors found in documents stored in logs-apache.access-ep data stream: [0] field "cloud.instance.id" is not defined
[1] field "cloud.instance.name" is not defined
[2] field "cloud.provider" is not defined
[3] field "event.dataset" is not defined
[4] field "event.outcome" is not defined
[5] field "host.architecture" is not defined
[6] field "host.containerized" is not defined
[7] field "host.hostname" is not defined
[8] field "host.ip" is not defined
[9] field "host.mac" is not defined
[10] field "host.name" is not defined
[11] field "host.os.codename" is not defined
[12] field "host.os.family" is not defined
[13] field "host.os.kernel" is not defined
[14] field "host.os.name" is not defined
[15] field "host.os.platform" is not defined
[16] field "host.os.version" is not defined
[17] field "input.type" is not defined
one or more errors found in documents stored in logs-apache.error-ep data stream: [0] field "cloud.availability_zone" is not defined
[1] field "cloud.instance.id" is not defined
[2] field "cloud.instance.name" is not defined
[3] field "cloud.machine.type" is not defined
[4] field "cloud.project.id" is not defined
[5] field "cloud.provider" is not defined
[6] field "event.category" is not defined
[7] field "event.kind" is not defined
[8] field "event.timezone" is not defined
[9] field "event.type" is not defined
[10] field "host.architecture" is not defined
[11] field "host.containerized" is not defined
[12] field "host.hostname" is not defined
[13] field "host.ip" is not defined
[14] field "host.mac" is not defined
[15] field "host.name" is not defined
[16] field "host.os.codename" is not defined
[17] field "host.os.family" is not defined
[18] field "host.os.kernel" is not defined
[19] field "host.os.name" is not defined
[20] field "host.os.platform" is not defined
[21] field "host.os.version" is not defined
[22] field "input.type" is not defined
mtojek commented 3 years ago

Looking at this issue, it seems to be a good candidate to implement while we will introduce support for processors' fields in https://github.com/elastic/package-spec/issues/63 .

mtojek commented 3 years ago

Related: https://github.com/elastic/package-spec/issues/199

mtojek commented 2 years ago

This issue is relatively old now, so I assume it isn't really important. I will close it for now.

andrewkroh commented 2 years ago

I think we need to revisit the check for event.*. There have been many reported issues due to a lack of mappings for various event fields. I did not realize this wasn't being validated.

https://github.com/elastic/elastic-package/blob/a4a0b19d768eee14582acfb58a816823110a8980/internal/fields/validate.go#L266

mtojek commented 2 years ago

I think we need to revisit the check for event.*. There have been many reported issues due to a lack of mappings for various event fields. I did not realize this wasn't being validated.

I remember that we disabled them because it means that we have to add more exactly the same fields to every data stream. We tried to figure out an option to introduce common fields and the topic was deprioritized. I agree that's a high time to refresh the discussion :)

cc @jsoriano @jlind23

jsoriano commented 2 years ago

Reopening as the skip is still in the code, and there seems to be related issues that could be detected by the skipped validation.

andrewkroh commented 2 years ago

I wanted to mention that there are two fields under event that we probably do want to skip validating because Fleet automatically adds a mapping for them. Those are "event.agent_id_status" and "event.ingested".

Perhaps something like this:

diff --git a/internal/fields/validate.go b/internal/fields/validate.go
index afed354..f149e57 100644
--- a/internal/fields/validate.go
+++ b/internal/fields/validate.go
@@ -263,16 +263,29 @@ func skipValidationForField(key string) bool {
    return isFieldFamilyMatching("agent", key) ||
        isFieldFamilyMatching("elastic_agent", key) ||
        isFieldFamilyMatching("cloud", key) || // too many common fields
-       isFieldFamilyMatching("event", key) || // too many common fields
        isFieldFamilyMatching("host", key) || // too many common fields
        isFieldFamilyMatching("metricset", key) || // field is deprecated
-       isFieldFamilyMatching("event.module", key) // field is deprecated
+       isFleetManagedMapping(key)
 }

 func isFieldFamilyMatching(family, key string) bool {
    return key == family || strings.HasPrefix(key, family+".")
 }

+// isFleetManagedMapping return true if the field is contained in a Fleet
+// managed component template that is added to all data streams.
+//
+// The template is controlled in elastic/kibana at:
+// https://github.com/elastic/kibana/blob/v8.1.0/x-pack/plugins/fleet/server/constants/fleet_es_assets.ts#L24-L39
+func isFleetManagedMapping(key string) bool {
+   switch key {
+   case "event.agent_id_status",
+       "event.ingested":
+       return true
+   }
+   return false
+}
+
 func isFieldTypeFlattened(key string, fieldDefinitions []FieldDefinition) bool {
    definition := FindElementDefinition(key, fieldDefinitions)
    return definition != nil && definition.Type == "flattened"
jsoriano commented 2 years ago

Gave a try to this in the context of elastic/package-spec#399, but I think that we need some kind of centralized schema for data providers (https://github.com/elastic/package-spec/issues/199). So Fleet can install the mappings for these providers, and elastic-package can also query this data for validations of these fields.