Only store defined attributes from `context` in ES.

simitt commented 6 years ago

Json validation allows agents to send up additional information without failing. Ensure that only the defined information is stored in ES, except for context.custom and context.tags where we specifically allow for additional values.

roncohen commented 6 years ago

We've previously decided to keep context open in order to give us the freedom to dynamically add namespaces under context in agents if needed, without requiring an upgrade to the Elastic Stack. I believe this is an edge case, but lets keep it open for 6.2 and then consider locking it down for 6.3.

simitt commented 6 years ago

@roncohen should we look into this now for 6.3?

simitt commented 6 years ago

@roncohen can we please revisit this for 6.3 as we realized that e.g. @jahtalab is sending additional data within context for the frontend agent. It might be legit to do so but we should discuss this, as for example right now we would store the userAgent information additionally there, although there is already another defined place in the context to store this information.

In case we decide to add more defined attributes in the future that should be indexed, this would be a breaking change afaict, as it could break existing indices if the new attribute was stored with a different type.

Also we might end up with a completely different schema for different agents.

simitt commented 6 years ago

update: as those fields won't get indexed, changes to the schema should not break anything. However this could still lead to a different data structure in ES than defined or expected.

watson commented 6 years ago

From an agent developers point of view, I'm ok with locking down the properties under context.

From time to time we need a new root property under context. But in those cases we probably need to involve the APM UI team anyway to make sure it's displayed as expected. So I'm fine with adding that overhead.

felixbarny commented 6 years ago

Especially with the unauthorized endpoint for the RUM agent, a malicious user could send arbitrary data and flood ES. This is a general problem we might want to address with rate limiting, but we might also want to limit which types of fields can be sent. Together with an upper limit for the values (for example 1024 chars), 50 stacktrace elements etc. we effectively have an upper limit for one span, which probably helps.

simitt commented 6 years ago

@roncohen can we revisit if this is a feature or a bug for V2, and then decide and close this issue.

roncohen commented 5 years ago

thanks @simitt. @jahtalab which additional information are you sending today?

hmdhk commented 5 years ago

@roncohen ,

Currently we're sending the following additional fields:

{
  page: { referer: "",  url: "" }
}

I'm ok with locking down the context and/or setting limits on the values.

simitt commented 5 years ago

I suggest all @elastic/apm-agent-devs are listing the currently added attributes, so we can formalize them on the JSON schema and then stop processing additional data sent within context. @alvarolobato, @roncohen can we please move forward with this.

roncohen commented 5 years ago

sounds good to me

mikker commented 5 years ago

No added attributes in Ruby agent.

felixbarny commented 5 years ago

No added attributes in the Java agent.

axw commented 5 years ago

None for Go.

alvarolobato commented 5 years ago

We are going to aim to do this for V2, but needs more investigation and probably will also be applied to V1. This isn't considered a blocker for V2 GA.

hmdhk commented 5 years ago

I need to add these additional fields to the span context: note: the http context is already defined with url as it's only field.

{
            "http": {
                "method": "GET",
                "sync": true,
                "status_code": 200
            }
}

I think we should start adding these additional fields to the v1 and v2 schemas for documentation purposes even though this is not enforced yet. Should I make PR?

Also a general question, will the the whole request fail if there are additional fields or will they just not get stored?

simitt commented 5 years ago

@jahtalab feel free to open a PR on this.

In case additional fields are sent the server will just ignore them, but process the request otherwise. This is already the case for all other fields, to keep the upgrade path easy. (Otherwise the server would need to be updated before the agents can update).

jalvz commented 5 years ago

We are going to aim to do this for V2

Trying to catch up - what is the current status, and how is this going to look like with ECS?

simitt commented 5 years ago

We haven't implemented that yet, but according to https://github.com/elastic/apm-server/issues/411#issuecomment-422024597 it would be fine to add it if we manage in time.

simitt commented 5 years ago

@watson @Qard @beniwohli are you sending any additional fields in context in v1 or v2 that we need to add to context when locking it?

simitt commented 5 years ago

@jahtalab is this suggestion https://github.com/elastic/apm-server/issues/411#issuecomment-423990546 aligned and ready for implementation?

hmdhk commented 5 years ago

@simitt , @eyalkoren wanted to add a similar data structure, so now we're discussing this here.

watson commented 5 years ago

@simitt The Node.js agent adds a few fields under context.custom. But nothing in the root of context

hmdhk commented 5 years ago

Referencing Add sync property to spans issue.

simitt commented 5 years ago

We decided in the server meeting to not move forward with this for now but postpone it, as this could be considered to be a breaking change.

felixbarny commented 5 years ago

Makes sense to move to the 7.0 milestone then?

jalvz commented 5 years ago

@roncohen @alvarolobato I would appreciate a decision on this to either close the issue or plan/prioritize 7.0 work accordingly

roncohen commented 5 years ago

thanks @jalvz

Reading through the issue, I'm just making sure: Is the intention to change the server so that it would reject unknown fields under context or just that we should stop storing them?

I'm fine to stop storing them, but I think rejection could be problematic. We're also not rejection unknown fields anywhere on the root, for example, so it would be strange to start doing it here.

simitt commented 5 years ago

In case additional fields are sent the server will just ignore them, but process the request otherwise. This is already the case for all other fields, to keep the upgrade path easy. (Otherwise the server would need to be updated before the agents can update).

Thanks for the update @roncohen. This issue is about stop storing unknown attributes, not rejecting them. We are trying to get this in for 7.0 then (cc @elastic/apm-agent-devs ).

simitt commented 5 years ago

@elastic/apm-agent-devs as discussed offline, the plan is to NOT store additional, not defined attributes sent up by the agents from 7.0 on. If there are no concerns or counter proposals raised we will move forward with this implementation.

simitt commented 5 years ago

We decided to only store objects defined in the json spec for the Intake API sent under the key context from 7.0 on. On ES level we don't store any information within context anymore, that is defined on Intake API level. Therefore the only necessary change left is to ensure context itself is not stored on ES level.

elastic / apm-server

Only store defined attributes from `context` in ES. #411