aws / aws-xray-sdk-python

AWS X-Ray SDK for the Python programming language
Apache License 2.0
329 stars 143 forks source link

Support http url characters as trace entity names per RFC#1738 #57

Open trondhindenes opened 6 years ago

trondhindenes commented 6 years ago

I'm getting the following, running aws-xray-sdk==1.0:

Removing Segment/Subsugment Name invalid characters from https://my.elasticsearch.cluster/iis_log_prod-*/_search. Elasticsearch does use a bit weird url naming, but it AFAIK they're fully conformant.

For info, i'm not manually naming xray-subsegments, but I'm assuming this happens under the cover when I use the requests package to invoke url requests.

haotianw465 commented 6 years ago

Hi, you can see the common invalid characters here https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/core/models/entity.py#L18. It looks like your cluster url has a * or if you replaced some sensitive information with a * when pasting the log entry.

Please let me know if non of the characters in your url is on that list and what character is being dropped based on the trace data you get back from the X-Ray service so we can take a further look.

trondhindenes commented 6 years ago

Nope, that is the actual URL. Elasticsearch allows wildcards in the index pattern part of the url.

trondhindenes commented 6 years ago

I couldn't find an official http spec from the top of my mind but allowed (and considered safe) url characters is something like:

"Alphanumerics [0-9a-zA-Z], special characters $-_.+!*'(),, and reserved characters used for their reserved purposes (e.g., question mark used to denote a query string)"

I would expect the x-ray tracer to support this.

haotianw465 commented 6 years ago

Hi,

We will put this as a feature request in our backlog and look into support all valid http url characters in segment names. At the meantime I'm assuming you are getting https://my.elasticsearch.cluster/iis_log_prod-/_search back for this subsegment. Does this works for you now or you need some workaround to rename it with something like a user defined fallback?

Another thing is that we will not consider supporting ? for now. Our SDKs don't capture any query string as it could contain sensitive information. We will strip ? and anything after it in a url. See https://github.com/aws/aws-xray-sdk-python/blob/master/aws_xray_sdk/ext/util.py#L118.

Please let me know if you have any additional concerns.

trondhindenes commented 6 years ago

Thanks! The bigger part of this problem for me right now is that the warnings are flooding our Lambda logs, making it very hard to find "real" log data in the functions with x-ray enabled. I guess there's some setting where I can toggle hiding warnings, but that also seems kinda wrong :thinking: .

I completely understand the choice to strip out the querystring part of the request, that makes total sense.