agardnerIT / tracepusher

Generate and push OpenTelemetry Trace data to an OTEL collector in JSON format
Apache License 2.0
54 stars 11 forks source link

Feature request: Flexibility in time calculation/Unit/Define timestamp of start/end #51

Closed Dant3s closed 1 year ago

Dant3s commented 1 year ago

Use case:

Log for a system based in ancient language, imposible to instrument in any kind of tracing system. Each entry of the log represent the start of a transtacion/span and later the end.

Today the duration is based in seconds, but for this kind of situation of quick based tx (ms as max value) or "true to the source" requirment, the timestamp of the original span/transaction in format "%Y-%m-%d %H:%M:%S.%f" could be used for start/end without the need of calculation.

Checking the span configuration the units defined are in unix_nano. Could be usefull to just pass the timestamp and do the convertion only to unix_time, without a need to pass a duration and leaving the start/end as an input in case it exists.

Possible solutions:

agardnerIT commented 1 year ago

Hi, thanks of the feature request. So you need the ability to be more granular than just "seconds" for the duration? Are the "transactions" in your system "one-off" events that could be described as a single log line entry or are they "multi-step" where a "parent" starts at time 0 then 1 or more child things happen within (or because of) the parent and then the whole things ends?

Have you looked at logpusher?

Do you really need sub-second accuracy or would "to the nearest second" be enough?

Dant3s commented 1 year ago

Hey! Each line could be a transaction or span. But is defined as If the line is a new transaction then all the other lines are spans until the close of the current transaction, and each span could have it's own start and maybe an end. The logic for this was already solved it with a python script to "understand" the start of the transaction and spans. So there is not need to do anything there with tracepusher. Only invoke it when i want to send the data (at the end of the span/transaction.

Problem for the second accurancy is mostly since all / or most spans are fast (ms differations) most of them end with the same second roundup and also, that the transaction start time is actually created not in the "real" start time, but with the calculated time based in the current time of the tracepusher execution plus the delta of the seconds duration, that could be round up to 0 after my testings.

The idea of this is actually process transactions in wich the only information resides in logs lines and conver it to traces/spans for a more "in the line of view" of the information that represents and move it from the concepto of a simple log. That is why the logpusher wasn't considerer as an option.

Now honestly, since this was my request I decided to peak a little into tracepusher and during the night i hacked a little bit to allow such use cases:

The code is a little hacky, since it's been a while since I did python, need to clean a little. But if is up to your standar i could request a merge and add those capabilites for anyone else that might be interested in them.. (not right away).

Also a quick question... how you see something like tracepusher in a production enviroment?

agardnerIT commented 1 year ago

Thanks for the background. It seems like the simplest option is to add a new parameter called (something like) duration_type which will default to seconds if not set. This would maintain backwards compatibility. But if you could specify:

./tracepusher -dur 3500 -duration-type ms

They tracepusher would know you're sending milliseconds.

Would this solve things?


wrt to tracepusher in a production environment, I don't see any issue.


https://github.com/agardnerIT/tracepusher/blob/cfec80193bd0d4a2770989a32f941c53670af406/tracepusher.py#L271

Adding a new optional input parameter for span type should also be no problem. Again, if not specified this should default to the current behaviour of SPAN_KIND_INTERNAL.

https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L121-L147

It should be expected that valid values are:


If these two changes:

Would solve your use case, LMK and I'll raise them as enhancements.

Dant3s commented 1 year ago

Yes, that will do the trick!

agardnerIT commented 1 year ago

Can you try running tracepusher with the binaries built here: https://github.com/agardnerIT/tracepusher/tree/add-duration-type/dist

The new parameters are:

For example, to push a CONSUMER span with a duration of 1234 milliseconds:

./tracepusher -ep http://localhost:4318 -sen serviceA -spn span1 -dur 1234 -dt ms

LMK how you go. If these binaries work for you, I'll include in main.

agardnerIT commented 1 year ago

@Dant3s did you have a chance to try this out?

Dant3s commented 1 year ago

Hey @agardnerIT Yes, sorry for the late reply, I tested without my changes and this does align with the idea! The output is in the MS is so sweet! Binary is also a nice solution!

agardnerIT commented 1 year ago

Perfect. I'm closing this as this is available in main now and will be "official" as of 0.8.0.

Docs: