ben-manes / caffeine

A high performance caching library for Java
Apache License 2.0
15.64k stars 1.58k forks

Outdated documentation on how to get traces automatically #1740

Closed adriacabeza closed 1 month ago

adriacabeza commented 1 month ago

Hi 👋🏾!

From reading the documentation, it is clear that there was previously a method for automatically capturing traces to facilitate running the simulator on traced data, as outlined here: https://github.com/ben-manes/caffeine/wiki/Tracing. Is it still the case? According to the discussion in this issue: https://github.com/ben-manes/caffeine/issues/105#issuecomment-238101274, this feature was deprecated and subsequently removed due to lack of usage. Should the entry be removed from the wiki?

I am interested in investigating different policies, so I would like to capture my own traces in prod. From my understanding, merely storing the access events (i.e. the hash of each key) would suffice, right? From what I can see, a trace reader simply needs to provide a Stream<AccessEvent>.

Could you provide insight into the rationale for removing this feature? Additionally, is there any specific reason why the documentation does not cover this aspect in detail? Imo having a simple default way to get traces (even if it's just docs) would be pretty helpful. Happy to help with it!

Thanks.

ben-manes commented 1 month ago

Hmm, well the wiki page wasn't linked to in the sidebar, so it didn't seem necessary to actually delete it. Any reason you came across it? It's in the raw list of pages but I didn't think anyone would go through that, so there are a few outdated pages not removed. I suppose we could delete them if actually noticed.

The feature was removed as traces were rarely captured by users and none used our utility to do so. A challenge with some methods like Map.compute is that they are both a read and a write, so classifying them in stats might be wrong depending on the usage. Similarly, writing a trace becomes hard when you don't know how to interpret the intent at the library level.
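To illustrate the ambiguity, here is a minimal sketch using a plain ConcurrentHashMap rather than the cache itself. The class and method names are hypothetical. Both helpers go through the same compute call, yet one behaves like a write and the other like a read that loads on a miss, and the map has no way to tell which the caller meant.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical illustration: two callers with different intent use the same method. */
public final class ComputeAmbiguity {

  /** Write-style usage: every call stores a new value for the key. */
  public static int increment(ConcurrentMap<String, Integer> map, String key) {
    return map.compute(key, (k, old) -> (old == null) ? 1 : old + 1);
  }

  /** Read-style usage: an existing value is kept, and one is loaded only if absent. */
  public static int getOrLoad(ConcurrentMap<String, Integer> map, String key) {
    return map.compute(key, (k, old) -> (old == null) ? k.length() : old);
  }

  public static void main(String[] args) {
    ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
    increment(map, "counter");
    increment(map, "counter");
    getOrLoad(map, "abc");
    getOrLoad(map, "abc");
    System.out.println(map);
  }
}
```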

It was easier to instead recommend appending to a logger that emits to its own file so as to not disturb the application log. That's already integrated into the end user's infra, can be asynchronous, etc. There wasn't a good reason to reinvent that and deal with the complexity of our API generalizations.
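As a rough sketch of that approach, assuming java.util.logging for the sake of a self-contained example: a dedicated logger writes one key hash per line to its own file and does not forward to the application's handlers. TraceLogger and recordAccess are hypothetical names, not part of Caffeine. FileHandler writes synchronously, so in production an asynchronous appender from whatever logging framework is already in place would likely be a better fit.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: appends one key hash per line to a dedicated trace file. */
public final class TraceLogger implements AutoCloseable {
  private final Logger logger;
  private final FileHandler handler;

  public TraceLogger(String traceFile) throws IOException {
    // Append to the trace file instead of truncating it on startup.
    handler = new FileHandler(traceFile, true);
    // Emit only the raw message, without timestamps or log levels.
    handler.setFormatter(new Formatter() {
      @Override public String format(LogRecord record) {
        return record.getMessage() + System.lineSeparator();
      }
    });
    logger = Logger.getAnonymousLogger();
    // Keep trace lines out of the application log.
    logger.setUseParentHandlers(false);
    logger.setLevel(Level.ALL);
    logger.addHandler(handler);
  }

  /** Call this next to each cache lookup; only the key's hash is recorded. */
  public void recordAccess(Object key) {
    logger.info(Integer.toString(key.hashCode()));
  }

  @Override public void close() {
    logger.removeHandler(handler);
    handler.close();
  }
}
```

Note that hashCode() is only stable across JVM runs for types that define it deterministically, such as String or Long. For other key types a stable identifier would need to be logged instead.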

The simulator can easily be extended to read new file formats, so whatever is easiest to capture is fine. The key hash is enough, and the parser can convert that to the event stream. If you have any difficulties or want to share your traces, then let me know.
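For a sense of how little the parsing side involves, here is a minimal sketch that turns a file with one key hash per line into a stream of events. The nested AccessEvent class is a hypothetical stand-in so the example compiles on its own. An actual parser would plug into the simulator's own event and reader types, whose exact API is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical parser for a trace file containing one key hash per line. */
public final class KeyHashTraceParser {

  /** Stand-in for the simulator's event type; it carries only the key. */
  public static final class AccessEvent {
    private final long key;

    private AccessEvent(long key) {
      this.key = key;
    }

    public static AccessEvent forKey(long key) {
      return new AccessEvent(key);
    }

    public long key() {
      return key;
    }
  }

  /** Lazily reads the file; the caller should close the returned stream. */
  public static Stream<AccessEvent> events(Path traceFile) {
    try {
      return Files.lines(traceFile)
          .map(String::trim)
          .filter(line -> !line.isEmpty())   // tolerate blank lines
          .map(line -> AccessEvent.forKey(Long.parseLong(line)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```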

adriacabeza commented 1 month ago

I discovered the page via a Google search, so I had no indication that the information was outdated. It would be helpful to remove or update pages that are no longer relevant. If a complete review of all content is too cumbersome, perhaps starting with the removal of that specific page could be a worthwhile first step 😄.

> A challenge with some methods like Map.compute is it's both a read and a write, so classifying it in stats might be wrong based on the usage.

That makes a lot of sense. I will add my own logger and get back if I have any difficulties. Thanks!