argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
14.93k stars 3.18k forks source link

Examine impact of string interning on memory usage #13684

Open isubasinghe opened 6 days ago

isubasinghe commented 6 days ago

Summary

Go 1.23 introduced string interning (https://pkg.go.dev/unique)

We use strings as enums all throughout the argo codebase, which in itself is quite alright, however they might take up a reasonable amount of memory given the amount of code/copying we have, in addition to this string copy (and the subsequent garbage collection needed) is not cheap (however in the context of argo workflows, the compute time for other work would easily eclipse the compute needed for string copying and garbage collection).

Therefore this issue is low priority, but this is a relatively cheap low hanging fruit that might reduce memory usage by 0.5-1%.

It might be a good first issue to explore around the codebase.

agilgur5 commented 6 days ago

Go 1.23 introduced string interning (https://pkg.go.dev/unique)

Did you mean to link to https://go.dev/blog/unique? The GoDoc is quite sparse and hard to grok without a usage example, which the blog post gives.

We use strings as enums all throughout the argo codebase

I'm also a little surprised that the compiler doesn't automatically handle some more static cases like enums. But I guess not that surprised these days after #12228 and similar needs etc 🙃

It might be a good first issue to explore around the codebase.

From a quick glance, I think this would be too complex (effectively implementing a cache, plus requires a bit of grokking theory for interning/symbols and generics) and open ended for a "good first issue". We've previously discussed (Feb 20 Contributor Meeting) that a "good first issue" must first have a well defined solution with an ironed out approach (solution/suggested) from a contributor and additionally be straightforward to implement for a beginner. This is currently neither IMO.

isubasinghe commented 6 days ago

Did you mean to link to https://go.dev/blog/unique? The GoDoc is quite sparse and hard to grok without a usage example, which the blog post gives.

Tbh no I think its sparse but clear enough given there is only one function, calling Make will return you the interned handle (creating a handle if not present in the internal memory pool). I found it simpler to understand than the blog.

I'm also a little surprised that the compiler doesn't automatically handle some more static cases like enums. But I guess not that surprised these days after https://github.com/argoproj/argo-workflows/issues/12228 and similar needs etc 🙃

The Go compiler generally focuses on speed of compilation more I think, which makes sense given it was written for Google. Go does intern compile time strings though.

From a quick glance, I think this would be too complex (effectively implementing a cache, plus requires a bit of grokking theory for interning/symbols and generics) and open ended for a "good first issue".

I disagree on this one, you don't need to implement a cache at all, you just need to call unique.Make everywhere you create a new enum value, the cache has already been implemented for you by the golang stdlib. No need to worry about generics as well since the generic parameter will just be string.

isubasinghe commented 6 days ago

That being said, this issue is vague on what it tells (potential) authors to do, I would be happy to provide more explicit directions.

agilgur5 commented 6 days ago

The Go compiler generally focuses on speed of compilation more I think

It's relatively slow in my usage in Argo at least too 😕 A good chunk of that seems to be due to how it handles all the deps though. Have pretty much done https://xkcd.com/303/ before

the cache has already been implemented for you by the golang stdlib

Ah I see, that's the implementation the blog post covered which the library does for you. I thought it'd be more complex, particularly for repetitive non-enums. Thanks for clarifying that!

That being said, this issue is vague on what it tells (potential) authors to do, I would be happy to provide more explicit directions.

Yes please. General good practice too. Plus if it's vague enough for me to misunderstand, then absolutely more explicit directions would be needed for beginners. If there's room for misinterpretation, there's a high likelihood of that happening in a "good first issue"

No need to worry about generics as well since the generic parameter will just be string.

Like this is implied in context, but may not be understood from just a quick reading of the GoDoc.

You also didn't give directions for comparing memory usage which is relevant here as well.