danielgmyers / flux-swf-client

Apache License 2.0
10 stars 3 forks source link

Migration guide for 2.0.x -> 2.1. #102

Closed danielgmyers closed 1 year ago

danielgmyers commented 2 years ago

https://github.com/danielgmyers/flux-swf-client/issues/101

joeymarylander commented 2 years ago

I don't love that the Partition Id's are no longer what was in the Generator Result. Looking at workflows in SWF's UI suddenly becomes unusable given that things are just SHA256s now. I.e. if I have a Partitioned Workflow Step called Step1 with Partitions ["foo", "bar"], instead of my list of Activities being super readable as Step1_0_foo and Step1_0_bar, I get Step1_0_2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae and Step1_0_fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9.

This is a terrible experience for someone who actively is browsing the UI for SWF.

Plus the same concept applies for manually signaling workflow steps. This is much more manageable since this is a rare thing to do, but still very annoying and non deterministic.

danielgmyers commented 2 years ago

For reference, this is the SWF Console's activity list (these are from Flux's integration tests):

swf-activities-list

And if you expand an activity, the first thing you see there (highlighted in yellow) is the control field, which contains the (unhashed, user-specified) partition id:

swf-activity-detail

danielgmyers commented 2 years ago

@joeymarylander Is that visible enough for your use cases? If not I can revisit the partition ID thing, maybe make it configurable? I know some users would find the looser constraints on partition IDs more convenient, but I can understand wanting the activity id to be recognizable.

joeymarylander commented 2 years ago

I think that that isn't good enough. Being able to diagnose specific problems at a glance during high severity problems is pretty important to me. Additionally, teaching new hires where to look and why there are random hashes in places isn't intuitive. I definitely see the advantages with having this sort of thing for certain usecases, but for me, they aren't all that useful.

Having this configurable would be a great balance for everyone, so my vote is to do that.

jordanrmerrick commented 2 years ago

I agree that this isn't visible enough, for the reasons @joeymarylander lists above. Additionally, this effectively stops users from being able to for activities based on ID. You can't filter based on control in the SWF console.

+1 for making this configurable.

danielgmyers commented 2 years ago

I filed https://github.com/danielgmyers/flux-swf-client/issues/103 for making this functionality configurable, to be included in 2.1.0.

danielgmyers commented 2 years ago

I filed https://github.com/danielgmyers/flux-swf-client/issues/104 for supporting a friendlier signal targeting mechanism.

danielgmyers commented 2 years ago

@joeymarylander @jordanrmerrick Thanks for your feedback! I've updated the pull request to reflect the following changes:

I've updated the migration guide as if those changes had already been made. That work is tracked in #103 but has been added to the 2.1 milestone; 2.1.0 won't be released without the change.

joeymarylander commented 2 years ago

Another alternative to hashing is to normalize the partition id, and convert all of those "bad" characters to an _. This doesn't solve the length limit on the Id, but it does help with allowing additional characters while maintaining readability.

danielgmyers commented 2 years ago

The main problem with targeted mutations is that any replacement character I choose risks collisions with other partition IDs.

As a simple example, let's say I have a partitioned step that does something with a list of file paths, and these two file paths are included in the list of partition IDs:

If "/" is an invalid character in partition IDs, and I replace them with "_", then I end up with two partitions that transform to the same value _path_to_my_file.txt for activity ID use, but that would mean only one of them will be executed.

More generally, remapping input values is only safe if we can do the transformation without losing information. To do that, we would need a set of replacement characters (68 of them, in our case) which are both a) valid for partition IDs, and b) guaranteed to not be present in the original input. Unfortunately we can't make any assumptions about the semantics of user-provided partition IDs.

While it's true that hashes can collide (by design), I think it's far more likely that mutating user-provided partition IDs (which are a limited set of meaningful values) will result in a collision than for the set of SHA-256 hashes of those same values to result in a collision.

danielgmyers commented 1 year ago

I went ahead and make the new partition id hashing behavior toggleable and disabled by default in #107. I'll consider this migration guide complete for the time being.