boto3 has been there for a while, and boto2 is deprecated and not updated anymore. It doesn't receive new features, and generally speaking credentials+auth management is better in boto3, notably with IMDSv2 support that @dauving will love.
How?
⚠️ builds on #421 which moves the "swf" module into "simpleflow.swf.mapper" ; I will rebase on #422, it's not done yet
SWF:
I moved endpoints one by one, adding a proxy method on ConnectedSWFObject
adding a proxy method is more or less mandatory if we want to maintain backwards compatibility on querysets in particular ; the boto3 methods are picky about receiving "None" values, so it also gives us a place where we can handle that in isolation, outside of the business logic
I tried to make as few changes as possible, so I didn't rework the logic or error handling at all EXCEPT for the format changes required by boto3
on the SWF side it's honestly easy as the methods map one by one with probably 0 change over the behaviour, except for return formats that translate dates in boto3
S3:
things are not as easy there, as boto3 introduces a resource based approach and removes quite a few handy boto2 helpers
I tried my best but I think it needs a pair of eyes who knows how boto3+S3 works 😅
I did NOT try to optimise for performance, as I don't think it's a big concern (especially "simpleflow.storage.list_keys()" could probably be better)
Tests
tests are passing on each commit, and I added a couple ones passing by
local tests with "simpleflow standalone", which behaves correctly
sporadic local tests in ipython for querytests and some failure modes, so I could extract realistic error messages
Known issues and attention points
(minor) timestamp integers becoming datetimes => I think this one will not backfire too much, but we should keep it in mind
I'm honestly blind on the S3 part and I don't use Metrology or Steps at Alan right now => would be a great idea to test in the botify context where you have all this and can tell if it works or not
We should carefully test region issues => I don't doubt boto3 will find credentials in standard locations (better than boto2 actually) but the AWS_DEFAULT_REGION, SIMPLEFLOW_S3_HOST and SIMPLEFLOW_S3_SSE should be tested carefully (maybe it's a good idea to actually remove SIMPLEFLOW_S3_HOST btw)
(😱) I removed a "disable_boto_connection_pooling" method that was supposed to guard against boto internal pooling not surviving forks ; this thing disappeared completely from boto3, and users are supposed to ensure themselves that a different "session" is started when forking
Review
This PR could/should be reviewed commit by commit
Ideally I'd like to have 2 pair of eyes on it before merging, even if the review is not exhaustive (just tell me what you focused on)
No need to add the same remark over and over, once is enough
Misc: I wanted to make stacked PRs as this is what I tend to do at $job today (with graphite) but it's only possible for maintainers and doesn't work from a fork.
Rollout
Not sure what we should do:
I like the idea of an alpha release and fixing things until we can generate a stable release
BUT we can also split things and release more progressively: we could issue 3 releases: administrative SWF endpoints, poll/respond SWF endpoints, and S3
Why?
See https://github.com/botify-labs/simpleflow/issues/419
boto3 has been there for a while, and boto2 is deprecated and not updated anymore. It doesn't receive new features, and generally speaking credentials+auth management is better in boto3, notably with IMDSv2 support that @dauving will love.
How?
Tests
Known issues and attention points
Review
Misc: I wanted to make stacked PRs as this is what I tend to do at $job today (with graphite) but it's only possible for maintainers and doesn't work from a fork.
Rollout
Not sure what we should do: