Closed trevorgerhardt closed 5 years ago
@ansoncfit definitely went light on the comments expecting things to change drastically.
Ideally we'd share an instance of this type of class across all libraries so that files are stored and retrieved in the same way, using the same S3 client. So eliminate and replace what is currently in R5.
Are we monitoring buckets well enough right now for that part to matter? A single bucket setup would have top level directories within it separating data like opportunities and networks, so I'm not sure if that matters as much. It would be easier to abstract the persistence if we used a single bucket.
Replying to @trevorgerhardt's response to @ansoncfit's review:
Agreed that we should put this storage mechanism somewhere that would allow a single instance to be used. It hasn't been around long enough to merit maintaining another separate project. R5 needs to use it when it's running as a worker, separately from analysis-backend. So I don't think this can be in analysis-backend, it needs to be in R5.
I talked to Anson about this the other day. We thought it could be useful to have a single configurable bucket as long as all resources are split into several sub-buckets with predefined names. You mention that "a single bucket setup would have top level directories" so it sounds like you intended the same thing. Seems like we all agree on this but that wasn't yet stated. The main problem is migration - it's bound to cause at least some short term mess to change the bucket hierarchy.
One more observation: When switching to a single top-level bucket, we could use a CATEGORY enum to force everything to be in standard locations (rather than open-ended sub-bucket strings).
I'm thinking we can't resolve all the issues here in a single pull request. We should probably copy any relevant observations in the comments and PR reviews over to #85 to track long term progress on this issue.
Was just noticing that R5 also uses S3 storage (fetching scenarios and networks) and would ideally share components to handle this. We may want to (eventually) move these components into R5 since analysis-backend imports R5 as a dependency.
Initial implementation of consolidating our S3/file storage usage. Please give feedback on naming and implementation. I wanted to do a first pass that changed the least amount of code possible that could be updated later. Ideally we'd have an implementation we could initialize and pass between our of our libraries so that it's used the same way.
A few notes: