Closed tloubrieu-jpl closed 1 year ago
Airflow has a feature called XCom for sharing data between tasks. @ramesh-maddegoda is investigating that.
Evaluated the following options to share data between Airflow tasks:
Pros:
Cons:
Pros:
Cons:
Pros: Higher performance compared to S3. Other features of databases such as transaction management, queries and security can be easily utilized.
Cons:
Pros:
Cons:
Considering the above pros and cons of each option, we can make the following recommendations:
Additional note on Cumulus:
Had a chat with a Cumulus user and learned that they use S3 buckets to share data between tasks in a workflow. At the end of each task, the data is uploaded to an S3 bucket, and the next task then downloads the data from the same S3 bucket.
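To make the hand-off pattern concrete, here is a minimal sketch of it. This is not Cumulus code or our implementation: `LocalBucket`, `extract_task`, `sum_task`, `run_workflow`, the key layout and the sample records are all hypothetical, and a local directory stands in for the S3 bucket so the example runs without AWS credentials. In a real setup the `upload`/`download` calls would go to S3 instead.

```python
import json
import tempfile
from pathlib import Path


class LocalBucket:
    """Hypothetical stand-in for an S3 bucket, backed by a local directory."""

    def __init__(self, root):
        self.root = Path(root)

    def upload(self, key, data):
        # Write the bytes under the given key, creating "folders" as needed.
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def download(self, key):
        # Read back the bytes stored under the given key.
        return (self.root / key).read_bytes()


def extract_task(bucket, run_id):
    """First task: produce data and upload it at the end of the task."""
    records = [{"id": 1, "value": 10}, {"id": 2, "value": 32}]  # sample data
    key = f"{run_id}/extract/output.json"  # assumed key layout
    bucket.upload(key, json.dumps(records).encode("utf-8"))
    # Only the small key is handed to the next task, not the data itself.
    return key


def sum_task(bucket, input_key):
    """Next task: download the previous task's output and process it."""
    records = json.loads(bucket.download(input_key).decode("utf-8"))
    return sum(record["value"] for record in records)


def run_workflow(root):
    """Run the two tasks in order against the same bucket."""
    bucket = LocalBucket(root)
    key = extract_task(bucket, run_id="run-001")
    return sum_task(bucket, key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_workflow(tmp))
```

The point of the sketch is that only the object key travels between tasks, while the payload stays in shared storage. A key that small could plausibly be passed through XCom, though that pairing is an assumption here rather than something the Cumulus user described.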
Agreed during the breakout to move forward with the S3fs solution.
💡 Description
Options are EFS, S3, other databases
That could be managed as:
A good place to start is to get inputs from PODAAC on how they do that in Cumulus.