EOEPCA / deployment-guide

EOEPCA Deployment Guide
https://deployment-guide.docs.eoepca.org/
Apache License 2.0

Investigate possible use of OpenEBS #23

Open rconway opened 6 days ago

rconway commented 6 days ago

OpenEBS - https://openebs.io/

Investigate whether use of OpenEBS can be suggested in the Deployment Guide as a possible approach to establishing ReadWriteMany storage on any cluster.

This could support the EOEPCA Prerequisites as requested by #21.
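For reference, a minimal sketch of what an OpenEBS-based RWX claim could look like, assuming the OpenEBS dynamic NFS provisioner (https://github.com/openebs/dynamic-nfs-provisioner) is installed to expose RWO backend volumes over NFS; the storage class and claim names below are illustrative, not something the guide defines yet:

```yaml
# Illustrative only: assumes an OpenEBS dynamic NFS provisioner StorageClass
# named "openebs-rwx" exists on the cluster (hypothetical name).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: eoepca-shared-data
spec:
  storageClassName: openebs-rwx   # hypothetical class for this sketch
  accessModes:
    - ReadWriteMany               # the RWX mode the guide needs to establish
  resources:
    requests:
      storage: 10Gi
```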

jdries commented 5 days ago

We would appreciate this information very much. Some explanation of operational cost and maintenance would also be helpful, so we can assess how realistic it is to support RWX volumes via such a component. I also understand that this is used for performance, to avoid moving data across the network, but I am wondering how this works on a big cluster: if I write data on one node, does it somehow stay there until another node tries to read it, or how does this high-performance kind of replication work?

spinto commented 5 days ago

Hi @jdries, that's a good point about having more info on production (ops & maintenance), and I think it applies not only to RWX volumes but to all the EOEPCA prerequisites (e.g., even the K8s cluster itself). I have put a note in #21 to add more info on what is recommended for demo/test and what for production.

About the other points, I cannot speak for OpenEBS, as I have never used it, but using IBM Spectrum Scale, Lustre or GlusterFS to support ReadWriteMany volumes in K8s is very realistic. This is used in production in many HPC centres (e.g., CERN), and I have myself used both GlusterFS and IBM Spectrum Scale at EUMETSAT for a big 2000-CPU-core K8s cluster for multi-mission bulk data processing.
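As one concrete illustration of the GlusterFS route, a sketch of a dynamically provisioned RWX StorageClass using the in-tree kubernetes.io/glusterfs provisioner, assuming a Heketi-managed GlusterFS pool (the endpoint URL is illustrative):

```yaml
# Illustrative only: assumes GlusterFS managed via Heketi; the resturl below
# is a placeholder for a real Heketi REST endpoint.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-rwx
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"   # hypothetical endpoint
```

A PVC requesting ReadWriteMany against this class would then be satisfied by a GlusterFS volume mountable from every node.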

About how it works: for most of these systems the backend is a distributed file system, which stores the data in multiple replicas on storage nodes "as close as possible" to the computing nodes and uses multi-tier "intelligent" caching (based on read/write statistics on filesystem files and folders) to improve overall performance and stability. Over-simplifying (and here I hope no HTC/HPC expert reads this, otherwise they will kill me :) ): if you write something in a folder from node A, some bits may land on node A and in parallel (in the background) also go to nodes B and C; when you then read from node D, you read in parallel from A, B and C, while also caching on D. At some point, if you always write on A and read from D, D will become the reference, your future writes from A will go directly to D, and the cached bits on B and C will be deleted.
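To put the node-A/node-D scenario in Kubernetes terms, a sketch of a writer and a reader pod (typically scheduled on different nodes) sharing one RWX claim; the claim name reuses the hypothetical PVC from the earlier sketch, and the replica placement and caching described above happen entirely inside the storage backend, invisibly to the pods:

```yaml
# Illustrative writer/reader pair sharing the hypothetical RWX claim.
# Which physical nodes hold the replicas is decided by the distributed
# file system, not by Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: writer
spec:
  containers:
    - name: writer
      image: busybox
      # Append a timestamp to a shared file every 5 seconds.
      command: ["sh", "-c", "while true; do date >> /data/log.txt; sleep 5; done"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: eoepca-shared-data   # hypothetical RWX claim
---
apiVersion: v1
kind: Pod
metadata:
  name: reader
spec:
  containers:
    - name: reader
      image: busybox
      # Follow the file written by the other pod, possibly on another node.
      command: ["sh", "-c", "touch /data/log.txt; tail -f /data/log.txt"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: eoepca-shared-data
```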