2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

African and South American cluster (tech exploration and discussion). #2477

Closed damianavila closed 1 year ago

damianavila commented 1 year ago

Context

As part of our ECB-CZI collaboration, we need to deploy regional clusters in Africa and South America.

@jmunroe did a quick exploration of the options available in the major cloud providers:

We should stand up an African cluster and begin to create 'shared hubs' for these communities. Looking at the region maps for GCP/AWS/Azure, it doesn't appear Google (yet) has a physical presence but both Amazon and Microsoft have data centers in South Africa. I think it is up to 2i2c to propose the right cloud partner. For some African nations, it may actually be that the network is better to connect to either a European or South Asian data center, but I feel it is important to deploy to an African-based data center. I think we should consider not only what is technically simplest for us at 2i2c but consider what future partnerships with African-based communities and these major cloud providers may look like. All three providers have African 'strategies' in the works.

For Azure, Datacenter opened in 2019 https://www.microsoft.com/africa/ato https://azure.microsoft.com/en-ca/blog/microsoft-opens-first-datacenters-in-africa-with-general-availability-of-microsoft-azure/ https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?regions=south-africa-north%2cnon-regional&products=all

For AWS, Datacenter opened 2020, af-south-1 https://aws.amazon.com/local/africa/cape-town/ https://aws.amazon.com/blogs/aws/now-open-aws-africa-cape-town-region/

For GCP, Lots of 'announcements' of money to be spent but I can't find any firm plans. https://nextbillionusers.withgoogle.com/events/google-4-africa-2022 https://blog.google/around-the-globe/google-africa/delivering-on-our-1b-commitment-in-africa/

Machine learning is an activity that the groups I have spoken with so far have expressed interest in, so we should try to go with a provider that offers GPUs.

This same conversation needs to also happen for the South American cluster.

Additionally, a few thoughts I shared with James:

The technical aspect may be a driving force, but I acknowledge that other factors, such as cloud providers' programs in the region, might steer us away from the optimal technical choice. That is a conversation for another level, though, so let's decouple the discussions for now and just create the tech-focused issue.

So this is the first tech-focused part :wink:

Proposal

I invite @2i2c-org/engineering to provide feedback, previous experiences, thoughts, etc. about the feasibility, potential issues/problems, pros/cons of the idea of having these regional/local clusters for our future African and South American communities.

Updates and actions

No response

yuvipanda commented 1 year ago

We should stay away from Azure wherever possible. If GCP doesn't have African data centers, that means AWS is our only option.

https://www.youtube.com/watch?v=R8iaViNIy3U

jmunroe commented 1 year ago

I'd like us to make a decision on where to deploy new clusters. We are getting closer to wanting to deploy hubs and having the decision on where to create the clusters is a requirement.

Proposal

Deploy a new Kubernetes cluster in each of:

  1. af-south-1 (AWS in Cape Town, South Africa)
  2. southamerica-east1-c (GCP in São Paulo, Brazil)

Reasoning

  1. This project should utilize data centres physically located in Africa and South America.
  2. We have a preference for AWS and GCP over Azure for deploying and maintaining infrastructure.
  3. In Africa, AWS's af-south-1 is the only option and GCP does not have an African presence.
  4. We want to emphasize being cloud-agnostic, so we should choose a provider other than AWS in South America.
  5. GCP's southamerica-east1-c (São Paulo, Brazil) provides the option of GPUs if required, has the larger selection of machine types, and is a Low CO2 facility. (Note that AWS also has a data centre in São Paulo and GCP has another one in Chile.)
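The reasoning above can be sketched as a small selection routine. This is only an illustration of the logic, not anything from the 2i2c deployment tooling: `Candidate`, `CANDIDATES`, `PREFERRED`, and `pick_clusters` are hypothetical names, the candidate list covers only locations mentioned in this thread, `sa-east-1` is the AWS identifier for São Paulo (not named above), and the Azure identifier is the slug from the Azure link earlier in the thread. The sketch works at the region level (`southamerica-east1` rather than the zone `southamerica-east1-c`) and does not model point 5 (GPUs, machine types, CO2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provider: str   # "aws", "gcp", or "azure"
    region: str     # provider's region identifier
    continent: str  # continent the data centre is physically located in


# Assumption: only locations mentioned in this thread, not an exhaustive list.
CANDIDATES = [
    Candidate("aws", "af-south-1", "africa"),
    Candidate("azure", "south-africa-north", "africa"),
    Candidate("aws", "sa-east-1", "south-america"),
    Candidate("gcp", "southamerica-east1", "south-america"),
]

# Point 2: prefer AWS and GCP over Azure.
PREFERRED = ("aws", "gcp")


def pick_clusters(candidates, continents):
    """Pick one region per continent, following points 1, 2, and 4."""
    chosen = {}
    for continent in continents:
        # Points 1 and 2: physically local, and on a preferred provider.
        local = [
            c for c in candidates
            if c.continent == continent and c.provider in PREFERRED
        ]
        # Point 4: stay cloud-agnostic by favouring a provider not used yet.
        used = {c.provider for c in chosen.values()}
        unused = [c for c in local if c.provider not in used]
        pool = unused or local
        if not pool:
            raise ValueError(f"no preferred provider has a region in {continent}")
        chosen[continent] = pool[0]
    return chosen


picked = pick_clusters(CANDIDATES, ["africa", "south-america"])
for continent, candidate in picked.items():
    print(continent, candidate.provider, candidate.region)
```

The result depends on the order in which continents are processed: Africa is handled first because it has only one acceptable option, and that choice then pushes South America towards GCP.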

damianavila commented 1 year ago

I concur with James' proposal. @yuvipanda, WDYT?

yuvipanda commented 1 year ago

I will try to find out if we can get a sense of how big these data centers are, so we don’t run into the same problems we did with the London GCP data center. Otherwise, sounds good to me.

yuvipanda commented 1 year ago

Update: Unfortunately, I haven't heard anything back about the size of the DC, so let's just proceed.

damianavila commented 1 year ago

OK, we will proceed according to the proposal outlined above (https://github.com/2i2c-org/infrastructure/issues/2477#issuecomment-1584771076) and see how things behave. Closing here for now. We can re-open if we need to discuss or explore the topic further.