catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Create GPU instance to run Panda #1502

Closed katie-lamb closed 2 years ago

katie-lamb commented 2 years ago

It seems like we'll need to create our own GPU instance to run Panda for entity matching between EIA and FERC.

Should we use AWS? GCP?

Grant money should cover this.

zaneselvans commented 2 years ago

So far we've done all of our ☁️ stuff on GCP. Can we get some more information from Xu and Andrew about how it needs to be set up? Do they have a Docker container that we'll run? How does Panda access GPU resources?

TrentonBush commented 2 years ago

FYI I saw this thread about how to choose a GPU instance the other day. The summary is that newer GPUs, while higher in $/hr, offer better performance, which usually makes them cheaper overall. This guy recommends T4 for inference and A100 for training of DL models. Not sure what the details of CCAI are, so we might have to try each to see which is best.
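To make the $/hr versus total cost trade-off concrete, here is a minimal sketch of the arithmetic. The hourly rates and runtimes are made-up placeholders, not real cloud prices or benchmarks, and the function names are hypothetical; we'd need to plug in measured runtimes for our own workload.

```python
def total_cost(hourly_rate_usd: float, runtime_hours: float) -> float:
    """Total job cost is the hourly rate multiplied by how long the job runs."""
    return hourly_rate_usd * runtime_hours


def cheaper_option(options: dict[str, tuple[float, float]]) -> str:
    """Return the label of the option with the lowest total cost.

    `options` maps a label to (hourly_rate_usd, runtime_hours).
    """
    return min(options, key=lambda name: total_cost(*options[name]))


# Hypothetical numbers for illustration only (assumed, not measured).
options = {
    "older_gpu": (1.00, 10.0),  # lower $/hr, but the job takes longer
    "newer_gpu": (3.00, 2.0),   # higher $/hr, but the job finishes sooner
}

for name, (rate, hours) in options.items():
    print(f"{name}: ${total_cost(rate, hours):.2f}")
print("cheapest:", cheaper_option(options))
```

With these placeholder values the pricier instance wins because the speedup outweighs the higher rate. If the speedup were smaller than the price ratio, the older GPU would come out cheaper, which is why trying each on the real job matters.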

katie-lamb commented 2 years ago

I'll follow up with Andrew and Xu about this. They should have good insight about what level of GPU we generally need for Panda.

In my experience I've always just gone with whatever platform I'm most comfortable with. AWS probably has the best UI, but you can accidentally leave an instance running, which is unfortunate. I'll poke around and see if any of them offer grants for climate projects. When I was in school we could sometimes get pretty big credits for GPUs, but I'm not sure if that was a grant or if professors just know people.

zaneselvans commented 2 years ago

My experience so far has been that managing cloud resources and learning the details of the different platforms is kind of a nightmare, and we're going to need to get all these parts working together in the long run, so I'm reluctant to end up with stuff on more than one platform if at all possible. GCP also definitely has research / non-profit / climate credit programs. But we're also technically for profit, and I suspect the overall cost will be very low given our resource needs.