This repository will contain information on parallel and distributed data analysis on the Poseidon cluster at the Woods Hole Oceanographic Institution (WHOI). It will specifically cover Dask, xarray, and requesting resources on Poseidon and on your local machine. The README below was generated with ChatGPT as a placeholder and can be revised later.
Welcome to the repository for all relevant information related to using parallel and distributed computing on the Poseidon cluster at the Woods Hole Oceanographic Institution (WHOI). This repository will specifically cover the following topics:
This repository contains comprehensive information on utilizing parallel and distributed computing resources on the Poseidon cluster at WHOI. The primary focus is on using Dask for parallel computing, leveraging xarray for managing multi-dimensional arrays, and efficiently requesting resources on both the Poseidon cluster and your local machine.
Before you begin, ensure you have met the following requirements:
git clone https://github.com/anthony-meza/WHOI-PO-HPC.git
cd WHOI-PO-HPC
Using pip:
pip install -r requirements.txt
Or using conda (preferred method):
conda env create -f environment.yml
Dask is a flexible parallel computing library for analytics. It helps scale the Python ecosystem (numpy, pandas, scikit-learn, etc.) and enables execution on multi-core machines and distributed clusters.
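As a rough illustration of the idea, the sketch below builds a chunked Dask array and reduces it in parallel on a single machine. The array shape and chunk size are arbitrary values chosen for the example, not settings recommended for Poseidon.

```python
import dask.array as da

# Build a 1000 x 1000 array of ones split into 250 x 250 chunks.
# Shape and chunk size are illustrative assumptions.
x = da.ones((1000, 1000), chunks=(250, 250))

# Operations are lazy: this line only builds a task graph.
y = (x + x.T).sum()

# compute() executes the graph, processing chunks in parallel
# with Dask's default scheduler.
total = float(y.compute())
print(total)
```

The same code should also run against a distributed scheduler once one is set up, since only the scheduler changes and not the array expressions.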
xarray is an open-source project and Python package that makes working with labeled multi-dimensional arrays simple, efficient, and fun!
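A small sketch of what "labeled" means in practice: the array below carries named dimensions and coordinates, so reductions and selections refer to names rather than axis positions. The variable name, dimension names, and values are made up for illustration.

```python
import numpy as np
import xarray as xr

# A toy 3 x 4 array with named dimensions and coordinate values.
# "temperature", "time", and "depth" are hypothetical names for this example.
temp = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("time", "depth"),
    coords={"time": [0, 1, 2], "depth": [10, 20, 30, 40]},
    name="temperature",
)

# Average over the "time" dimension by name instead of by axis number.
time_mean = temp.mean(dim="time")

# Select the values at depth == 20 using the coordinate label.
profile_at_20 = temp.sel(depth=20)

print(time_mean.values)
print(profile_at_20.values)
```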
Learn how to request and manage computing resources on the Poseidon cluster effectively:
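As one possible starting point, the sketch below assembles a batch script as a string, assuming Poseidon schedules jobs through Slurm (the original text does not say which scheduler is used). The function name `build_sbatch_script`, the partition name `compute`, the script `analysis.py`, and all resource values are hypothetical placeholders; check the cluster documentation for the real partition names and limits.

```python
def build_sbatch_script(job_name, command, partition="compute",
                        cpus=4, mem_gb=16, walltime="01:00:00"):
    """Return the text of a single-node Slurm batch script.

    All defaults are illustrative assumptions, not Poseidon settings.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={walltime}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: request 4 cores and 16 GB for one hour.
script = build_sbatch_script("dask-test", "python analysis.py")
print(script)
```

Writing the returned text to a file and passing that file to `sbatch` would submit the job under these assumptions.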
To ensure your local machine can interact with Poseidon:
Contributions are welcome! Please read the contributing guide to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or suggestions, please open an issue or contact the repository maintainer at your-email@whoi.edu.