JPL-MGHG / JPL-MGHG-SDS

This is a generic repo for tracking SDS tasks

Run CARDAMOM in parallel on MAAP #10

Open ddalton-swe opened 7 months ago

ddalton-swe commented 7 months ago

Overview

This task aims to fully utilize the computational capacity of the MAAP DPS by saturating all 32 cores of a worker. The primary objective is to assess MAAP's performance during an extended CARDAMOM run with all cores actively engaged. Additionally, we plan a scalability test to gauge how well MAAP handles many jobs executing concurrently. Together these tests should show how efficiently the system behaves under high demand.

ddalton-swe commented 7 months ago

FINDINGS: There should be only one input file per job in MAAP. This gives better per-job failure reporting and lets DPS handle the parallel CARDAMOM runs. The jobs can then be submitted in a Python loop, as suggested in the MAAP documentation (see the code below).

Additionally, the Data Processing System (DPS) is the platform for running algorithms registered in the Algorithm Catalog at scale in the cloud. Jobs can be submitted through the Jupyter GUI that MAAP provides, or in batch from Python via the maap.py library. Monitoring is built into DPS and is likewise available from both the Jupyter GUI and maap.py.

Here is the code:

# Import the MAAP package
from maap.maap import MAAP

# Invoke the MAAP constructor using the maap_host argument
maap = MAAP(maap_host='api.maap-project.org')

# One input file per job; list the CARDAMOM input file paths here
input_files = [""]

# Submit one DPS job per input file and keep the returned job
# objects so the batch can be monitored afterwards
jobs = []
for input_file in input_files:
    job = maap.submitJob(identifier="debug-run",
        algo_id="CARDAMOM",
        version="main",
        username="<yourUsername>",
        queue="maap-dps-worker-8gb",
        input_file=input_file)
    jobs.append(job)
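
As noted above, monitoring is also available from maap.py. Below is a minimal polling sketch, assuming a recent maap-py in which submitJob returns a DPSJob exposing an id attribute and a retrieve_status() method (older releases instead offer maap.getJobStatus(jobid)); the status strings and polling interval are illustrative, not prescribed by the MAAP docs.

# Poll the batch submitted above until every job reaches a
# terminal DPS state; assumes the `jobs` list from the loop above
import time

TERMINAL = {"Succeeded", "Failed", "Deleted"}  # assumed terminal statuses

pending = list(jobs)
while pending:
    for job in list(pending):
        status = job.retrieve_status()  # e.g. "Accepted", "Running", "Succeeded"
        print(job.id, status)
        if status in TERMINAL:
            pending.remove(job)
    time.sleep(60)  # wait a minute between polls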