big-data-lab-team / reading-club

Notes on papers discussed in the lab's reading club
GNU General Public License v3.0

Paper: Cloud infrastructure provenance collection and management to reproduce scientific workflows execution #4

Closed: ali4006 closed this issue 6 years ago

ali4006 commented 6 years ago

URL: https://www.sciencedirect.com/science/article/pii/S0167739X17314917

Intro

Workflow execution scenario on the cloud

ReCAP architecture

A plugin-based mechanism that includes:

Job-to-cloud resource mapping

Resource mapping approaches provide two pieces of information: the hardware configuration and the software configuration of the resource a job ran on.
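To make the two pieces of information concrete, here is a minimal sketch of a record a mapping could attach to each job. All class and field names (`HardwareConfig`, `SoftwareConfig`, `ResourceConfig`, `vcpus`, `ram_mb`, `image`, `packages`) are hypothetical and are not taken from ReCAP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareConfig:
    # Hypothetical hardware fields of a VM flavour.
    vcpus: int
    ram_mb: int
    disk_gb: int


@dataclass(frozen=True)
class SoftwareConfig:
    # Hypothetical software fields: base image plus an ordered tuple of packages.
    image: str
    packages: tuple = ()


@dataclass(frozen=True)
class ResourceConfig:
    # One record per VM, combining both pieces of information with its identity.
    vm_name: str
    ip: str
    hardware: HardwareConfig
    software: SoftwareConfig


def same_configuration(a: ResourceConfig, b: ResourceConfig) -> bool:
    """True if two VMs had identical hardware and software, ignoring name and IP."""
    return a.hardware == b.hardware and a.software == b.software


if __name__ == "__main__":
    hw = HardwareConfig(vcpus=2, ram_mb=4096, disk_gb=20)
    sw = SoftwareConfig(image="ubuntu-16.04", packages=("pegasus", "condor"))
    vm1 = ResourceConfig("vm-1", "10.0.0.5", hw, sw)
    vm2 = ResourceConfig("vm-2", "10.0.0.6", hw, sw)
    print(same_configuration(vm1, vm2))  # True
```

Comparing only hardware and software (not name or IP) reflects that two different VMs can still provide the same execution environment; note that `packages` is compared as an ordered tuple.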

Most WMSs keep either a unique IP address or a host name, which is used to access the provenance information. There are two different resource usage scenarios on the cloud, static and dynamic:

Static approach: Fig. 9 shows the static mapping between the list of jobs of a given workflow (from the Pegasus database) and the list of VMs in the cloud. The mapping is established by matching IP addresses, so it does not work in dynamic environments, where the resources are no longer available once the jobs have finished running.
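A minimal sketch of the static idea, assuming each job record carries the IP of the host it ran on (as read from the WMS database) and that the cloud's VM list can still be queried afterwards. The dictionary keys (`job_id`, `host_ip`, `name`, `ip`) are hypothetical. Jobs whose IP matches no listed VM stay unmapped, which is the failure mode described above for dynamic environments.

```python
def static_mapping(jobs, vms):
    """Map each job to the VM whose IP matches the job's recorded host IP.

    jobs: list of dicts with hypothetical keys 'job_id' and 'host_ip'.
    vms:  list of dicts with hypothetical keys 'name' and 'ip' describing
          the VMs the cloud provider currently lists.
    Returns (mapping, unmapped): mapping is job_id -> VM name; unmapped
    lists the job_ids whose IP matches no currently listed VM.
    """
    vm_by_ip = {vm["ip"]: vm["name"] for vm in vms}
    mapping, unmapped = {}, []
    for job in jobs:
        vm_name = vm_by_ip.get(job["host_ip"])
        if vm_name is None:
            # No listed VM has this IP, e.g. the VM was terminated after the run.
            unmapped.append(job["job_id"])
        else:
            mapping[job["job_id"]] = vm_name
    return mapping, unmapped


if __name__ == "__main__":
    jobs = [
        {"job_id": "job_1", "host_ip": "10.0.0.5"},
        {"job_id": "job_2", "host_ip": "10.0.0.9"},
    ]
    vms = [{"name": "vm-1", "ip": "10.0.0.5"}]
    print(static_mapping(jobs, vms))
    # ({'job_1': 'vm-1'}, ['job_2'])
```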

Eager approach: establishes the job-to-cloud mapping in two phases: 1) a temporary mapping between a job and a cloud resource is established while the job is still running; 2) the final job-to-cloud resource mapping is established by retrieving the job information from the workflow provenance captured by the WMS.
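The two phases can be sketched as follows. This is an illustration only. It assumes the temporary link is keyed by a scheduler-level job id (standing in for the Condor job id the notes mention) and that the WMS provenance later exposes both that id and the workflow-level job id. The class, method, and key names (`EagerMapper`, `on_job_running`, `finalize`, `sched_id`, `job_id`) are hypothetical.

```python
class EagerMapper:
    """Two-phase job-to-cloud mapping (sketch; all identifiers are hypothetical).

    Phase 1: while a job is running, store a temporary link between its
    scheduler job id and a copy of the VM description, captured while the VM
    still exists.
    Phase 2: once the WMS provenance is available, translate scheduler ids
    into workflow job ids to obtain the final mapping.
    """

    def __init__(self):
        self.temporary = {}  # scheduler job id -> copy of the VM description

    def on_job_running(self, sched_id, vm):
        # Phase 1: copy the VM info now, before the VM can be terminated.
        self.temporary[sched_id] = dict(vm)

    def finalize(self, wms_provenance):
        """Phase 2: wms_provenance is a list of dicts with hypothetical keys
        'job_id' (workflow-level id) and 'sched_id' (scheduler-level id)."""
        final = {}
        for record in wms_provenance:
            vm = self.temporary.get(record["sched_id"])
            if vm is not None:
                final[record["job_id"]] = vm
        return final


if __name__ == "__main__":
    mapper = EagerMapper()
    mapper.on_job_running("101.0", {"name": "vm-1", "ip": "10.0.0.5"})
    # The VM may be deleted after the job ends; its description is already stored.
    final = mapper.finalize([{"job_id": "job_1", "sched_id": "101.0"}])
    print(final["job_1"]["name"])  # vm-1
```

The dependency on the scheduler shows up here as the need for a scheduler id at phase 1, which is what the lazy approach below avoids.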

Lazy approach: the eager approach relies on Condor, so the lazy approach was devised for dynamic environments to remove this dependency. This algorithm does not maintain a temporary relation between a job and a virtual machine; instead, it periodically monitors the current status of the VMs running on the cloud infrastructure and retrieves their metadata.
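A minimal sketch of the lazy idea: a monitor that knows nothing about jobs and only takes timestamped snapshots of the running VMs' metadata. The notes do not say how jobs are matched to snapshots afterwards, so the rule used here (same host IP, snapshot taken during the job's run) is an assumption. `LazyMonitor`, `list_vms`, `poll_once`, and `map_job` are hypothetical names.

```python
import time


class LazyMonitor:
    """Lazy job-to-cloud mapping (sketch; names and matching rule are assumptions).

    The monitor never tracks jobs. It periodically asks the cloud which VMs
    are running and stores timestamped copies of their metadata. Jobs are
    mapped afterwards by matching the job's host IP against snapshots taken
    while the job was running.
    """

    def __init__(self, list_vms):
        # list_vms: hypothetical callable returning [{'name': ..., 'ip': ...}, ...]
        self.list_vms = list_vms
        self.snapshots = []  # list of (timestamp, VM metadata dict)

    def poll_once(self, now=None):
        now = time.time() if now is None else now
        for vm in self.list_vms():
            self.snapshots.append((now, dict(vm)))

    def run(self, interval_s, rounds):
        # Simple blocking polling loop; a real monitor would run in the background.
        for _ in range(rounds):
            self.poll_once()
            time.sleep(interval_s)

    def map_job(self, host_ip, start, end):
        """Return metadata of the VM seen with host_ip during [start, end], or None."""
        for ts, vm in self.snapshots:
            if vm["ip"] == host_ip and start <= ts <= end:
                return vm
        return None


if __name__ == "__main__":
    fake_cloud = [{"name": "vm-1", "ip": "10.0.0.5"}]
    mon = LazyMonitor(lambda: fake_cloud)
    mon.poll_once(now=100.0)
    fake_cloud.clear()  # the VM is terminated after the run
    mon.poll_once(now=200.0)
    print(mon.map_job("10.0.0.5", start=90.0, end=150.0)["name"])  # vm-1
```

The polling interval is the main trade-off in this sketch. A job shorter than the interval may have no snapshot inside its run window and stays unmapped, while very frequent polling adds load on the cloud API.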

Result

To evaluate the proposed work, three different workflows (Montage, ReconAll, and Wordcount) were executed using ReCAP, and their captured provenance was found to be consistent (Tables 3-6).

Questions:

Note: I think capturing the resource configurations is necessary but not sufficient for reproducibility evaluation.