Closed by kbergin 5 years ago
➤ Chengchen Wang commented:
[~sehsan]
➤ Saman Ehsan commented:
As discussed in person, if Falcon “sniffs” the bundle that triggered each workflow and then adds the input hash label before starting it in Cromwell, this could lead to a race condition. Because there is a slight delay in applying labels to workflows in Cromwell, it is possible that Falcon thinks no workflows exist with a particular label hash when really the label hasn’t been applied yet.
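The race described above can be reproduced in a small simulation. This is only a sketch under stated assumptions: `DelayedLabelStore`, `input_hash`, and `start_if_new` are hypothetical names made up for illustration, not Falcon or Cromwell APIs, and the label delay is modeled as an explicit `flush()` step rather than real timing.

```python
import hashlib
import json


class DelayedLabelStore:
    """Hypothetical stand-in for a workflow engine whose labels
    only become queryable some time after a workflow is submitted."""

    def __init__(self):
        self._pending = []     # labels submitted but not yet queryable
        self._visible = set()  # labels that a query can already see

    def submit_workflow(self, label_hash):
        self._pending.append(label_hash)

    def flush(self):
        """Simulate the delay elapsing: pending labels become visible."""
        self._visible.update(self._pending)
        self._pending.clear()

    def has_workflow_with_label(self, label_hash):
        return label_hash in self._visible


def input_hash(bundle_inputs):
    """Assumed hashing scheme: SHA-256 over the sorted JSON of the inputs."""
    payload = json.dumps(bundle_inputs, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def start_if_new(store, bundle_inputs, started):
    """Check-then-act: start a workflow only if no label match is visible."""
    label = input_hash(bundle_inputs)
    if not store.has_workflow_with_label(label):
        store.submit_workflow(label)
        started.append(label)


store = DelayedLabelStore()
started = []
inputs = {"bundle_uuid": "example-uuid", "bundle_version": "example-version"}

start_if_new(store, inputs, started)  # first notification: starts a workflow
start_if_new(store, inputs, started)  # label not visible yet: starts a duplicate
store.flush()                         # labels finally become queryable
start_if_new(store, inputs, started)  # now correctly skipped

print(len(started))  # 2: the same inputs were started twice
```

The duplicate appears because the existence check and the label becoming visible are not atomic; any fix has to close that window rather than just shorten it.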
➤ Saman Ehsan commented:
Spike doc is here: https://docs.google.com/document/d/1j6EfYq17fry6HHzJYu9oYXZSZHJRQAXs_e8xllqZ8c4/edit#
➤ Saman Ehsan commented:
I did a comparison between different ways to get bundle metadata and uploaded the script I used here as a reference: https://github.com/HumanCellAtlas/pipeline-tools/blob/se_test_get_metadata/pipeline_tools/tests/shared/test_metadata_api.py
➤ Saman Ehsan commented:
PR is here: https://github.com/HumanCellAtlas/pipeline-tools/pull/162
I’ll test this by sending notifications in our dev environment later today.
@jkaneria - I've added this to Ensure that simple field-level .. epic. Now that the spike is complete, is this a blocking issue or simply an implementation issue that's part of the epic which needs a Release and Milestone? Also - In Progress pipeline?
➤ Saman Ehsan commented:
From testing the pipeline-tools PR in dev, we observed that removing the data file checkout did not reduce the pre-processing time enough to keep the data store's request from timing out. Additionally, the pre-processing time varies with the number of metadata files in the bundle, which could increase over time and break this implementation.
So, out of the options outlined in the document, we decided to move forward with adding a Google Pub/Sub queue.
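To illustrate why a queue helps with the timeout, here is a minimal sketch of the pattern, assuming the receiver only enqueues the notification and acknowledges immediately while the slow pre-processing happens later. It uses Python's in-memory `queue.Queue` as a stand-in for a Pub/Sub topic; `receive_notification`, `preprocess`, and `drain` are hypothetical names, and none of this is the actual Lira or Pub/Sub client code.

```python
import queue

# In-memory stand-in for a Pub/Sub topic (assumption for illustration only).
notification_queue = queue.Queue()


def receive_notification(notification):
    """Enqueue and acknowledge at once, so the sender's request does not
    wait on pre-processing. Returns an HTTP-style 202 (Accepted)."""
    notification_queue.put(notification)
    return 202


def preprocess(notification):
    """Placeholder for the slow step (e.g. fetching bundle metadata)."""
    return {"bundle_uuid": notification["bundle_uuid"], "status": "submitted"}


def drain(q):
    """Worker side: process everything currently queued, in order."""
    results = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        results.append(preprocess(item))
        q.task_done()
    return results


status = receive_notification({"bundle_uuid": "example-uuid"})
results = drain(notification_queue)
print(status, results)
```

With this split, the response time no longer depends on how many metadata files a bundle has. Note that Pub/Sub delivers messages at least once by default, so a real consumer would still need to tolerate repeated notifications.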
Discussed on the August 15 Refinement call: @jankeria to:
We implemented the no-op features in Lira and Falcon, but they caused problems with notifications, resulting in duplicate notifications/workflows and failing DCP integration tests.
To keep it working and fix these problems, there are a few steps we need to take:
┆Issue is synchronized with this Jira Story