elixir-cloud-aai / CrateGen

Compile GA4GH WES and TES requests from Workflow Run RO-Crates and package WES and TES runs as Workflow Run RO-Crates
Apache License 2.0
0 stars 1 forks source link

FEAT: add conversion functions for TES and WES to WRROC #5

Closed Karanjot786 closed 3 weeks ago

Karanjot786 commented 1 month ago

Description

This PR adds the implementation of the convert_tes_to_wrroc and convert_wes_to_wrroc functions. These functions are responsible for converting TES (Task Execution Service) and WES (Workflow Execution Service) tasks and runs into WRROC (Workflow Run RO-Crate) format.

Changes Included

Summary by Sourcery

This pull request adds conversion functions for transforming TES tasks and WES runs into WRROC format, along with helper functions for timestamp conversion and status mapping.

sourcery-ai[bot] commented 1 month ago

Reviewer's Guide by Sourcery

This pull request introduces two new functions, convert_tes_to_wrroc and convert_wes_to_wrroc, which convert TES (Task Execution Service) tasks and WES (Workflow Execution Service) runs into WRROC (Workflow Run RO-Crate) format. Additionally, helper functions convert_to_iso8601 and convert_status are implemented to assist with timestamp and status conversions, respectively.

File-Level Changes

Files Changes
src/converter.py Introduced functions for converting TES and WES data to WRROC format, along with necessary helper functions for timestamp and status conversions.

Tips - Trigger a new Sourcery review by commenting `@sourcery-ai review` on the pull request. - You can change your review settings at any time by accessing your [dashboard](https://sourcery.ai/dashboard): - Enable or disable the Sourcery-generated pull request summary or reviewer's guide; - Change the review language; - You can always [contact us](mailto:support@sourcery.ai) if you have any questions or feedback.
Karanjot786 commented 3 weeks ago

Hi @SalihuDickson ,

Thank you for the feedback. I'll verify the WES and TES formats against their standards and update the conversion functions accordingly. I'll also ensure to add error handling and support for other formats if needed. Regarding the WES output, I'll correct the object reference to avoid looking in the wrong place. Additionally, I'll review the mapping for CreateAction.instrument to ensure it correctly handles multiple executor images.

I'll make the necessary adjustments and resubmit the PR for review.

Karanjot786 commented 3 weeks ago

Hi @SalihuDickson ,

I have made the following updates based on your feedback:

  1. Verified and Corrected Format Handling:

    • Ensured that the WES and TES data formats are correctly handled according to their respective standards.
    • Added checks to ensure the executors list in TES data is not empty before accessing its first element.
  2. Added Error Handling:

    • Implemented error handling for missing keys, invalid formats, and other potential exceptions in both convert_tes_to_wrroc and convert_wes_to_wrroc functions.
    • Used the .get() method to safely access keys in inputs and outputs for both TES and WES data, preventing potential KeyError issues.
  3. Corrected Mapping for CreateAction.instrument:

    • Ensured that the CreateAction.instrument field correctly handles multiple executor images as needed.
  4. Enhanced Timestamp Conversion:

    • Added error handling in the convert_to_iso8601 function to manage different timestamp formats, assuming RFC 3339 format and converting to ISO 8601.

Please review the changes and let me know if there are any further adjustments required.

SalihuDickson commented 3 weeks ago

@Karanjot786, thank you for getting to this so quickly. However these corrections may be making this PR unnecessarily complex. So I think you should revert the most recent commit, so we can merge and work on these changes on the next one. I think you should also separate the WES and TES conversions into different PRs so we can add a little more specificity.

What I want you to consider for the next PR:

First I think we have 3 types of data when working with error handling, when you're dealing with user defined data,

  1. data that must be present and must be of a certain type (or range of types) if not, an error should be thrown
  2. data is optional but should be of a certain type (or range of types) if present or an error should be thrown
  3. And data that is recommended but not mandatory (could have a default value), this should show a warning

I think you need to figure out which applies to each field and write your error handling accordingly, so that we can avoid cryptic errors and gracefully handle these simple error scenarios. Something you can also do is collect all the errors in a list and then throw them all at once to ensure the user sees everything that is wrong with their data without having to rerun the program over and over.

We will also need to consider some best practices for error handling like having one function that all errors are passed into, to ensure uniform error handling and to constants for similar errors. This is quite a lot to consider, please feel free to ask me about anything you may be confused about : )

SalihuDickson commented 3 weeks ago

And you are currently looking for the output field in the run_logs but according to the WES specification, the output field isn't supposed to be in the run_logs. Please look at it here

Karanjot786 commented 3 weeks ago

Hi @SalihuDickson ,

Considering the feedback and to simplify the review process, may I cancel the current PR and create two separate PRs? One for convert_tes_to_wrroc and another for convert_wes_to_wrroc, each in different files. This approach will allow us to address any issues with more specificity and clarity.

Please let me know if this works for you.

SalihuDickson commented 3 weeks ago

Yes @Karanjot786, that is fine.

JaeAeich commented 3 weeks ago
  1. Make sure to properly set up the repo first, including packaging, dependency management, linting, static code/type checks, (API) documentation and a CI pipeline before adding any code.

Hey please consider linting and checks for non code files, since this this is a new proj we wouldn't end up with the prob I have with implementing it with TESK (ie huge changes). Consider looking at this PR, you don't necessarily need to implement pre-commit hook as that can hinder with DX, but the .pre-commit-config.yaml in that PR is has pretty good set of rules which you can put in CI.