Clinical-Genomics / cg

Glue between Clinical Genomics apps

`cg deliver ticket` based on `deliver` tag in Housekeeper #2848

Open fevac opened 8 months ago

fevac commented 8 months ago

Description

Suggested solution

This can be closed when


Blocked by


Clarification

The cg deliver code works for delivering the files from the most recent analyses. For older analyses, the tags and files might be different, so the files may not be delivered properly.
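A minimal sketch of the idea in the title, assuming a bundle version can be represented as a list of files with tags. `BundleFile` and `files_to_deliver` are hypothetical names for illustration, not the Housekeeper or cg API. The point is that selection depends only on the `deliver` tag, so older analyses with different workflow-specific tags would still be picked up:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BundleFile:
    """Hypothetical stand-in for a file record in a Housekeeper bundle version."""

    path: str
    tags: frozenset[str] = field(default_factory=frozenset)


def files_to_deliver(
    files: list[BundleFile], deliver_tag: str = "deliver"
) -> list[BundleFile]:
    """Return the files that carry the deliver tag, ignoring all other tags."""
    return [f for f in files if deliver_tag in f.tags]


# Made-up example of an older bundle version with its own tag set.
old_version = [
    BundleFile("case.vcf.gz", frozenset({"vcf", "deliver"})),
    BundleFile("case.log", frozenset({"log"})),
    BundleFile("sample.cram", frozenset({"cram", "deliver"})),
]

print([f.path for f in files_to_deliver(old_version)])
```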

Acceptance criteria

karlnyr commented 8 months ago

YES! YES! YES! There is only one drawback to this, and that is cram files. Assuming all files were present when we created the bundle, this should work. However, we would not know if cram/bam files were missing, which could perhaps be handled with an error and some kind of force flag. We could, for instance, ask the customer whether they are okay with skipping those files. I am uncertain whether the deliver tag has always been used to tag things this way, but either way I think it should be from today onwards; it would improve backwards compatibility by miles.
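The error-plus-force-flag idea could look roughly like this. It is only a sketch: `MissingAlignmentFilesError`, `check_alignment_files` and the suffix-based detection are assumptions made for illustration, and nothing here exists in cg today.

```python
from pathlib import Path


class MissingAlignmentFilesError(Exception):
    """Raised when a delivery would go out without any cram/bam file."""


# Assumption: alignment files can be recognised by their suffix.
ALIGNMENT_SUFFIXES = {".cram", ".bam"}


def check_alignment_files(paths: list[str], force: bool = False) -> list[str]:
    """Return paths unchanged, or raise if no cram/bam is present and force is off."""
    has_alignment = any(Path(p).suffix in ALIGNMENT_SUFFIXES for p in paths)
    if not has_alignment and not force:
        raise MissingAlignmentFilesError(
            "No cram/bam files found; re-run with force to deliver without them."
        )
    return paths
```

With `force=True` the delivery proceeds without alignment files, which matches the case where the customer has agreed to skip them.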

Great suggestion Eva!

fevac commented 8 months ago

We could also make sure that all files that need delivery get the deliver tag, so all cram files should have it in the future.
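A small sketch of what that could mean when files are tagged, assuming cram files are identified by suffix. `with_deliver_tag` is a hypothetical helper, not an existing cg or Housekeeper function:

```python
def with_deliver_tag(
    path: str,
    tags: set[str],
    must_deliver_suffixes: tuple[str, ...] = (".cram",),
) -> set[str]:
    """Return a copy of tags, adding 'deliver' for files that must be delivered."""
    if path.endswith(must_deliver_suffixes):
        return tags | {"deliver"}
    return set(tags)


print(sorted(with_deliver_tag("sample.cram", {"cram"})))
print(sorted(with_deliver_tag("run.log", {"log"})))
```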

karlnyr commented 8 months ago

So, this is the systemd service that removes cram files:

[Service]
Type=oneshot
ExecStart=/bin/bash -c "/home/proj/production/bin/miniconda3/envs/P_cg/bin/cg \
    --config /home/proj/production/servers/config/hasta.scilifelab.se/cg.yaml \
    clean \
    scout-finished-cases \
    -y \
    --days-old 300"
ExecStartPost=/bin/bash -c "systemctl --user start send-success-slack@%n.service"

Which does this:

https://github.com/Clinical-Genomics/cg/blob/9845e773e5114ada88e6b6b85462b6b199035639/cg/cli/clean.py#L108-L135
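For readers who do not want to follow the link, here is a rough illustration of what a `--days-old 300` cutoff implies. The linked code is the authoritative version; `is_old_enough`, the choice of timestamp and the strict comparison are assumptions made for this sketch.

```python
from datetime import datetime, timedelta


def is_old_enough(finished_at: datetime, days_old: int, now: datetime) -> bool:
    """True if finished_at lies more than days_old days before now."""
    return finished_at < now - timedelta(days=days_old)


now = datetime(2024, 1, 1)
print(is_old_enough(datetime(2023, 1, 1), 300, now))   # 365 days old
print(is_old_enough(datetime(2023, 12, 1), 300, now))  # 31 days old
```

Under this reading, a case from a year ago would already have had its cram files cleaned, which is why a delivery of an older analysis can come up short.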

Vince-janv commented 5 days ago

Technical refinement

ahdamin commented 15 hours ago

I used the following query to check which pipelines use the deliver and delivery-report tags:

-- Count distinct files per tag and order workflow.
-- `order` and `case` are reserved words in MySQL, so they are backtick-quoted.
SELECT
  tag.name AS tag_name,
  `order`.workflow AS order_workflow,
  COUNT(DISTINCT file.id) AS file_count
FROM
  `housekeeper-stage`.bundle
INNER JOIN
  `housekeeper-stage`.version ON bundle.id = version.bundle_id
INNER JOIN
  `housekeeper-stage`.file ON version.id = file.version_id
INNER JOIN
  `housekeeper-stage`.file_tag_link ON file.id = file_tag_link.file_id
INNER JOIN
  `housekeeper-stage`.tag ON tag.id = file_tag_link.tag_id
-- A bundle is named after either a sample or a case internal id.
LEFT JOIN
  `cg-stage`.sample ON bundle.name = sample.internal_id
LEFT JOIN
  `cg-stage`.`case` ON bundle.name = `case`.internal_id
LEFT JOIN
  `cg-stage`.case_sample ON case_sample.case_id = `case`.id AND case_sample.sample_id = sample.id
LEFT JOIN
  `cg-stage`.order_case ON order_case.case_id = `case`.id
LEFT JOIN
  `cg-stage`.`order` ON `order`.id = order_case.order_id
LEFT JOIN
  `cg-stage`.analysis ON analysis.case_id = `case`.id
WHERE
  tag.name IN ('deliver', 'delivery-report')
GROUP BY
  tag.name,
  `order`.workflow
ORDER BY
  tag_name DESC,
  file_count DESC;

If this query is correct 😄, the chart shows that the balsamic and mutant workflows are the top users of these tags (based on the database on the stage server).

[Image: chart of the query results]

From a high-level look at the code, balsamic, rnafusion, and mip-rna appear to use the delivery-report tag in their configuration builders.