(Streamline delivery) Use latest analysis per case

islean commented 3 months ago

Description

When counting analysis statuses within an order, we should only use the latest analysis within each case. This PR adds a filter which makes sure that only the analysis with the latest started_at per case is used.

Added

Filter by latest_per_case

Fixed

Order summaries only use the latest analysis in each case.

How to prepare for test

[ ] ssh to hasta (depending on type of change)
[ ] activate stage: us
[ ] request trailblazer-stage on hasta: paxa

[ ] install on stage:

bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_trailblazer -t trailblazer -b [THIS-BRANCH-NAME] -a

[ ] ssh to clinical-db (depending on type of change)
[ ] install on stage: bash /home/proj/production/servers/resources/clinical-db.scilifelab.se/update-trailblazer-ui-stage.sh [THIS-BRANCH-NAME]

How to test

[ ] login to ...
[ ] do ...

Expected test outcome

[ ] check that ...
[ ] Take a screenshot and attach or copy/paste the output.

Review

[ ] tests executed by
[ ] "Merge and deploy" approved by Thanks for filling in who performed the code review and the test!

This version is a

[ ] MAJOR - when you make incompatible API changes
[ ] MINOR - when you add functionality in a backwards compatible manner
[ ] PATCH - when you make backwards compatible bug fixes or documentation/instructions

islean commented 3 months ago

Hmmm, seems not to work currently:

seallard commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

islean commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

Do you prefer fetching all analyses for the order_id and then handling it in Python? Given that the number of analyses in an order won't be too many, that would probably be feasible, but then we would need to fetch all analyses in the order, sort them on case and then loop through each case-analyses bundle and keep only one from each.

islean commented 3 months ago

Would ideally want some more feedback on whether the design is good or bad here

islean commented 3 months ago

Seems to function as expected now. Most mismatches come from RSYNC analyses also having the same order_id so we need to get rid of those too. But that is out of scope.

sonarcloud[bot] commented 3 months ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

seallard commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

Do you prefer fetching all analyses for the order_id and then handling it in Python? Given that the number of analyses in an order won't be too many, that would probably be feasible, but then we would need to fetch all analyses in the order, sort them on case and then loop through each case-analyses bundle and keep only one from each.

I'm not sure! Maybe we can measure how much faster this is and make a judgement call accordingly? But I'm fine with this to be honest 👍

islean commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

Do you prefer fetching all analyses for the order_id and then handling it in Python? Given that the number of analyses in an order won't be too many, that would probably be feasible, but then we would need to fetch all analyses in the order, sort them on case and then loop through each case-analyses bundle and keep only one from each.

I'm not sure! Maybe we can measure how much faster this is and make a judgement call accordingly? But I'm fine with this to be honest 👍

Oh, in this case I do not think it is a matter of performance. The bottleneck seem to be cg either way and looping through a list of at most 50 cases twice should be very fast. I just feel that that option is clunkier. And if we are hoping to move more logic to in other instances I do think we will see more of this since subqueries are an integral part of SQL.

Clinical-Genomics / trailblazer