Clinical-Genomics / trailblazer

Keep track of and manage analyses
MIT License
5 stars 2 forks source link

(Streamline delivery) Use latest analysis per case #417

Closed islean closed 3 months ago

islean commented 3 months ago

Description

When counting analysis statuses within an order, we should only use the latest analysis within each case. This PR adds a filter which makes sure that only the analysis with the latest started_at per case is used.

Added

Fixed

How to prepare for test

How to test

Expected test outcome

Review

This version is a

islean commented 3 months ago

Hmmm, seems not to work currently:

image
seallard commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

islean commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

Do you prefer fetching all analyses for the order_id and then handling it in Python? Given that the number of analyses in an order won't be too many, that would probably be feasible, but then we would need to fetch all analyses in the order, sort them on case and then loop through each case-analyses bundle and keep only one from each.

islean commented 3 months ago

Would ideally want some more feedback on whether the design is good or bad here

islean commented 3 months ago

Seems to function as expected now. Most mismatches come from RSYNC analyses also having the same order_id so we need to get rid of those too. But that is out of scope.

sonarcloud[bot] commented 3 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

seallard commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

Do you prefer fetching all analyses for the order_id and then handling it in Python? Given that the number of analyses in an order won't be too many, that would probably be feasible, but then we would need to fetch all analyses in the order, sort them on case and then loop through each case-analyses bundle and keep only one from each.

I'm not sure! Maybe we can measure how much faster this is and make a judgement call accordingly? But I'm fine with this to be honest 👍

islean commented 3 months ago

If it works it works! But I'd prefer a different structure of the logic to make it easier to understand and modify (if possible). Might be worth making it less general/efficient to gain some readability

Do you prefer fetching all analyses for the order_id and then handling it in Python? Given that the number of analyses in an order won't be too many, that would probably be feasible, but then we would need to fetch all analyses in the order, sort them on case and then loop through each case-analyses bundle and keep only one from each.

I'm not sure! Maybe we can measure how much faster this is and make a judgement call accordingly? But I'm fine with this to be honest 👍

Oh, in this case I do not think it is a matter of performance. The bottleneck seem to be cg either way and looping through a list of at most 50 cases twice should be very fast. I just feel that that option is clunkier. And if we are hoping to move more logic to in other instances I do think we will see more of this since subqueries are an integral part of SQL.