Clinical-Genomics / cg

Glue between Clinical Genomics apps

New Case Action Top-up #3971

Open RasmusBurge-CG opened 4 days ago

RasmusBurge-CG commented 4 days ago

As a member of prod bioinfo, I want a new case action called Top-up, so that analysis starts automatically when new data is available for cases that have been topped up.

Clarification

It’s very easy to overlook manually starting cases after a top-up, which has the potential to cause significant delays in processing samples (potentially devastating for critically ill patients). To prevent this and avoid unnecessary sample processing, one could set the case status to Top-up. This way, when new data becomes available for that case, the automation will pick up the sample and start it. Additionally, there is potential for integration with LIMS: when a sample is requeued for top-up, it could automatically update the status in StatusDB.

Work impact

Answer the following questions:

Acceptance Criteria

Notes

diitaz93 commented 4 days ago

Second clarification

There are two situations (that I know of) in which a sample will be sent for top-up:

  1. The sample didn't reach the target amount of reads (based on the app tag). In this case, the automation checks that the reads are not reached and simply skips the analysis until the sample has reached the desired amount of reads.
  2. The sample got the required amount of reads and it starts the analysis, but during the analysis we realise that it didn't get the expected coverage (or didn't satisfy other pipeline-specific requirements related to sequencing quality). Then it is sent to top-up and the sample/analysis is labelled as analysed/complete (unclear how it works on this level).

The problem described in this issue concerns only the second situation. In the first one, the automation will pick up the sample once the desired number of reads is reached. In the second one, the automation will never start the analysis again, because it is labelled as already run.
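A minimal sketch of the two situations as described in this comment. The `Sample` class, its fields, and the `already_analysed` flag are hypothetical stand-ins, not the actual cg models, and how the "already run" label is really stored is unclear (as noted above).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Hypothetical, simplified sample: reads obtained vs. the target from the app tag
    reads: int
    target_reads: int


def all_samples_have_enough_reads(samples: List[Sample]) -> bool:
    """Situation 1: the analysis is skipped until every sample reaches its target."""
    return all(sample.reads >= sample.target_reads for sample in samples)


def is_picked_up(samples: List[Sample], already_analysed: bool) -> bool:
    """Situation 2: a case labelled as already run is not started again,
    even if a top-up has added new reads (assumption based on the text above)."""
    return all_samples_have_enough_reads(samples) and not already_analysed
```

With these assumptions, a case that was analysed once and then topped up stays untouched, which is the gap this issue is about.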

henrikstranneheim commented 4 days ago

Will the automation not already pick this up when there is new seq data for a sample that is newer than the latest analysis and the action is None?


    def _is_latest_analysis_done_on_all_sequences(self, case: Case) -> bool:
        return case.latest_analyzed < case.latest_sequenced

beatrizsavinhas commented 3 days ago

Yes, I was thinking exactly of this @henrikstranneheim! The only exception I can think of is when the initial analysis fails and the case is not stored, so it still has status running in StatusDB and the automation won't start it. I think this might be the case for samples that received enough reads but whose analysis failed on coverage and was left as failed in Trailblazer.

henrikstranneheim commented 3 days ago

But should that case not warrant a manual investigation? It might fail for a number of reasons.

RasmusBurge-CG commented 3 days ago

Hi!

Yes, @beatrizsavinhas, this would fall under case 2, as stated in the second clarification by @diitaz93, right?

@henrikstranneheim, that’s true. If the investigation reveals a need for a top-up, then the Top-up action could be useful. Do you agree?

There might be a better solution to this. If there’s something I’m unaware of, I’m happy to learn!

henrikstranneheim commented 2 days ago

> Hi!
>
> Yes, @beatrizsavinhas, this would fall under case 2, as stated in the second clarification by @diitaz93, right?
>
> @henrikstranneheim, that’s true. If the investigation reveals a need for a top-up, then the Top-up action could be useful. Do you agree?

Not sure. It would not have any function in cg, and it would complicate the logic that finds cases to start (which is already complicated). I'm unsure whether it is worth it compared to just leaving a comment in TB and letting the automation pick it up once ready.

RasmusBurge-CG commented 2 days ago

Okay, I see @henrikstranneheim. It’s just that the automation doesn’t pick it up for pipelines like mip-dna. I do agree that it would affect the start logic and require changes in CG. The benefit, as I see it, would be that a sample could be started during the nighttime, reducing turnaround time. Additionally, in my view, it would lessen the risk of missing a case restart.

karlnyr commented 2 days ago

It would be vital that we restart cases whenever new data is available. @Karl-Svard brought up that if we re-demultiplex we would not want to start things because of the new "latest_sequenced_at" date. We will have to refine the acceptance criteria at a later meeting.

beatrizsavinhas commented 2 days ago

Suggestion for acceptance criteria:

Considerations:

henrikstranneheim commented 1 day ago

Hmm, tricky one. @RasmusBurge-CG, do you have an example of a case that is not picked up, and do you have any idea why?

@beatrizsavinhas Can't you set the case action to "analyze"?

beatrizsavinhas commented 1 day ago

I don't think that would work as intended, @henrikstranneheim, because if the case is set to analyse, the analysis will just start again when the systemd runs, regardless of whether there is new sequencing data. So we run the risk of analysing the case again with the same data. What we need is something that ensures the analysis only starts after the sample has been topped up.

But as @karlnyr mentioned, we will still discuss this issue in our next meeting!

islean commented 20 hours ago

I think setting the case action to None will result in the behaviour you are after:

  1. ANALYZE will result in it getting picked up for analysis regardless of whether there is new sequencing data or not
  2. RUNNING will result in the case never being picked up
  3. None will result in a check if there is new sequencing data since the latest analysis and if so start it
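A minimal sketch of the three behaviours listed above. `CaseAction`, `Case`, and `should_start` are hypothetical names for illustration, not the actual cg code, and the handling of a case with no stored analysis is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class CaseAction(str, Enum):
    # Hypothetical enum mirroring the action values named in the list above
    ANALYZE = "analyze"
    RUNNING = "running"


@dataclass
class Case:
    # Hypothetical stand-in for the StatusDB case record
    action: Optional[CaseAction]
    latest_analyzed: Optional[datetime]
    latest_sequenced: Optional[datetime]


def should_start(case: Case) -> bool:
    """Decide whether the automation would start the case."""
    if case.action == CaseAction.ANALYZE:
        return True  # started regardless of whether there is new data
    if case.action == CaseAction.RUNNING:
        return False  # never picked up
    # Action is None: start only if there is data newer than the latest analysis
    if case.latest_sequenced is None:
        return False
    if case.latest_analyzed is None:
        # Assumption: sequence data with no stored analysis counts as ready to start
        return True
    return case.latest_analyzed < case.latest_sequenced
```

Note the assumed branch for a missing analysis: under it, a case whose first analysis was never stored would start as soon as any sequence data exists.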

karlnyr commented 19 hours ago

That would restart the case instantly, since there is no analysis object in StatusDB (one is only created when we have a completed case) and there is sequence data. What we want is to queue the analysis to start after the sample has been sequenced again. That is, there is no analysis object in StatusDB and there is sequencing data, but what we need to check is whether new sequencing data has been added to the sample. This could be done in a lot of ways. Perhaps by checking whether there is already a failed analysis in Trailblazer: if so, it means that we have started the case once but something was up, and we can then check whether the latest sequencing date is after the start of that Trailblazer analysis. Or you could check whether more than one flow cell exists on the sample. Or we could start creating analysis objects upon start, but that would require some changes to existing logic, and perhaps some new fields 👍 So many choices

edit:

Something I thought of is that we could in fact make LIMS set the case for the sample to analyze if it gets enough reads in sequence aggregation, or even do that in the post-processing of demux.
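A rough sketch of the first option in this comment: comparing the latest sequencing date with the start of a failed Trailblazer analysis. `TrailblazerAnalysis`, its fields, and the `"failed"` status string are hypothetical simplifications, not the real Trailblazer API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TrailblazerAnalysis:
    # Hypothetical, simplified view of a Trailblazer analysis record
    status: str
    started_at: datetime


def has_new_data_since_failed_run(
    latest_analysis: Optional[TrailblazerAnalysis],
    latest_sequenced: Optional[datetime],
) -> bool:
    """Return True if the last run failed and sequencing data arrived after it started."""
    if latest_analysis is None or latest_sequenced is None:
        return False
    if latest_analysis.status != "failed":
        return False
    return latest_sequenced > latest_analysis.started_at
```

This only covers the date comparison. It does not address the re-demultiplexing concern raised earlier, where a newer sequencing date does not necessarily mean new reads.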