Clinical-Genomics / event-driven-architecture

Project tracking for event driven POC

Think about possible events #6

Closed seallard closed 1 month ago

seallard commented 1 month ago

Think about the following questions and prepare brief answers for a presentation and discussion.

seallard commented 1 month ago

Don't forget this one @diitaz93, @ChrOertlin, @islean, @Vince-janv

seallard commented 1 month ago

Possible state transition 1: start analyses after post processing

When post processing completes for a flow cell, new data becomes available which potentially can be used to start analyses.

From: post processing completed for flowcell. To: analyses started for cases.

For example, the MIP-DNA workflow. Current trigger: crontab, which runs `workflow-mip-dna-start.sh` (`cg workflow mip-dna start-available`). Frequency: every hour.

Microbial analyses are started every 6th hour, so that could be sped up a lot.

Possible topic names

Possible event names

Event content

The flow cell ID.

How does the message impact turn-around-time, resilience and decoupling?

Up to 1 hour reduction for the state transition and TAT. No real decoupling. Resilience: we can have retries in the consumer that tries to start the analyses, for example if a database, Slurm or something else goes down temporarily.

islean commented 1 month ago

Possible state transition 1: start analyses after post processing

When post processing completes for a flow cell, new data becomes available which potentially can be used to start analyses.

From: post processing completed for flowcell. To: analyses started for cases.

For example, the MIP-DNA workflow. Current trigger: crontab, which runs `workflow-mip-dna-start.sh` (`cg workflow mip-dna start-available`). Frequency: every hour.

Possible topic names

* `FlowcellPostProcessing`

* `DemultiplexingPostProcessing`

* `DemuxPostProcessing`

Possible event names

* `PostProcessingCompleted`

Event content

The flow cell ID.

How does the message impact turn-around-time, resilience and decoupling?

Up to 1 hour reduction for the state transition and TAT. No real decoupling. Resilience: we can have retries in the consumer that tries to start the analyses, for example if a database, Slurm or something else goes down temporarily.
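For reference, a minimal sketch of what publishing this event could look like, assuming a Kafka broker and the `confluent-kafka` Python client. The topic and event names are taken from the suggestions above; the broker address and payload layout are illustrative assumptions.

```python
import json

from confluent_kafka import Producer

# Assumed broker address; in practice this would come from configuration.
producer = Producer({"bootstrap.servers": "localhost:9092"})


def publish_post_processing_completed(flow_cell_id: str) -> None:
    """Publish a PostProcessingCompleted event for a flow cell."""
    event = {"event": "PostProcessingCompleted", "flow_cell_id": flow_cell_id}
    producer.produce(
        "DemuxPostProcessing",  # topic name from the suggestions above
        key=flow_cell_id,
        value=json.dumps(event),
    )
    producer.flush()
```

The consumer in cg would subscribe to the topic and work out which cases are ready to be started for that flow cell, instead of waiting for the hourly crontab.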

Are cases always run on just one flow cell?

ChrOertlin commented 1 month ago

Description

A nice place to start might be to make the reading of workflow QC data into arnold event driven. Currently, this project is on hold since we have some reworking to do in trailblazer. However, it might be nice to create a producer for finished workflows and let cg consume it to then trigger parsing of the QC data.

Benefits

  1. Seemingly straightforward
  2. Immediate benefits to the organisation

Proposed flow

Topic

Name: 'AnalysisRun'

Event

Name: 'AnalysisCompleted'

Content: 'case_internal_id'

How does this improve TAT, resilience and decoupling?

As I see it now, it will resolve an immediate issue where this flow would be coupled to the trailblazer flow; once a slurm job finishes successfully, it could produce the event. It does not affect TAT, but it can offer other tangible benefits by storing data with valuable information for business logic.
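A minimal sketch of what the cg consumer could look like, assuming the 'AnalysisRun' topic and 'AnalysisCompleted' event proposed above. The broker address, consumer group and the `parse_qc_metrics` helper are hypothetical placeholders for the logic that parses QC data into arnold.

```python
import json

from confluent_kafka import Consumer


def parse_qc_metrics(case_internal_id: str) -> None:
    """Placeholder for the cg logic that parses workflow QC data into arnold."""
    ...


consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": "cg-arnold-qc",  # hypothetical consumer group
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["AnalysisRun"])

while True:
    message = consumer.poll(timeout=1.0)
    if message is None or message.error():
        continue
    event = json.loads(message.value())
    if event.get("event") == "AnalysisCompleted":
        parse_qc_metrics(event["case_internal_id"])
```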

islean commented 1 month ago

Possible state transition 2: Upload completed cases

When a case has been run, the analysis is to be uploaded. This could be done by setting up a listener on the analysis table in cg with a completed_at trigger.

From: completed_at has been set in the Analysis table. To: the upload command is run.

Example: Tomte. Current trigger: systemd, which runs `cg upload auto --workflow tomte`. Frequency: every tenth minute.

Possible topic names

Possible event names

Event content

Case internal id

How does the message impact turn-around-time, resilience and decoupling?

Up to 10 minutes reduction for the state transition and TAT. No real decoupling. Resilience: might go down a bit. There are some more filters that need to be considered; I do not know if we need to set up more triggers to cover events where completed analyses are not to be uploaded.

henrikstranneheim commented 1 month ago

Do we have to use a listener? It would be nice to publish the event from the upstream process.

seallard commented 1 month ago

> Do we have to use a listener? It would be nice to publish the event from the upstream process.

No, we don't have to listen to database tables. We will explicitly publish events from the upstream process. There are many reasons why integrating Kafka directly against our databases is a bad idea at this stage in our organization.

Mainly, I think it is a bad pattern because we have one monolith with one giant database that a lot of code writes to. What if other components write to the column? What if a one-off script is run that writes data?

It is much better if we are explicit about when an event is supposed to be published instead of introducing another layer of indirection.
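To make that concrete, here is a minimal sketch of publishing the event in the same code path that sets completed_at, rather than listening to the table. The store/analysis objects and the topic and event names are illustrative assumptions; the point is only that the publish happens explicitly at the write site.

```python
import datetime as dt
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker address


def complete_analysis(store, analysis) -> None:
    """Mark the analysis as completed and publish the event from the same code path."""
    analysis.completed_at = dt.datetime.now()
    store.session.commit()  # hypothetical SQLAlchemy-style session

    event = {
        "event": "AnalysisCompleted",
        "case_internal_id": analysis.case.internal_id,  # assumed attribute
    }
    producer.produce("AnalysisRun", key=event["case_internal_id"], value=json.dumps(event))
    producer.flush()
```

This keeps the event publication visible in the code that performs the state change, instead of hiding it behind a database trigger or CDC layer.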

seallard commented 1 month ago

> Are cases always run on just one flow cell?

Nope! So the listener would still identify which cases are ready to be started.

islean commented 1 month ago

Possibility with store available

Short rundown: we could set up a publisher in trailblazer which posts an event containing the case's internal ID whenever an analysis is completed. Then we set up a consumer in cg which triggers the store functionality. This would add some decoupling between cg and trailblazer.
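A minimal sketch of the cg side, assuming trailblazer publishes an 'AnalysisCompleted' event on the 'AnalysisRun' topic as proposed above. The `store_case` helper, consumer group and retry policy are hypothetical; the retry loop illustrates the resilience point from earlier in the thread.

```python
import json
import time

from confluent_kafka import Consumer


def store_case(case_internal_id: str) -> None:
    """Placeholder for the cg store functionality for a completed analysis."""
    ...


consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": "cg-store-available",  # hypothetical consumer group
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["AnalysisRun"])

while True:
    message = consumer.poll(timeout=1.0)
    if message is None or message.error():
        continue
    event = json.loads(message.value())
    if event.get("event") != "AnalysisCompleted":
        continue
    # Retry a few times if e.g. the database or slurm is temporarily down.
    for attempt in range(3):
        try:
            store_case(event["case_internal_id"])
            break
        except Exception:
            time.sleep(60 * (attempt + 1))
```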

diitaz93 commented 1 month ago

Possible state transition 1: start demultiplexing

When flow cells are transferred from the NAS or from PDC.

From: Copy complete flow cell. To: start demultiplexing.

Current trigger: crontab, which runs `start-demux.sh` (`cg demultiplex all`). Frequency: every 10 min.

Possible topic names

Possible event names

Event content

The flow cell ID/ full name.

How does the message impact turn-around-time, resilience and decoupling?

Up to 10 minutes reduction for the state transition and TAT. Some buffer time in the DRAGEN (not starting demultiplexing of all flow cells at the same time). Avoid flow cells being taken twice for demultiplexing by the automation (maybe?).

seallard commented 1 month ago
Here is a summary of all our crontab timers:

| Frequency | Description |
| --- | --- |
| Every Saturday at 02:05 | Microbial DB Backup |
| Daily at 00:00 | Upload processed cases to mutacc database |
| Daily at 03:10 | AWS DB Backup |
| Daily at 20:00 | Process solved cases with mutacc |
| Daily at 21:45 | Count Housekeeper files |
| Every 6th day-of-week at 20:00 | Delete files and empty dirs in customer inbox |
| Every 8th hour | Store completed microbial analyses in Housekeeper |
| Every 6th hour | Start all new microbial analyses |
| Every 4th hour | Check for received samples |
| Every 4th hour | Check for prepared samples |
| Every 4th hour | Check for delivered samples |
| Every 4th hour | Check for delivered pools |
| Every 4th hour | Check for received pools |
| Every 3rd hour | Upload results for MIP-DNA |
| Every 3rd hour | Upload results for BALSAMIC |
| Every 3rd hour | Upload results for BALSAMIC-UMI |
| Every 2nd hour | Store completed MIP-DNA analyses in Housekeeper |
| Every hour | Start analyses for BALSAMIC |
| Every hour | Store BALSAMIC analyses in Housekeeper |
| Every hour | Fetch ONE requested flowcell from PDC |
| Every hour | Start analyses for MIP-DNA |
| Every hour | Store completed MIP-RNA analyses |
| Every hour | Start analyses for MIP-RNA |
| Every hour | Start available analyses for mutant |
| Every hour | Store available analyses for mutant |
| Every 10 minutes | Create Novaseq demux sample sheet |
| Every 10 minutes | Start demultiplexing of all flow cells |
| Every 10 minutes | Start available analyses for Fluffy |
| Every 10 minutes | Store available analyses for Fluffy |
| Every 5 minutes | Scan for analyses |
seallard commented 1 month ago
Our systemd timers:

| Frequency | Description |
| --- | --- |
| Every 10 minutes | cg-archive-update-job-statuses.service |
| Every 10 minutes | cg-demultiplex-finish-all.service |
| Every 10 minutes | cg-demultiplex-create-illumina-manifest-files.service |
| Every 10 minutes | cg-demultiplex-create-nanopore-manifest-files.service |
| Every 10 minutes | cg-demultiplex-confirm-flow-cell-sync.service |
| Every 10 minutes | cg-demultiplex-copy-completed-flow-cell.service |
| Every 3rd hour | cg-upload-mip-rna.service |
| Every 3rd hour | cg-upload-microsalt.service |
| Daily at 00:00 | sql-backup-remove.service |
| Daily at 00:00 | cg-compress-fastq.service |
| Daily at 00:00 | mongo-backup.service |
| Daily at 00:00 | mongo-backup-remove.service |
| Daily at 00:00 | sql-clean-binlog.service |
| Daily at 00:00 | sql-backup.service |
| Daily at 00:00 | cg-clean-analysis-balsamic-qc.service |
| Daily at 00:00 | cg-clean-analysis-balsamic-umi.service |
| Daily at 00:00 | cg-clean-analysis-microsalt.service |
| Daily at 00:00 | cg-clean-analysis-mip-rna.service |
| Daily at 00:00 | cg-clean-analysis-mutant.service |
| Daily at 00:00 | cg-clean-rsync-dirs.service |
| Daily at 00:00 | cg-clean-analysis-balsamic.service |
| Daily at 00:00 | cg-clean-analysis-fluffy.service |
| Daily at 00:00 | cg-clean-analysis-rnafusion.service |
| Daily at 00:00 | cg-clean-analysis-mip-dna.service |
| Daily at 01:00 | cg-clean-retrieved-spring-files.service |
| Daily at 01:00 | cg-clean-scout-finished.service |
| Daily at 01:00 | cg-compress-clean-fastq.service |
| Daily at 01:00 | cg-upload-all-fastq.service |
| Daily at 01:00 | cg-upload-nipt-all.service |
| Daily at 01:00 | cg-upload-rnafusion.service |
| Daily at 01:00 | cg-workflow-rnafusion-start-available.service |
| Daily at 01:00 | cg-workflow-rnafusion-store-available.service |
| Daily at 01:00 | cg-workflow-tomte-start-available.service |
| Daily at 01:00 | cg-workflow-tomte-store-available.service |
| Daily at 09:00 | log-storage.service |
| Daily at 18:00 | cg-backup-encrypt-flow-cells.service |
| Daily at 18:15 | cg-backup-flow-cells.service |
| Daily at 19:00 | scout-load-research.service |
| Every Friday at 04:00 | singularity-mutant.service |
| Every Saturday at 09:00 | cg-backup-archive-spring-files.service |
| Every Saturday at 20:15 | clean-stage-analysis-dirs.service |
| Every Sunday at 08:00 | cg-clean-flow-cells.service |
| Every Sunday at 08:00 | cg-clean-hk-case-bundle-files.service |
| Every Monday at 01:00 | cg-clean-retrieved-spring-files.service |
seallard commented 1 month ago

Done 🥳