Clinical-Genomics / event-driven-architecture

Project tracking for event driven POC

Think about possible events #6

Closed seallard closed 1 month ago

seallard commented 1 month ago

Think about the following questions and prepare brief answers for a presentation and discussion.

seallard commented 1 month ago

Don't forget this one @diitaz93, @ChrOertlin, @islean, @Vince-janv

seallard commented 1 month ago

Possible state transition 1: start analyses after post processing

When post processing completes for a flow cell, new data becomes available which potentially can be used to start analyses.

From: post processing completed for flowcell. To: analyses started for cases.

For example, the MIP-DNA workflow. Current trigger: crontab, which runs `workflow-mip-dna-start.sh` (`cg workflow mip-dna start-available`). Frequency: every hour.

Microbial analyses are started every 6th hour, so that could be sped up a lot.

Possible topic names

Possible event names

Event content

The flow cell ID.

How does the message impact turn-around-time, resilience and decoupling?

Up to 1 hour reduction for the state transition and TAT. No real decoupling. Resilience: we can have retries in the consumer that tries to start the analyses, for example if a database, Slurm or something else goes down temporarily.

islean commented 1 month ago

Possible state transition 1: start analyses after post processing

When post processing completes for a flow cell, new data becomes available which potentially can be used to start analyses.

From: post processing completed for flowcell. To: analyses started for cases.

For example, the MIP-DNA workflow. Current trigger: crontab, which runs `workflow-mip-dna-start.sh` (`cg workflow mip-dna start-available`). Frequency: every hour.

Possible topic names

* `FlowcellPostProcessing`

* `DemultiplexingPostProcessing`

* `DemuxPostProcessing`

Possible event names

* `PostProcessingCompleted`

Event content

The flow cell ID.

How does the message impact turn-around-time, resilience and decoupling?

Up to 1 hour reduction for the state transition and TAT. No real decoupling. Resilience: we can have retries in the consumer that tries to start the analyses, for example if a database, Slurm or something else goes down temporarily.
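For reference, a minimal sketch of what publishing this event could look like, assuming a Kafka broker and the `confluent-kafka` Python client. The topic and event names are taken from the suggestions above; the broker address and payload layout are illustrative assumptions.

```python
import json

from confluent_kafka import Producer

# Assumed broker address; in practice this would come from configuration.
producer = Producer({"bootstrap.servers": "localhost:9092"})


def publish_post_processing_completed(flow_cell_id: str) -> None:
    """Publish a PostProcessingCompleted event for a flow cell."""
    event = {"event": "PostProcessingCompleted", "flow_cell_id": flow_cell_id}
    producer.produce(
        "DemuxPostProcessing",  # topic name from the suggestions above
        key=flow_cell_id,
        value=json.dumps(event),
    )
    producer.flush()
```

The consumer in cg would subscribe to the topic and work out which cases are ready to be started for that flow cell, instead of waiting for the hourly crontab.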

Are cases always run on just one flow cell?

ChrOertlin commented 1 month ago

Description

A nice place to start might be to make the reading of workflow QC data into arnold event driven. Currently, this project is on hold since we have some reworking to do in trailblazer. However, it might be nice to create a producer for finished workflows and let cg consume it to then trigger parsing of the QC data.

Benefits

  1. Seemingly straightforward
  2. Immediate benefits to the organisation

Proposed flow

Topic

Name: 'AnalysisRun'

Event

Name: 'AnalysisCompleted'

Content: 'case_internal_id'

How does this improve TAT, resilience and decoupling?

As I see it now, it will resolve an immediate issue where this flow would be coupled to the trailblazer flow; once a slurm job finishes successfully, it could produce the event. It does not affect TAT, but it can offer other tangible benefits by storing data with valuable information for business logic.
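A minimal sketch of what the cg consumer could look like, assuming the 'AnalysisRun' topic and 'AnalysisCompleted' event proposed above. The broker address, consumer group and the `parse_qc_metrics` helper are hypothetical placeholders for the logic that parses QC data into arnold.

```python
import json

from confluent_kafka import Consumer


def parse_qc_metrics(case_internal_id: str) -> None:
    """Placeholder for the cg logic that parses workflow QC data into arnold."""
    ...


consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": "cg-arnold-qc",  # hypothetical consumer group
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["AnalysisRun"])

while True:
    message = consumer.poll(timeout=1.0)
    if message is None or message.error():
        continue
    event = json.loads(message.value())
    if event.get("event") == "AnalysisCompleted":
        parse_qc_metrics(event["case_internal_id"])
```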

islean commented 1 month ago

Possible state transition 2: Upload completed cases

When a case has been run, the analysis is to be uploaded. This could be done by setting up a listener on the analysis table in cg with a completed_at trigger.

From: completed_at has been set in the Analysis table. To: the upload command is run.

Example: Tomte. Current trigger: systemd, which runs `cg upload auto --workflow tomte`. Frequency: every tenth minute.

Possible topic names

Possible event names

Event content

Case internal id

How does the message impact turn-around-time, resilience and decoupling?

Up to 10 minutes reduction for the state transition and TAT. No real decoupling. Resilience: might go down a bit. There are some more filters that need to be considered; I do not know if we need to set up more triggers to cover events where completed analyses are not to be uploaded.

henrikstranneheim commented 1 month ago

Do we have to use a listener? It would be nice to publish the event from the upstream process.

seallard commented 1 month ago

> Do we have to use a listener? It would be nice to publish the event from the upstream process.

No, we don't have to listen to database tables. We will explicitly publish events from the upstream process. There are many reasons why integrating Kafka directly against our databases is a bad idea at this stage in our organization.

Mainly, I think it is a bad pattern because we have one monolith with one giant database that a lot of code writes to. What if other components write to the column? What if a one-off script is run that writes data?

It is much better if we are explicit about when an event is supposed to be published instead of introducing another layer of indirection.
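To make that concrete, here is a minimal sketch of publishing the event in the same code path that sets completed_at, rather than listening to the table. The store/analysis objects and the topic and event names are illustrative assumptions; the point is only that the publish happens explicitly at the write site.

```python
import datetime as dt
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker address


def complete_analysis(store, analysis) -> None:
    """Mark the analysis as completed and publish the event from the same code path."""
    analysis.completed_at = dt.datetime.now()
    store.session.commit()  # hypothetical SQLAlchemy-style session

    event = {
        "event": "AnalysisCompleted",
        "case_internal_id": analysis.case.internal_id,  # assumed attribute
    }
    producer.produce("AnalysisRun", key=event["case_internal_id"], value=json.dumps(event))
    producer.flush()
```

This keeps the event publication visible in the code that performs the state change, instead of hiding it behind a database trigger or CDC layer.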

seallard commented 1 month ago

> Are cases always run on just one flow cell?

Nope! So the listener would still identify which cases are ready to be started.

islean commented 1 month ago

Possibility with store available

Short rundown: we could set up a publisher in trailblazer which posts an event containing the case's internal ID whenever an analysis is completed. Then we set up a consumer in cg which triggers the store functionality. This would add some decoupling between cg and trailblazer.
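A minimal sketch of the cg side, assuming trailblazer publishes an 'AnalysisCompleted' event on the 'AnalysisRun' topic as proposed above. The `store_case` helper, consumer group and retry policy are hypothetical; the retry loop illustrates the resilience point from earlier in the thread.

```python
import json
import time

from confluent_kafka import Consumer


def store_case(case_internal_id: str) -> None:
    """Placeholder for the cg store functionality for a completed analysis."""
    ...


consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",  # assumed broker address
        "group.id": "cg-store-available",  # hypothetical consumer group
        "auto.offset.reset": "earliest",
    }
)
consumer.subscribe(["AnalysisRun"])

while True:
    message = consumer.poll(timeout=1.0)
    if message is None or message.error():
        continue
    event = json.loads(message.value())
    if event.get("event") != "AnalysisCompleted":
        continue
    # Retry a few times if e.g. the database or slurm is temporarily down.
    for attempt in range(3):
        try:
            store_case(event["case_internal_id"])
            break
        except Exception:
            time.sleep(60 * (attempt + 1))
```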

diitaz93 commented 1 month ago

Possible state transition 1: start demultiplexing

When flow cells are transferred from the NAS or from PDC.

From: Copy complete flow cell. To: start demultiplexing.

Current trigger: crontab, which runs `start-demux.sh` (`cg demultiplex all`). Frequency: every 10 min.

Possible topic names

Possible event names

Event content

The flow cell ID/ full name.

How does the message impact turn-around-time, resilience and decoupling?

Up to 10 minutes reduction for the state transition and TAT. Some buffer time in the DRAGEN (not starting demultiplexing of all flow cells at the same time). Avoid flow cells being taken twice for demultiplexing by the automation (maybe?).

seallard commented 1 month ago
Here is a summary of all our crontab timers:

| Frequency | Description |
| --- | --- |
| Every Saturday at 02:05 | Microbial DB Backup |
| Daily at 00:00 | Upload processed cases to mutacc database |
| Daily at 03:10 | AWS DB Backup |
| Daily at 20:00 | Process solved cases with mutacc |
| Daily at 21:45 | Count Housekeeper files |
| Every 6th day-of-week at 20:00 | Delete files and empty dirs in customer inbox |
| Every 8th hour | Store completed microbial analyses in Housekeeper |
| Every 6th hour | Start all new microbial analyses |
| Every 4th hour | Check for received samples |
| Every 4th hour | Check for prepared samples |
| Every 4th hour | Check for delivered samples |
| Every 4th hour | Check for delivered pools |
| Every 4th hour | Check for received pools |
| Every 3rd hour | Upload results for MIP-DNA |
| Every 3rd hour | Upload results for BALSAMIC |
| Every 3rd hour | Upload results for BALSAMIC-UMI |
| Every 2nd hour | Store completed MIP-DNA analyses in Housekeeper |
| Every hour | Start analyses for BALSAMIC |
| Every hour | Store BALSAMIC analyses in Housekeeper |
| Every hour | Fetch ONE requested flowcell from PDC |
| Every hour | Start analyses for MIP-DNA |
| Every hour | Store completed MIP-RNA analyses |
| Every hour | Start analyses for MIP-RNA |
| Every hour | Start available analyses for mutant |
| Every hour | Store available analyses for mutant |
| Every 10 minutes | Create Novaseq demux sample sheet |
| Every 10 minutes | Start demultiplexing of all flow cells |
| Every 10 minutes | Start available analyses for Fluffy |
| Every 10 minutes | Store available analyses for Fluffy |
| Every 5 minutes | Scan for analyses |
seallard commented 1 month ago
Our systemd timers:

| Frequency | Description |
| --- | --- |
| Every 10 minutes | cg-archive-update-job-statuses.service |
| Every 10 minutes | cg-demultiplex-finish-all.service |
| Every 10 minutes | cg-demultiplex-create-illumina-manifest-files.service |
| Every 10 minutes | cg-demultiplex-create-nanopore-manifest-files.service |
| Every 10 minutes | cg-demultiplex-confirm-flow-cell-sync.service |
| Every 10 minutes | cg-demultiplex-copy-completed-flow-cell.service |
| Every 3rd hour | cg-upload-mip-rna.service |
| Every 3rd hour | cg-upload-microsalt.service |
| Daily at 00:00 | sql-backup-remove.service |
| Daily at 00:00 | cg-compress-fastq.service |
| Daily at 00:00 | mongo-backup.service |
| Daily at 00:00 | mongo-backup-remove.service |
| Daily at 00:00 | sql-clean-binlog.service |
| Daily at 00:00 | sql-backup.service |
| Daily at 00:00 | cg-clean-analysis-balsamic-qc.service |
| Daily at 00:00 | cg-clean-analysis-balsamic-umi.service |
| Daily at 00:00 | cg-clean-analysis-microsalt.service |
| Daily at 00:00 | cg-clean-analysis-mip-rna.service |
| Daily at 00:00 | cg-clean-analysis-mutant.service |
| Daily at 00:00 | cg-clean-rsync-dirs.service |
| Daily at 00:00 | cg-clean-analysis-balsamic.service |
| Daily at 00:00 | cg-clean-analysis-fluffy.service |
| Daily at 00:00 | cg-clean-analysis-rnafusion.service |
| Daily at 00:00 | cg-clean-analysis-mip-dna.service |
| Daily at 01:00 | cg-clean-retrieved-spring-files.service |
| Daily at 01:00 | cg-clean-scout-finished.service |
| Daily at 01:00 | cg-compress-clean-fastq.service |
| Daily at 01:00 | cg-upload-all-fastq.service |
| Daily at 01:00 | cg-upload-nipt-all.service |
| Daily at 01:00 | cg-upload-rnafusion.service |
| Daily at 01:00 | cg-workflow-rnafusion-start-available.service |
| Daily at 01:00 | cg-workflow-rnafusion-store-available.service |
| Daily at 01:00 | cg-workflow-tomte-start-available.service |
| Daily at 01:00 | cg-workflow-tomte-store-available.service |
| Daily at 09:00 | log-storage.service |
| Daily at 18:00 | cg-backup-encrypt-flow-cells.service |
| Daily at 18:15 | cg-backup-flow-cells.service |
| Daily at 19:00 | scout-load-research.service |
| Every Friday at 04:00 | singularity-mutant.service |
| Every Saturday at 09:00 | cg-backup-archive-spring-files.service |
| Every Saturday at 20:15 | clean-stage-analysis-dirs.service |
| Every Sunday at 08:00 | cg-clean-flow-cells.service |
| Every Sunday at 08:00 | cg-clean-hk-case-bundle-files.service |
| Every Monday at 01:00 | cg-clean-retrieved-spring-files.service |
seallard commented 1 month ago

Done 🥳