Clinical-Genomics / cg

Glue between Clinical Genomics apps

2500 flowcells are not demultiplexed with the new automation #2361

Closed beatrizsavinhas closed 9 months ago

beatrizsavinhas commented 1 year ago

Description

2500 flow cells stuck on "retrieve" are not being picked up for demultiplexing by the new automation.

More information on: https://docs.google.com/document/d/1iywLruJs5-ca4oOt8c7qMCfajBGbzIpnPNtsFO5TMr0/edit?usp=sharing

The SampleSheets for these old flow cells cannot be generated from scratch because the sample data has been deleted from LIMS. Running cg demultiplex flow-cell <flow_cell_dir> -b bcl2fastq with the SampleSheets stored in housekeeper raised a "Malformed sample sheet." error. These SampleSheets therefore need to be corrected to match the new structure, as follows:

Original (malformed): FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject

Modified to: FCID,Lane,SampleID,SampleRef,index,SampleName,Control,Recipe,Operator,Project

NOTE: Adding [Data] as the first line of the file might also be necessary.

NOTE: For dual-index flow cell runs, an index2 column can be added after index to specify the second index sequence.
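The header correction described above could be scripted roughly as follows. This is only a minimal sketch, not code from cg: the function name `convert_sample_sheet` and the `add_data_section` parameter are hypothetical, and it assumes that only the header line needs renaming while the data rows can stay as they are. The column renames are taken directly from the two headers listed in this issue.

```python
# Column renames taken from the header comparison in this issue.
HEADER_RENAMES = {
    "Index": "index",
    "Description": "SampleName",
    "SampleProject": "Project",
}


def convert_sample_sheet(content: str, add_data_section: bool = True) -> str:
    """Rename the old header columns; data rows are left untouched.

    Hypothetical helper, assuming the header is the first line of the
    file (or the second line if a [Data] line is already present).
    """
    lines = content.splitlines()
    has_data_section = bool(lines) and lines[0].strip() == "[Data]"
    header_index = 1 if has_data_section else 0
    if header_index >= len(lines):
        raise ValueError("Sample sheet has no header line")

    columns = lines[header_index].strip().split(",")
    lines[header_index] = ",".join(HEADER_RENAMES.get(col, col) for col in columns)

    # The issue notes that a leading [Data] line might also be necessary.
    if add_data_section and not has_data_section:
        lines.insert(0, "[Data]")
    return "\n".join(lines) + "\n"
```

Running the function on an already converted sheet leaves it unchanged, so it should be safe to apply more than once. Adding an index2 column for dual-index runs is not covered here.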

Still, demultiplexing doesn't start automatically, presumably because the CopyComplete.txt file is missing. I added the CopyComplete.txt file to the flow cell directory and started the demultiplexing manually:

touch CopyComplete.txt
cg demultiplex flow-cell <flow_cell_dir> -b bcl2fastq

Demultiplexing job completed.
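For several flow cells, the manual workaround above could be wrapped in a small script. This is a rough sketch under assumptions: `ensure_copy_complete` and `demultiplex_command` are hypothetical helper names, and the command is simply the one quoted in this issue, assembled as an argument list rather than run.

```python
from pathlib import Path
from typing import List


def ensure_copy_complete(flow_cell_dir: Path) -> bool:
    """Create CopyComplete.txt if it is missing.

    Returns True if the file was created, False if it already existed.
    """
    marker = flow_cell_dir / "CopyComplete.txt"
    if marker.exists():
        return False
    marker.touch()
    return True


def demultiplex_command(flow_cell_dir: Path) -> List[str]:
    """Build the manual demultiplexing command quoted in this issue."""
    return ["cg", "demultiplex", "flow-cell", str(flow_cell_dir), "-b", "bcl2fastq"]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where cg is installed.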

Suggested solution

This can be closed when

2500 flow cells can be retrieved and demultiplexing starts automatically.

Blocked by

Nothing.

ChrOertlin commented 1 year ago

It seems we missed some implementation details from here:
https://github.com/Clinical-Genomics/demultiplexing/blob/master/scripts/2500/checkfornewrun.bash
https://github.com/Clinical-Genomics/demultiplexing/blob/master/scripts/2500/demux-2500.bash

henrikstranneheim commented 1 year ago

@islean Could you link your PR and add the notes from the discussion on why these edge cases are not worth handling in the code?

islean commented 1 year ago

@henrikstranneheim https://github.com/Clinical-Genomics/cg/pull/2365

I think the reasoning for not deploying an automated solution for this particular issue is that the scope was deemed too broad for a problem that will occur so seldom that manual handling might be quicker. If I am not mistaken, we no longer have any support for the software used to generate the sample sheets, is that so @ChrOertlin? It might be worth doing as part of the archival work, though, given that we would want to retrieve a lot from PDC and convert to Spring files for DDN archival. Any thoughts @Vince-janv?

Vince-janv commented 1 year ago

The sample sheet problem will be handled by this issue: https://github.com/Clinical-Genomics/project-planning/issues/521

henrikstranneheim commented 9 months ago

@diitaz93 Resolved?

diitaz93 commented 9 months ago

@henrikstranneheim Yes, after the patch in #2913 the flow cells are retrieved by the automation. For flow cells without samples in LIMS, demultiplexing will still fail, but that is out of scope for this issue, as agreed with @beatrizsavinhas; the important part was that demultiplexing starts automatically.