Description
Fixes #3288
This PR introduces a distinction between sample sheets that fail validation because of their structure (most likely Bcl2Fastq or other old sample sheets) and those that fail because of their content (manually modified sample sheets). To this end, two new exceptions replace SampleSheetError: SampleSheetFormatError and SampleSheetContentError, respectively. Sample sheets that fail format validation will be regenerated as usual, but sample sheets that fail content validation will not.
Added
New exceptions SampleSheetFormatError and SampleSheetContentError
Changed
Replaced all occurrences of SampleSheetError with either SampleSheetFormatError or SampleSheetContentError
Updated the try-except clauses to handle the two exceptions differently
Renamed the sample sheet error used only in Nextflow code to NfSampleSheetError
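The intended control flow can be illustrated with a minimal sketch. It is not the code in this PR: the exception classes are re-declared here on top of plain Exception so the snippet is self-contained (their real base class in cg/exc.py is not shown in this description), and resolve_sample_sheet, validate and regenerate are hypothetical names used only for illustration.

```python
from typing import Callable


class SampleSheetFormatError(Exception):
    """Structural problem, e.g. missing sections (old or Bcl2Fastq sheets)."""


class SampleSheetContentError(Exception):
    """Content problem, e.g. a duplicated sample (manually modified sheets)."""


def resolve_sample_sheet(
    validate: Callable[[], None], regenerate: Callable[[], None]
) -> str:
    """Hypothetical helper: validate a sheet and report what was done with it.

    Returns "valid", "regenerated" or "kept".
    """
    try:
        validate()
    except SampleSheetFormatError:
        # Wrong structure: safe to rebuild the sample sheet from scratch.
        regenerate()
        return "regenerated"
    except SampleSheetContentError:
        # Probably edited by hand: report the failure but do not overwrite it.
        return "kept"
    return "valid"
```

Keeping the two failure modes as separate exception types lets each caller decide what to do with a manually modified sample sheet, instead of treating every validation failure as a reason to regenerate.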
How to prepare for test
[x] Ssh to relevant server (depending on type of change)
[x] Use stage: us
[x] Paxa the environment: paxa
[x] Install on stage (example for Hasta):
bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_cg -t cg -b fix-sample-sheet-validation -a
How to test
[x] Create a sample sheet when a manually modified sample sheet exists in the sequencing dir, see that the validation fails but the sample sheet is not regenerated
$ cg -l DEBUG demultiplex samplesheet create 20231108_LH00188_0028_B22F52TLT3 --dry-run
Running cg demultiplex.
Getting a valid sample sheet for flow cell 20231108_LH00188_0028_B22F52TLT3
Instantiating sample sheet API
Instantiating housekeeper api
Initializing Store
Instantiating lims api
Called undefined __fields__ on HousekeeperAPI, please wrap
Set dry run to True
Set force to False
Instantiating IlluminaRunDirectoryData with path /home/proj/stage/sequencing_data/illumina/sequencing-runs/20231108_LH00188_0028_B22F52TLT3
Set sequencing run id to B22F52TLT3
Fetching and validating sample sheet from Housekeeper
Fetch latest version from bundle 22F52TLT3
Fetching files with tags in [22F52TLT3,samplesheet]
Fetching files from version 163391
Sample sheet file for flowcell 22F52TLT3 not found in Housekeeper!
Sample sheet from Housekeeper is not correctly formatted or does not exist, trying sample sheet in sequencing directory
Validating sample sheet
Validating that the sample sheet has all the necessary sections
Looking for index settings in the sample sheet
Found index settings: NovaSeqX
Looking for read and index run cycles in the sample sheet
Validating samples
Order samples by lane
Validate that samples are unique in lane: 1
Sample ACC13169A1 exists multiple times in sample sheet
Validation failed for /home/proj/stage/sequencing_data/illumina/sequencing-runs/20231108_LH00188_0028_B22F52TLT3/SampleSheet.csv. Possibly manually modified sample sheet. Sample sheet will not be re-generated.
[x] Create a sample sheet when a manually modified sample sheet exists in Housekeeper and a correct sample sheet exists in the sequencing run: see that the validation fails but the sample sheet is not regenerated
Fetching and validating sample sheet for 20231108_LH00188_0028_B22F52TLT3 from Housekeeper
Sample ACC13169A1 exists multiple times in sample sheet
Validation failed for /home/proj/stage/housekeeper-bundles/22F52TLT3/2023-11-08/SampleSheet.csv. Possibly a manually modified sample sheet. Sample sheet will not be re-generated.
[x] Create a sample sheet when a sample sheet with incorrect sections exists in the sequencing dir, see that the validation fails and the sample sheet is regenerated
Running cg demultiplex.
Getting a valid sample sheet for flow cell 20231108_LH00188_0028_B22F52TLT3
Called undefined __fields__ on HousekeeperAPI, please wrap
Fetching and validating sample sheet for 20231108_LH00188_0028_B22F52TLT3 from Housekeeper
Sample sheet file for flowcell 22F52TLT3 not found in Housekeeper!
Sample sheet from Housekeeper is not correctly formatted or does not exist, trying sample sheet in sequencing directory
No index settings found in sample sheet
Sample sheet from sequencing directory is not correctly formatted or does not exist, creating new sample sheet
Fetching samples from lims for flowcell 22F52TLT3
Constructing sample sheet for the novaseqx flow cell 22F52TLT3
Updating barcode mismatch values for samples in lane 1
Updating barcode mismatch values for samples in lane 2
Updating barcode mismatch values for samples in lane 3
Updating barcode mismatch values for samples in lane 4
Updating barcode mismatch values for samples in lane 5
Updating barcode mismatch values for samples in lane 6
Updating barcode mismatch values for samples in lane 7
Updating barcode mismatch values for samples in lane 8
Creating sample sheet content
Samplesheet passed validation
[x] Demultiplex with a manually modified sample sheet that fails content validation
$ cg -l DEBUG demultiplex sequencing-run 20231108_LH00188_0028_B22F52TLT3 --dry-run
Running cg demultiplex.
Starting demultiplexing of sequencing run 20231108_LH00188_0028_B22F52TLT3
Instantiating sample sheet API
Instantiating housekeeper api
Initializing Store
Instantiating lims api
Called undefined __fields__ on HousekeeperAPI, please wrap
Instantiating demultiplexing api
Called undefined __fields__ on HousekeeperAPI, please wrap
Initialising Process with binary: sbatch
Use base call ['sbatch']
Set environment to stage
DemultiplexingAPI: Set dry run to True
SlurmAPI: Set dry run to True
setting flow cell id to 20231108_LH00188_0028_B22F52TLT3
setting demultiplexed runs dir to /home/proj/stage/sequencing_data/illumina/demultiplexed-runs
Instantiating IlluminaRunDirectoryData with path /home/proj/stage/sequencing_data/illumina/sequencing-runs/20231108_LH00188_0028_B22F52TLT3
Set sequencing run id to B22F52TLT3
Check if demultiplexing is possible for 22F52TLT3
Check if sequencing run is ready for downstream processing
Check if sequencing is done
Sequence is done for sequencing run 22F52TLT3
Check if copy of data from sequence instrument is ready
All data has been transferred for sequencing run 22F52TLT3
Sequencing run 22F52TLT3 is ready for downstream processing
Check if sample sheet exists
Fetch latest version from bundle 22F52TLT3
Fetching files with tags in [22F52TLT3,samplesheet]
Fetching files from version 163391
Validating sample sheet
Validating that the sample sheet has all the necessary sections
Looking for index settings in the sample sheet
Found index settings: NovaSeqX
Looking for read and index run cycles in the sample sheet
Validating samples
Order samples by lane
Validate that samples are unique in lane: 1
Sample ACC13169A1 exists multiple times in sample sheet
Demultiplexing a with a manually modified sample sheet
Would have started demultiplexing 20231108_LH00188_0028_B22F52TLT3
[js.diazboada@hasta:20231108_LH00188_0028_B22F52TLT3] [S_base] $
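For reference, the content check that fails in the logs above ("Sample ... exists multiple times in sample sheet") could look roughly like the sketch below. This is an assumption about the shape of the check, not the validator in this PR: validate_samples_unique_per_lane and the (lane, sample_id) tuples are hypothetical, and SampleSheetContentError is re-declared so the snippet runs on its own.

```python
from collections import Counter


class SampleSheetContentError(Exception):
    """Content problem in a sample sheet (stand-in for the class in cg/exc.py)."""


def validate_samples_unique_per_lane(samples: list[tuple[int, str]]) -> None:
    """Raise SampleSheetContentError if a sample occurs twice in the same lane.

    Each entry is a hypothetical (lane, sample_id) pair.
    """
    occurrences = Counter(samples)
    for (lane, sample_id), count in occurrences.items():
        if count > 1:
            raise SampleSheetContentError(
                f"Sample {sample_id} exists multiple times in lane {lane}"
            )
```

The same sample may still appear in several lanes; only repeats within a single lane are treated as a content error in this sketch.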
[x] Tests executed by SD
[x] "Merge and deploy" approved by VJ
Thanks for filling in who performed the code review and the test!
This version is a
[x] PATCH - when you make backwards compatible bug fixes or documentation/instructions
Implementation Plan
[x] Deployed to stage:
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/proj/stage/sequencing_data/illumina/sequencing-runs/20240524_LH00202_0106_B227WKJLT4
done.
Log deploy... done.
cg, version 61.2.8
[js.diazboada@hasta:20240524_LH00202_0106_B227WKJLT4] [S_base] $ up
[x] Deployed to production:
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/proj/stage/sequencing_data/illumina/sequencing-runs/20240524_LH00202_0106_B227WKJLT4
done.
Log deploy... done.
cg, version 61.2.8
remote: Enumerating objects: 260, done.
remote: Counting objects: 100% (260/260), done.
remote: Compressing objects: 100% (110/110), done.
remote: Total 260 (delta 178), reused 211 (delta 146), pack-reused 0
Receiving objects: 100% (260/260), 66.59 KiB | 0 bytes/s, done.
Resolving deltas: 100% (178/178), completed with 61 local objects.
From https://github.com/Clinical-Genomics/cg
bbe5c6d..ebcc262 master -> origin/master
744b260..2920f0e 3307-not-exiting-multi-case-upload-process-when-one-gens-upload-fails -> origin/3307-not-exiting-multi-case-upload-process-when-one-gens-upload-fails
* [new branch] add-analysis-delivery-message -> origin/add-analysis-delivery-message
* [new branch] gens_for_tga -> origin/gens_for_tga
4ef2738..6c3f665 raredisease-add-clinical-delivery -> origin/raredisease-add-clinical-delivery
* [new branch] rewire-backup -> origin/rewire-backup
* [new tag] v61.2.8 -> v61.2.8
Already on 'master'
Your branch is behind 'origin/master' by 2 commits, and can be fast-forwarded.
(use "git pull" to update your local branch)
Updating bbe5c6d..ebcc262
Fast-forward
.bumpversion.cfg | 2 +-
cg/__init__.py | 2 +-
cg/apps/demultiplex/sample_sheet/api.py | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------
cg/apps/demultiplex/sample_sheet/read_sample_sheet.py | 8 ++++----
cg/apps/demultiplex/sample_sheet/sample_models.py | 3 ---
cg/apps/demultiplex/sample_sheet/sample_sheet_validator.py | 21 ++++++++++++---------
cg/cli/demultiplex/demux.py | 30 ++++++++++++++++++------------
cg/cli/demultiplex/sample_sheet.py | 4 ++--
cg/exc.py | 10 +++++++++-
cg/models/nf_analysis.py | 6 +++---
pyproject.toml | 2 +-
tests/apps/demultiplex/test_read_sample_sheet.py | 8 ++++----
tests/apps/demultiplex/test_sample_sheet_validator.py | 10 +++++-----
tests/apps/demultiplex/test_translate_sample_sheet.py | 4 ++--
tests/models/rnafusion/test_rnafusion_sample.py | 6 +++---
15 files changed, 134 insertions(+), 89 deletions(-)
/home/proj/stage/sequencing_data/illumina/sequencing-runs/20240524_LH00202_0106_B227WKJLT4
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
[js.diazboada@hasta:20240524_LH00202_0106_B227WKJLT4] [P_base] $