Closed karlnyr closed 3 years ago
Executing test on 2 lanes: Observing the results of 2 lanes: Fastq files were generated, I am a bit concerned that all samples have Undetermined reads connected to them. Executing test on 8 lanes: Observing the results of 8 lanes: Fastq files were generated, still same situation were Undetermined are related to every sample.
@barrystokman I believe that this is ready now for review, I am also going to make some new tests on testing the workflow when we create an Undetermined
directory
I'm having a hard time seeing which test is which, what the expected outcomes are and if they have passed or not. Can you make that a bit clearer? From your wording it seems that two tests failed (2 and 8 lanes) and one test (pooled) has not been executed?
I'm having a hard time seeing which test is which, what the expected outcomes are and if they have passed or not. Can you make that a bit clearer? From your wording it seems that two tests failed (2 and 8 lanes) and one test (pooled) has not been executed?
Yes you are correct. Both of the tests above were before I implemented the change of undetermined files. I noticed the undetermined files issue in the tests and decided to add a fix to it as well. I have not performed the new tests just yet, got caught up in some production stuff. I can @ you again when the new tests are performed 😄
I see now that the Y was to be added here, but yes I think it's good to deploy the other PR with only that so that the demux of HISeqX works. :) A comment on this PR, 2 lanes = HiSeq 2500 and 8 lanes = HiSeqX.
@AnnaZetterlund @moahaegglund Is it correct that 2500 are 2 lanes and hiseqx are 8? In that case this PR was kind of a blunder since we could move the flowcells affected from the hiseqx directory to the 2500 and it would hopefully work.
@barrystokman Do you know what makes 2500's end up in hiseqx dir on hasta?
@AnnaZetterlund @moahaegglund Is it correct that 2500 are 2 lanes and hiseqx are 8? In that case this PR was kind of a blunder since we could move the flowcells affected from the hiseqx directory to the 2500 and it would hopefully work.
@barrystokman Do you know what makes 2500's end up in hiseqx dir on hasta?
HiSeq X is always 8 lanes, while HiSeq 2500 is either 2 lanes (rapid mode flow cells) or 8 lanes (high output mode flow cells).
I need a bit more information if I'm going to investigate flowcells being in the wrong dir :)
What are the full run names of the involved flowcells?
It's one flowcell that's in the wrong directory:
/home/proj/production/flowcells/hiseqx/150119_D00450_0151_AHB1T6ADXX
We have fetched 8 other 2500 flowcells and they all ended up in the correct location.
If that is the case, the part regarding lane counts can be scrapped in this PR. I do however believe that the undetermined files van be a solution to the previous issue you encountered @moahaegglund ?
@vwirta @henrikstranneheim I am also considering removing the undetermined files of pooled samples to save space, thoughts?
The only reason for having the undetermined reads would be to be able to look for samples not included in the sample sheet. Otherwise I have no strong arguments for keeping them.
@karlnyr I think it would be good to delete them
I've put in a project suggestion to redo all demultiplexing of HiSeqX flowcells as there is errors in Housekeeper and cgstats too connected to these flowcells above all this with the undetermined fastq files. If we are to remove all folders and redo it eventually I don't think we should put time into cleaning up in the existing folders.
I also think that we should think this through before doing changes on the demultiplexing in order to get it correct. The fastq files was previously linked to Unaligned but this changed during 2019, a guess is that it was changed in order to match with Novaseq. This linked fastq files doesn't work with spring/Housekeeper. The check if it's pooled or not has to be per lane as this differs between lanes on some flowcells. I suggest we have a meeting to discuss how the demultiplexing should be changed when there is time to start working on this project.
I've put in a project suggestion to redo all demultiplexing of HiSeqX flowcells as there is errors in Housekeeper and cgstats too connected to these flowcells above all this with the undetermined fastq files. If we are to remove all folders and redo it eventually I don't think we should put time into cleaning up in the existing folders.
I also think that we should think this through before doing changes on the demultiplexing in order to get it correct. The fastq files was previously linked to Unaligned but this changed during 2019, a guess is that it was changed in order to match with Novaseq. This linked fastq files doesn't work with spring/Housekeeper. The check if it's pooled or not has to be per lane as this differs between lanes on some flowcells. I suggest we have a meeting to discuss how the demultiplexing should be changed when there is time to start working on this project.
I agree with this, and the changes would be minor to link on single sample lanes and not link on singe sample lanes.
New tests for linking and no linking of undetermined files:
Pooled - HFH5HALXX
Non-Pooled - HVCFMCCXY
Pooled lanes on flowcells gets their undetermined files correctly moved into the Undetermined folder. Non-pooled lanes get their undetermined files correctly linked to their respective sample.
@moahaegglund @barrystokman Despite no real request to work on this now I have fixed the linking of the undetermined files. This way we will not link pooled lanes, and if we decide to remove the Undetermined we could simply remove the final part of xdemuxtiles.batch
in order to only link pooled lanes and scrap the rest of the files.
@karlnyr Great work! Is this also connected to project: https://github.com/Clinical-Genomics/project-planning/issues/252?
I would say so yes!
@barrystokman I believe this one is now finished as well. I was finally able to satisfy the linting action as well, so many edits in anything but demux/utils/samplesheet
and demux/cli/samplesheet/
and the tests are for linting reasons
deployed:
This PR fixes
How to prepare for test:
bash servers/resources/hasta.scilifelab.se/update-demux-stage.sh feat-hiseq-with-varying-lane-count
How to test:
Expected test outcome:
Review:
This version is a: