moahaegglund closed this 1 year ago
Remove the flowcell (and samples) from cgstats and manually add it again. Remove the files you don't want MIP to use from Housekeeper (HK).
As for the incorrect flowcell status:
How can we prevent the files from being added twice to HK?
We need to make sure we don't add them a second time, so before fetching the flowcell from PDC and demultiplexing it again, we need to check whether the flowcell is actually on disk. Unfortunately, the flowcell status in statusdb is not always correct, so for now you need to check manually whether the flowcell exists on disk. If it does, we don't need to fetch and demux. This is something the production team can do.
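A minimal sketch of that on-disk check, assuming each demultiplexed run lives in a directory under /home/proj/production/demultiplexed-runs/ whose name contains the flowcell ID, and that demultiplexed reads end in .fastq.gz. The helper name flowcell_on_disk and this directory layout are hypothetical; this is not how cg implements the check.

```python
from pathlib import Path

# Root directory taken from the thread; the per-run layout is an assumption.
DEMUX_ROOT = Path("/home/proj/production/demultiplexed-runs")


def flowcell_on_disk(flowcell_id: str, demux_root: Path = DEMUX_ROOT) -> bool:
    """Return True if a run directory for the flowcell contains fastq files.

    Hypothetical helper: assumes run directory names contain the flowcell ID
    and that fastq files end in ".fastq.gz".
    """
    if not demux_root.is_dir():
        return False
    for run_dir in demux_root.iterdir():
        if run_dir.is_dir() and flowcell_id in run_dir.name:
            # rglob searches every subdirectory, so both older and newer
            # folder structures are covered.
            if next(run_dir.rglob("*.fastq.gz"), None) is not None:
                return True
    return False
```

Requiring at least one fastq file, not just the directory, avoids treating an empty leftover run folder as "on disk".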
Of course, doing things manually is not how we want to work, so we need to check, and if needed fix, all flowcell statuses. This is something one of the developers will do.
Once fetching from PDC works, won't the fetch start automatically if MIP tries to start a case where the flowcell of one of three samples has status removed?
That's a good point and I think you are correct.
Yes, I found it here: function all_flowcells_on_disk in https://github.com/Clinical-Genomics/cg/blob/d3c802794ec3d6eb6c3fb327659c9f306dceb0d4/cg/meta/workflow/analysis.py.
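The case-level decision discussed here can be illustrated with a hypothetical sketch. It is not the code of all_flowcells_on_disk in cg: it assumes statusdb stores one (flowcell_id, status) pair per flowcell, with "ondisk" meaning present, and that a separate predicate reports the real disk state. Because the status is not always correct, the disk state decides what to fetch, and disagreements are collected so the statuses can be fixed.

```python
from typing import Callable, Iterable, List, Tuple


def plan_flowcells(
    flowcells: Iterable[Tuple[str, str]],
    is_on_disk: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split a case's flowcells into ones to fetch and ones with a wrong status.

    Hypothetical sketch: `flowcells` holds (flowcell_id, status) pairs, where
    the status value "ondisk" is assumed to mean present on disk. The result
    of `is_on_disk` takes priority over the stored status.
    """
    to_fetch: List[str] = []
    wrong_status: List[str] = []
    for flowcell_id, status in flowcells:
        on_disk = is_on_disk(flowcell_id)
        if on_disk != (status == "ondisk"):
            # statusdb disagrees with the disk; flag it so it can be corrected.
            wrong_status.append(flowcell_id)
        if not on_disk:
            # Only flowcells that are really missing need a PDC fetch and demux.
            to_fetch.append(flowcell_id)
    return to_fetch, wrong_status
```

With this ordering, a flowcell marked removed but still present on disk is reported as a wrong status instead of being fetched and demultiplexed a second time.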
The files will not be linked twice to the MIP analysis, but a fastq delivery would be wrong (this has happened before). I don't know how the other pipelines would handle this.
Also, in some cases I've seen that the files on disk are only partly kept: some fastq files have been removed while others are still there. In those cases we still want to be able to demux the whole flowcell without adding files twice. Would it be possible to do this check automatically?
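One possible way to automate this, sketched under assumptions: compare file contents instead of file names, so that a partly kept flowcell or a renamed file is still recognized. The sketch assumes the paths of the files already in HK for the flowcell can be listed and read; the function names are hypothetical and this is not the Housekeeper API.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_add(demuxed: Iterable[Path], in_housekeeper: Iterable[Path]) -> List[Path]:
    """Return demultiplexed files whose contents are not already in HK.

    Hypothetical sketch: compares content checksums rather than names, so a
    file that was renamed or moved by a newer folder layout still counts as
    a duplicate. HK entries whose files no longer exist are skipped.
    """
    known: Set[str] = {file_checksum(p) for p in in_housekeeper if p.exists()}
    new_files: List[Path] = []
    for path in demuxed:
        checksum = file_checksum(path)
        if checksum not in known:
            known.add(checksum)  # also skips duplicates within `demuxed` itself
            new_files.append(path)
    return new_files
```

A limitation worth noting: a checksum only matches byte-identical files. Re-demultiplexing the same reads can produce different compressed bytes (for example a different demultiplexing software version or gzip header), so in practice the comparison might need a key such as sample, lane and read number instead.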
Remove the files you don't want MIP to use from HK.
Handling the files manually like this is not safe; it's very easy to get it wrong.
I think we should wait to do anything until the prio case fastfalcon is delivered, since its index sample is from this flowcell. But would it be possible to delete everything in HK connected to this flowcell, remove it from hasta (/home/proj/production/demultiplexed-runs/) and cgstats, and redo the demultiplexing? Would that solve this in the short term without too much manual work? Or maybe it's OK to remove the flowcell from disk? (I don't know if that is possible; I remember we were all confused about the removal of flowcells before the summer.)
@moahaegglund that works
Flowcell HV7VTCCXY had status removed even though it was actually on disk (present in the demultiplexed folder but not in HK) since it was demultiplexed in 2019. The structure of the demultiplexed folder and the naming of the files have since changed, which led to the fastq files being included twice in Housekeeper.
Renaming the files when linking them to MIP prevents them from being included twice in that analysis. (For information, I've solved this manually for sample ACC5113A11 when linking undetermined fastq files, as I had to start a prio case.)
This has also affected cgstats, where samples have the wrong number of reads:
The sample has 857M reads according to LIMS.
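A simple guard against this kind of error could compare the read count in cgstats with the count from LIMS and flag samples that deviate. The function name and the 5% default tolerance below are hypothetical choices for illustration; a ratio close to 2 is what counting the same fastq files twice would produce.

```python
def check_read_count(cgstats_reads: int, lims_reads: int, tolerance: float = 0.05) -> str:
    """Classify a sample's cgstats read count against its LIMS read count.

    Hypothetical sketch: returns "ok" when the counts agree within the relative
    `tolerance`, "duplicated" when cgstats is roughly twice the LIMS count
    (consistent with the same files being counted twice), else "mismatch".
    """
    if lims_reads <= 0:
        raise ValueError("LIMS read count must be positive")
    ratio = cgstats_reads / lims_reads
    if abs(ratio - 1) <= tolerance:
        return "ok"
    if abs(ratio - 2) <= 2 * tolerance:
        return "duplicated"
    return "mismatch"
```

For the sample above, the 857M reads reported by LIMS would be passed as lims_reads.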
How can we fix this in cgstats and HK? How can we prevent this from happening again?