Closed diitaz93 closed 11 months ago
UPDATE: It seems that the samples with a mismatch between the flow cell id in the path and the one in the tags are top-ups, whose tag carries the flow cell id of their previous flow cell.
# Remove ^M
perl -i -p -e's/\r/\n/g' "file"
If you dare go old perlish style
So the current files don't have single flow cell in their path? Seems to me as if we are just prepending the "real" flow cell to the name - should we not remove the old one?
Looks good! Could it be the case that we need to add any of the Flow cell tags? Or do we know that they exist?
> So the current files don't have single flow cell in their path? Seems to me as if we are just prepending the "real" flow cell to the name - should we not remove the old one?
There are two main groups: the files that have a flow cell in their name and the ones that don't. It turned out (and I forgot to mention this) that for the files that have a flow cell in the path, the one in the path is correct and the one that is wrong is always the tag. For those cases, only the tag is updated.
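The two groups described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper name `plan_update`, the `_` separator used when prepending, and the flow cell ids in the usage example are all made up for the sketch.

```python
from pathlib import Path


def plan_update(file_path: str, tag_flow_cell: str, real_flow_cell: str) -> dict:
    """Decide which fixes a file needs (hypothetical helper, not the PR's code)."""
    name = Path(file_path).name
    new_path = file_path
    if real_flow_cell not in name:
        # Group without a flow cell in the file name: prepend the real one.
        # The "_" separator is an assumption made for this sketch.
        new_path = str(Path(file_path).with_name(f"{real_flow_cell}_{name}"))
    return {
        # The tag only needs fixing when it disagrees with the real flow cell.
        "update_tag": tag_flow_cell != real_flow_cell,
        "new_path": new_path,
    }


# Name already carries the real flow cell: only the tag would change.
print(plan_update("/data/FCREAL1_sample_R1.fastq.gz", "FCOLD99", "FCREAL1"))
# Name has no flow cell: the path would change as well.
print(plan_update("/data/sample_R1.fastq.gz", "FCOLD99", "FCREAL1"))
```

Keeping the decision separate from the actual Housekeeper update makes it easy to dry-run the plan over the whole list before touching anything.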
The function self.hk_api.get_tag(name=self.real_flow_cell_id) checks if the tag exists and creates a new one if it doesn't :)
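For readers unfamiliar with the pattern, the "return it if it exists, create it otherwise" behaviour mentioned above looks roughly like this. The class below is an in-memory stand-in written for this sketch; it is not Housekeeper's API, and the names `InMemoryTagStore` and `get_or_create_tag` are hypothetical.

```python
class InMemoryTagStore:
    """Stand-in for a tag table (hypothetical, not Housekeeper's real API)."""

    def __init__(self) -> None:
        self._tags: dict[str, dict] = {}

    def get_or_create_tag(self, name: str) -> dict:
        """Return the tag called `name`, creating it first if it is missing."""
        tag = self._tags.get(name)
        if tag is None:
            tag = {"name": name}
            self._tags[name] = tag
        return tag


store = InMemoryTagStore()
first = store.get_or_create_tag("FCREAL1")   # created on first request
second = store.get_or_create_tag("FCREAL1")  # same object returned afterwards
```

With this behaviour the calling code does not need a separate "does the tag exist?" check before assigning the real flow cell tag to a file.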
Logic looks solid though 👍
- If the conclusion is that the flow cell in the name is always correct, why do we have a function to update the paths?
In some cases the file name does not contain a flow cell, in which case it is added. If the flow cell is already in the name, we skip that step.
- If some category of flow cells in this list only appears once (like the NovaSeq X one) I think they can be dealt with manually instead of making the script more complex.
There is no distinction between flow cell types in this code, or which part are you referring to?
Ran on stage and production without errors. Updated 258 files in production
Description
After the addition of flow cell ids as tags to fastq and spring files, some fastqs have a mismatch between their tag and the flow cell in their name. The cause is not certain, but most likely the flow cell in status db, which was fetched for the tag assignment, was incorrect.
Solution
I have identified all fastq files in which this happens and consolidated them into a csv file called 01_map_file2realFCID.csv with 3 columns: the first is the full path of the file, the second is the flow cell tag, and the third is the true flow cell, extracted directly from the fastq file using zless. It looks like this:
I could not get rid of the Mac newline remnant (^M), which is solved below.
The following script reads the csv file and updates the file paths and tags for every Housekeeper File in the list:
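As a minimal sketch of just the CSV-reading step (not the PR's actual script), the three-column file could be parsed as below while dropping stray carriage returns. It assumes the file has no header row; the column keys in `COLUMNS`, the function name `parse_mapping`, and the sample rows are hypothetical.

```python
import csv

# Hypothetical keys for the three columns described above.
COLUMNS = ("path", "tag_flow_cell", "real_flow_cell")


def parse_mapping(text: str) -> list[dict]:
    """Parse the mapping CSV into dicts, tolerating ^M (carriage return) debris."""
    # splitlines() treats \r, \n and \r\n as line breaks, so a trailing ^M
    # never ends up inside the last field of a row.
    lines = [line for line in text.splitlines() if line.strip()]
    return [
        dict(zip(COLUMNS, (field.strip() for field in row)))
        for row in csv.reader(lines)
    ]


# Made-up sample content with mixed line endings.
sample = "/data/a_R1.fastq.gz,FCOLD99,FCREAL1\r\n/data/b_R1.fastq.gz,FCREAL2,FCREAL2\r"
for row in parse_mapping(sample):
    print(row["path"], row["tag_flow_cell"], row["real_flow_cell"])
```

In practice the text would come from reading 01_map_file2realFCID.csv; each resulting row can then be handed to whatever code updates the Housekeeper file path and tag.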