Fix https://github.com/FrickTobias/BLR/issues/218, fix https://github.com/FrickTobias/BLR/issues/229

Changes include:

- Filter positions to remove those with high barcode coverage. For this, processing had to be changed so that an entire chromosome is processed before looking for barcode duplicates. This generates a list of positions from which the high-coverage threshold is taken as the 0.99 quantile, unless that value is below the current minimum threshold (currently set to 6).
- Change duplicate comparisons so that only non-overlapping or Tn5-allowed overlapping positions are compared.
- Window size set to the config setting window_size, with the current default at 30,000.

Testrun
FASTQ =
/proj/uppstore2018173/private/rawdata/190510.HiSeq.emTn5.Next.reseq_4.XIV-XV/XV.reseq_4.R2.fastq.gz
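The high-coverage filtering change listed above can be sketched roughly as follows. This is a minimal sketch, not the pipeline's actual code: the function names and the position-to-coverage mapping are assumptions; only the thresholding rule (0.99 quantile, floored at the minimum of 6) comes from the description.

```python
from statistics import quantiles

MIN_THRESHOLD = 6  # current minimum threshold from the change description

def coverage_threshold(coverages, min_threshold=MIN_THRESHOLD):
    """Coverage cutoff: the 0.99 quantile, floored at min_threshold."""
    # statistics.quantiles with n=100 returns the 1st..99th percentiles;
    # index 98 is the 0.99 quantile.
    q99 = quantiles(coverages, n=100)[98]
    return max(q99, min_threshold)

def filter_positions(position_coverage):
    """Drop positions whose barcode coverage exceeds the threshold.

    position_coverage: dict mapping position -> barcode coverage,
    collected over one whole chromosome (a hypothetical layout).
    """
    threshold = coverage_threshold(list(position_coverage.values()))
    return {pos: cov for pos, cov in position_coverage.items()
            if cov <= threshold}
```

Computing the threshold only after a whole chromosome has been seen is what required the processing-order change: the quantile cannot be known until all positions on the chromosome have been collected.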
Check https://github.com/FrickTobias/BLR/issues/218 fixed

To check that https://github.com/FrickTobias/BLR/issues/218 is fixed I looked at the top barcodes that other barcodes are merged into in the find_clusterdups step, using the final.barcode-merges.csv file.

Old version

New version

From this it is clear that fewer barcodes are assigned to the top cluster in the new version.
Check https://github.com/FrickTobias/BLR/issues/229 fixed

To check that https://github.com/FrickTobias/BLR/issues/229 is fixed I collected runtime stats from the snakemake output log for the rule find_clusterdups for each chunk. The data was compiled into the graph below.

From this it is clear that runtime is shorter and more even across chunk sizes.
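The runtime collection can be sketched as a small log parser. This is a rough sketch under loose assumptions: Snakemake's log layout varies between versions, and the code below assumes serial execution where a bracketed timestamp line (e.g. "[Mon May 13 10:00:00 2019]") precedes each "rule find_clusterdups:" line and each "Finished job" line; for parallel runs the start/finish pairing would need job IDs instead.

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^\[(.+)\]$")
TS_FMT = "%a %b %d %H:%M:%S %Y"  # assumed timestamp format in the log

def rule_runtimes(log_lines, rule="find_clusterdups"):
    """Return per-chunk runtimes (seconds) for one rule, assuming serial jobs."""
    runtimes = []
    last_ts = None
    start = None
    for line in log_lines:
        m = TS_RE.match(line.strip())
        if m:
            last_ts = datetime.strptime(m.group(1), TS_FMT)
        elif line.strip() == f"rule {rule}:":
            start = last_ts  # timestamp line preceding the rule header
        elif line.strip().startswith("Finished job") and start is not None:
            runtimes.append((last_ts - start).total_seconds())
            start = None
    return runtimes
```

The resulting list of per-chunk runtimes is what was plotted against chunk size in the graph above.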