AfshinLab / BLR

Integrate TELL-seq #34

Closed pontushojer closed 4 years ago

pontushojer commented 4 years ago

Relates to https://github.com/FrickTobias/BLR/issues/204

This PR introduces TELL-seq read processing to the pipeline. The barcode correction is done a bit differently from how they do it in the paper (Chen et al. 2020), but this way was faster to integrate as it closely resembles our own way of correcting BLR barcodes. This is how it is described in their paper:

After sequencing, all unique barcodes were identified along with the count for the number of reads they were associated with. The unique barcodes associated with only one read were error-corrected if they were 1-base mismatched with one of the barcodes associated with multiple reads. Barcodes with errors after this step were filtered out. The erroneous barcodes along with their associated reads were removed and excluded from the rest of analyses.

Step by step this is:

  1. Count the reads associated with each barcode.
  2. Correct barcodes with only one read that are within a distance of 1 of a barcode associated with multiple reads.
  3. Remove all non-corrected singles.
  4. Remove all barcodes that are erroneous (do not match the barcode pattern of mixed and fixed bases).
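
Steps 1 to 3 of the paper's workflow can be sketched roughly as below. This is only an illustration, not the code from the paper or from this PR: the function names are made up, "1-base mismatched" is read as Hamming distance 1, and dropping a single-read barcode that matches more than one multi-read barcode is an assumption. Step 4 (pattern matching) is left out since the patterns are not given here.

```python
from collections import Counter

BASES = "ACGT"


def mismatch_neighbors(barcode):
    """Yield every sequence that differs from `barcode` by one substitution."""
    for i, base in enumerate(barcode):
        for alt in BASES:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def correct_singletons(read_barcodes):
    """Map each observed barcode to its corrected barcode, or None if removed.

    Hypothetical helper, written only to illustrate steps 1-3.
    """
    # Step 1: count reads per barcode.
    counts = Counter(read_barcodes)
    multi = {bc for bc, n in counts.items() if n > 1}

    mapping = {}
    for bc, n in counts.items():
        if n > 1:
            mapping[bc] = bc
            continue
        # Step 2: a single-read barcode is corrected if it is one mismatch
        # away from a multi-read barcode.
        hits = [nb for nb in mismatch_neighbors(bc) if nb in multi]
        # Step 3: uncorrected singles are removed. Assumption: ambiguous
        # singles (several candidate targets) are removed as well.
        mapping[bc] = hits[0] if len(hits) == 1 else None
    return mapping
```

For example, with reads carrying `ACGT`, `ACGT`, `ACGA` and `TTTT`, the single `ACGA` would be corrected to `ACGT` and the single `TTTT` would be removed.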

In our case the correction is:

  1. Count the reads associated with each barcode.
  2. Correct any barcode with an edit distance of 1 if the relative read ratio is 2 (one has twice the number of associated reads).
  3. Remove all non-corrected singles.
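
A rough sketch of this ratio-based correction, for illustration only. It is not the code in this PR: the function names are hypothetical, the ratio condition is read as "at least twice as many reads" (an assumption), and corrections are followed transitively, so a barcode corrected to a parent ends up at that parent's own target.

```python
from collections import Counter

BASES = "ACGT"


def edit1_neighbors(barcode):
    """Return all sequences at edit distance exactly 1 from `barcode`."""
    result = set()
    for i in range(len(barcode)):
        result.add(barcode[:i] + barcode[i + 1:])  # deletion
        for b in BASES:
            result.add(barcode[:i] + b + barcode[i + 1:])  # substitution
    for i in range(len(barcode) + 1):
        for b in BASES:
            result.add(barcode[:i] + b + barcode[i:])  # insertion
    result.discard(barcode)  # substituting a base with itself gives the input
    return result


def cluster_barcodes(read_barcodes, ratio=2):
    """Map each observed barcode to its corrected barcode, or None if removed."""
    # Step 1: count reads per barcode.
    counts = Counter(read_barcodes)
    mapping = {}
    # Visit the most abundant barcodes first so that every possible parent
    # is already resolved when a less abundant barcode is processed.
    for bc, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Step 2: candidate parents are one edit away and have at least
        # `ratio` times as many reads (assumed interpretation).
        parents = [nb for nb in edit1_neighbors(bc)
                   if counts.get(nb, 0) >= ratio * n]
        if parents:
            best = max(parents, key=lambda p: (counts[p], p))
            mapping[bc] = mapping[best]
        elif n > 1:
            mapping[bc] = bc
        else:
            # Step 3: non-corrected singles are removed.
            mapping[bc] = None
    return mapping
```

Unlike the paper's approach, this also merges multi-read barcodes into a sufficiently more abundant neighbor, not just single-read ones.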

One could update this later on to include pattern matching (step 4 in their workflow), but they apparently have about 5 different patterns and it's a bit unclear how these relate to different versions of their technology.

I will do a full test run to see what kind of results we get.

pontushojer commented 4 years ago

! Moved misplaced comment to https://github.com/FrickTobias/BLR/issues/219 !

pontushojer commented 4 years ago

I did a full test run on UPPMAX, and it completed after some minor tinkering. One issue was that the mapping rate was quite low, at about 60% for bowtie2. I checked using the small test dataset (from blr-testdata) and got similar rates for bowtie2. When running bwa, however, rates are good at about 90%.