broadinstitute / epi-SHARE-seq-pipeline

Epigenomics Program pipeline to analyze SHARE-seq data.
MIT License

Correction fixes #144

Closed mei-knudson closed 11 months ago

mei-knudson commented 12 months ago

utils.py

  1. The R3 right shift was previously checked against the exact-match dictionary; the correction function now checks it against the mismatch dictionary.
  2. The correction function now returns the barcode correction type (E/M/L/R).
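For reviewers, here is a minimal sketch of the behavior described above, not the actual code in utils.py. It assumes E/M/L/R stand for exact, mismatch, left shift, and right shift, that the read window carries one flanking base on each side of the barcode, and that the mismatch dictionary also contains the exact barcodes. The names `build_mismatch_dict` and `correct_barcode` are hypothetical.

```python
def build_mismatch_dict(barcodes):
    """Map each barcode and each of its single-base substitutions to the barcode.

    Assumption: a variant within one mismatch of two barcodes simply keeps the
    last barcode seen; a real implementation may prefer to drop such variants.
    """
    mismatch = {}
    for bc in barcodes:
        mismatch[bc] = bc
        for i in range(len(bc)):
            for base in "ACGTN":
                if base != bc[i]:
                    mismatch[bc[:i] + base + bc[i + 1:]] = bc
    return mismatch


def correct_barcode(window, exact, mismatch):
    """Return (corrected_barcode, correction_type), or (None, None) if no match.

    `window` is the barcode plus one flanking base on each side (hypothetical
    layout). Which slice is called "left" and which "right" is a naming
    assumption made for this sketch.
    """
    core = window[1:-1]
    if core in exact:
        return core, "E"
    if core in mismatch:
        return mismatch[core], "M"
    left = window[:-2]
    if left in mismatch:
        return mismatch[left], "L"
    right = window[2:]
    # Shifted windows are looked up in the mismatch dictionary, so a shifted
    # barcode that also carries one substitution can still be recovered.
    if right in mismatch:
        return mismatch[right], "R"
    return None, None


barcodes = ["AAAACCCC", "GGGGTTTT"]
exact = set(barcodes)
mismatch = build_mismatch_dict(barcodes)
print(correct_barcode("GTGGGGTTTA", exact, mismatch))
```

In this sketch the last call recovers a right-shifted barcode with one substitution, which a lookup against the exact dictionary alone would miss.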

bam_to_raw_fastq.py

  1. Added R1 barcode correction type QC output.
    • Should correction types be counted with a dict rather than with individual counters, to be consistent with how they are counted in correct_fastq.py?
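To make the dict-versus-counters question concrete, here is one possible shape for dict-based counting, using `collections.Counter`. The function names, the `"nonmatch"` key, and the tab-separated output layout are illustrative assumptions, not what bam_to_raw_fastq.py currently does.

```python
from collections import Counter


def tally_correction_types(correction_types):
    """Count correction types; None (no correction found) is tallied as 'nonmatch'."""
    counts = Counter()
    for ctype in correction_types:
        counts[ctype if ctype is not None else "nonmatch"] += 1
    return counts


def format_qc(counts, order=("E", "M", "L", "R", "nonmatch")):
    """Render one 'type<TAB>count' line per category, in a fixed order."""
    # Counter returns 0 for missing keys, so unseen categories still get a row.
    return "\n".join(f"{key}\t{counts[key]}" for key in order)


counts = tally_correction_types(["E", "E", "M", None, "R", "E"])
print(format_qc(counts))
```

A single `Counter` means a new correction type needs no new variable, and both scripts could share the same formatting helper.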

correct_fastq.py

  1. Added cell barcode correction type QC output.
  2. Added a dictionary of valid correction combinations; barcodes without a valid correction combination are counted as nonmatches.
    • Since we know which correction combinations we want to keep, should this dictionary be preset?
    • Should barcodes without valid correction combinations be counted in their own category instead of as nonmatches?
    • I'm counting polyG UMIs as their own category rather than appending them to the correction combination (EEEG, EEEU, etc.). Is this OK?
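As a starting point for these questions, here is a rough sketch of what a preset set of valid combinations with a separate polyG category could look like. The combinations listed are placeholders (the PR does not state which ones should be kept), and `VALID_COMBINATIONS`, `categorize`, and the order of the checks are assumptions made for illustration.

```python
from collections import Counter

# Placeholder values: the real set of combinations to keep is a pipeline decision.
VALID_COMBINATIONS = frozenset({"EEE", "EEM", "EME", "MEE"})


def categorize(correction_types, umi, valid=VALID_COMBINATIONS):
    """Return the QC category for one read.

    correction_types: per-round correction types, e.g. ("E", "M", "E");
    None marks a round whose barcode could not be corrected.
    """
    if any(ctype is None for ctype in correction_types):
        return "nonmatch"
    combo = "".join(correction_types)
    if combo not in valid:
        # Could instead return a dedicated category such as "invalid_combination".
        return "nonmatch"
    if umi and set(umi) == {"G"}:
        # polyG UMIs are counted on their own rather than appended to the combo.
        return "polyG"
    return combo


reads = [
    (("E", "E", "E"), "ACGTACGTAC"),
    (("E", "M", "M"), "ACGTACGTAC"),
    (("E", "E", "E"), "GGGGGGGGGG"),
]
counts = Counter(categorize(types, umi) for types, umi in reads)
print(dict(counts))
```

Keeping the valid set as a module-level constant makes the accepted combinations easy to review, and swapping the second `"nonmatch"` for its own label is a one-line change if a separate category is preferred.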