COMBINE-lab / alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
https://alevin-fry.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Don't correct barcodes #130

Closed wangjiawen2013 closed 8 months ago

wangjiawen2013 commented 8 months ago

Hi,

The tutorial (https://combine-lab.github.io/alevin-fry-tutorials/2021/running-alevin-fry-fast/) says:

"The next command, collate, actually serves two distinct purposes simultaneously. First, this command will correct the barcodes according to the information written out during the generate-permit-list phase. That is, for each read in the mapping file, collate will determine if the barcode is either one of the true cell barcode that was selected by the generate-permit-list command, or if the barcode can be corrected to such a barcode (is within an edit distance of 1 of a selected barcode). At the same time the barcodes are corrected, they are also collated, ensuring that all of the mappings that correspond to reads (and therefore UMIs) within the same cell are placed subsequently in the output file. This allows the next command (quant) to read in all the mapped reads corresponding to a given cell at one time so that the UMIs can be resolved, the gene expression estimates for that cell determined, and any used memory be freed or made available for other work."

How can this correction be disabled? Sometimes we don't want barcodes to be corrected. For example, in the smart-seq2 protocol we put one barcode in each well of a 384-well plate, so we know the exact barcode sequence in each well. Any two of these barcodes differ by at least two nucleotides. A detected barcode is ambiguous if it is within an edit distance of 1 of two known barcodes:

barcode1: ACTG
barcode2: AGTC
detected barcode: ACTC

Here ACTC could be corrected to either barcode1 or barcode2. How do alevin-fry and simpleaf deal with this situation?

rob-p commented 8 months ago

Hi @wangjiawen2013,

alevin-fry will only make 1-edit corrections in the collate step if they are unambiguous. If a barcode (in your case ACTC) is 1 edit away from 2 valid barcodes (in your case ACTG and AGTC), then the corresponding record will not be propagated in processing (i.e. this read will be discarded). We should clarify the documentation to specify this.

Best, Rob
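
For readers who want to see the rule described above in concrete form, here is a minimal Python sketch. It is not alevin-fry's actual implementation (alevin-fry is written in Rust); the function names `one_substitution_neighbors` and `correct_barcode` are hypothetical, and the sketch assumes that "1 edit" means a single nucleotide substitution over the alphabet A, C, G, T.

```python
from typing import Iterable, Optional, Set

NUCLEOTIDES = "ACGT"


def one_substitution_neighbors(barcode: str) -> Set[str]:
    """Return every sequence that differs from `barcode` at exactly one position."""
    neighbors = set()
    for i, original in enumerate(barcode):
        for base in NUCLEOTIDES:
            if base != original:
                neighbors.add(barcode[:i] + base + barcode[i + 1:])
    return neighbors


def correct_barcode(observed: str, permit_list: Iterable[str]) -> Optional[str]:
    """Illustrative rule (assumption, not alevin-fry's code):
    - an exact match to a permitted barcode is kept as-is;
    - a barcode one substitution away from exactly one permitted barcode is corrected;
    - anything else (no match, or an ambiguous match) returns None, i.e. is discarded.
    """
    permitted = set(permit_list)
    if observed in permitted:
        return observed
    candidates = one_substitution_neighbors(observed) & permitted
    if len(candidates) == 1:
        return next(iter(candidates))
    return None


if __name__ == "__main__":
    permit_list = ["ACTG", "AGTC"]
    for observed in ["ACTG", "ACTA", "ACTC", "TTTT"]:
        print(observed, "->", correct_barcode(observed, permit_list))
```

With the two barcodes from the question, ACTC is one substitution away from both ACTG and AGTC, so the sketch returns None for it, mirroring the "discard if ambiguous" behaviour described in the reply.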