arogozhnikov / demuxalot

Reliable, scalable, efficient demultiplexing for single-cell RNA sequencing
MIT License
22 stars 3 forks source link

Using demuxalot for ATAC data #17

Open himanshiarora7 opened 2 years ago

himanshiarora7 commented 2 years ago

I was recently working on ATAC data. Therefore, i wanted to ask if demuxalot can be used for ATAC data? Because each time i try to use it for my data i get the following error.

KeyError: “tag ‘NH’ not present

Incase it can be used for ATAC data as well, if yes, there is a way to solve this issue.

arogozhnikov commented 2 years ago

Hi @himanshiarora7

this should be completely possible to use demuxalot for ATAC, though I never tested that.

NH is a tag that is used to detect number of genomic alignments in STAR aligner.

Please sample ~10 random alignments from your bamfile and post them here so I could look at available alignment tags.

himanshiarora7 commented 2 years ago

Thank you for your response. I am attaching the sample alignment and the header for my ATAC bam file.

check_atac_bam.txt header_bam.txt

arogozhnikov commented 2 years ago

Hi, I think after recent update I think you can actually use demuxalot for scATAC demuxing, though with some trickery.

You'll need to provide custom callback for parsing reads instead, I think this one should be optimal:

def parse_read(read: AlignedRead) -> Optional[Tuple[float, int]]:
    """
    returns None if read should be ignored.
    Read still can be ignored if it is not in the barcode list
    """
    if read.get_tag("AS") <= len(read.seq) - 6:
        # more than 2 edits
        return None
    if not read.has_tag("UB"):
        # does not have molecule barcode
        return None

    if read.mapq < 20:
        # this one should not be triggered because of NH, but just in case
        return None

    p_misaligned = 0.01  # default value
    fake_ub = random.randint(0, 2 ** 31)
    return p_misaligned, fake_ub

A word of warning: 1) I don't have any scATAC data to test, 2) most important changes in demuxalot are scrnaseq-specific, so you're unlikely to benefit from them