caleblareau / asap_to_kite

Command-line Python script to reformat CITEATAC FASTQs for kite (kallisto | bustools) processing

Running scADT-seq data only #5

Open · dy-lin opened this issue 2 years ago

dy-lin commented 2 years ago

I'm trying to run the asap_to_kite_v2.py script on scADT-seq data only. After running bcl2fastq, I only have read1 and read2, but I see references to a read3 in the code that are causing errors. Is this read3 error because I only have scADT-seq data instead of ASAP-seq data? How can I adapt the TotalSeqA code in the asap_to_kite_v1 function?

```python
listRead1 = trio[0]; listRead2 = trio[1]; # listRead3 = trio[2]

title1 = listRead1[0]; sequence1 = listRead1[1]; quality1 = listRead1[2]
title2 = listRead2[0]; sequence2 = listRead2[1]; quality2 = listRead2[2]
# title3 = listRead3[0]; sequence3 = listRead3[1]; quality3 = listRead3[2]

# Recombine attributes based on conjugation logic
if(conjugation == "TotalSeqA"):
    new_sequence1 = sequence2 + sequence1[0:10]
    # new_sequence2 = sequence3

    new_quality1 = quality2 + quality1[0:10]
    # new_quality2 = quality3

out_fq1 = formatRead(title1, new_sequence1, new_quality1)
# Fails: new_sequence2 and new_quality2 are never assigned without read3
out_fq2 = formatRead(title2, new_sequence2, new_quality2)
```

If I comment out all references to read3, the script then fails at out_fq2, which requires new_sequence2 (and new_quality2); both are normally built from read3.
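The failure is consistent with how the reads are grouped. Assuming, as the variable name suggests, that `trio` holds one record per input FASTQ (the actual asap_to_kite reader may work differently), supplying only R1 and R2 yields two-element groups, so `trio[2]` does not exist and everything derived from read3 is never defined. A minimal sketch with hypothetical helpers `read_fastq` and `read_in_lockstep`, for uncompressed FASTQ text:

```python
import io
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str, str]  # (title, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (title, sequence, quality) tuples from uncompressed FASTQ text."""
    while True:
        title = handle.readline().rstrip("\n")
        if not title:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield (title, sequence, quality)


def read_in_lockstep(*handles: TextIO) -> Iterator[List[Record]]:
    """Yield one list per read, holding the matching record from each file."""
    for records in zip(*(read_fastq(h) for h in handles)):
        yield list(records)


# Only R1 and R2 are supplied, as after bcl2fastq for the Multiome ADT library.
r1 = io.StringIO("@read_a\nACGTACGT\n+\nIIIIIIII\n")
r2 = io.StringIO("@read_a\nTTTTGGGG\n+\nIIIIIIII\n")
trio = next(read_in_lockstep(r1, r2))
print(len(trio))  # 2, so trio[2] would raise IndexError
```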

caleblareau commented 2 years ago

What's the source of your scADT-seq data: your own experiment or a public one? Is it for ASAP-seq?

dy-lin commented 2 years ago

The source of the data is our own experiment, following the protocols in your paper (Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells). The experiment uses the 10x Multiome kit to capture scRNA-seq and scATAC-seq data, followed by the method described in the paper for scADT-seq. Altogether, we are attempting DOGMA-seq.

> While this work was under review, 10x Genomics released the ‘Multiome’ product that captures the transcriptome and chromatin accessibility from the same cells. We recognized that the mechanism of protein barcode detection used in our previously described CITE-seq method⁵, via the barcoded poly-T primer, would be compatible with the Multiome product and that our efforts to preserve cell surface antigens and mtDNA described for ASAP-seq would be transferable to this kit.

So the scATAC-seq and scRNA-seq datasets will be processed with Cell Ranger ARC, leaving the scADT-seq data on its own. From your methods, it looks like you processed the data with asap_to_kite, then used kite and kb to obtain the feature count matrices.

caleblareau commented 2 years ago

If the ADT data was captured via the Multiome, then you can run it directly through kallisto | bustools and don't need this utility. Having only R1 and R2 would be consistent with that. Just process the ADT data as if it were CITE-seq data.
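Why two reads suffice: in a Multiome-style poly-T capture, R1 alone already holds the cell barcode and UMI, so nothing has to be stitched in from another read, which is what the TotalSeqA branch of asap_to_kite does for ASAP-seq. A minimal sketch of the per-pair layout, assuming the 10x GEX read structure (16 bp cell barcode + 12 bp UMI in R1, a 15 bp TotalSeq-A tag at the start of R2); `split_adt_pair` is a hypothetical helper, and in practice kallisto | bustools applies this split itself from its technology setting.

```python
from typing import NamedTuple


class AdtRead(NamedTuple):
    cell_barcode: str
    umi: str
    feature_barcode: str


def split_adt_pair(seq1: str, seq2: str, bc_len: int = 16,
                   umi_len: int = 12, tag_len: int = 15) -> AdtRead:
    """Split one R1/R2 sequence pair into barcode, UMI and feature tag.

    Assumed layout (10x GEX-style poly-T capture, as in Multiome):
      R1 = cell barcode (bc_len bp) followed by UMI (umi_len bp)
      R2 = feature barcode (tag_len bp) at the start of the read
    """
    if len(seq1) < bc_len + umi_len:
        raise ValueError("R1 is shorter than cell barcode + UMI")
    return AdtRead(
        cell_barcode=seq1[:bc_len],
        umi=seq1[bc_len:bc_len + umi_len],
        feature_barcode=seq2[:tag_len],
    )


pair = split_adt_pair("A" * 16 + "C" * 12, "GATTACAGATTACAG" + "A" * 10)
print(pair.feature_barcode)  # GATTACAGATTACAG
```

If your run uses different lengths, the `bc_len`, `umi_len` and `tag_len` defaults (and the barcode whitelist) would need to change to match.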

dy-lin commented 2 years ago

Thanks, I'll give that a go. In the case of only scADT-seq data, is it necessary to go through kite https://github.com/pachterlab/kite, or just kb https://www.kallistobus.tools/kb_usage/kb_usage/ ?

caleblareau commented 2 years ago

I don’t have experience using kb; the main thing, I think, is specifying the right reference and the right barcode whitelist.
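On the "right reference" point: as I understand kite's approach, each feature barcode is indexed together with all of its single-substitution variants, so kallisto can assign reads carrying one sequencing error, and variants shared by two features are discarded as ambiguous. A minimal plain-Python sketch of that idea, assuming barcodes are supplied as a name-to-sequence dict; it is not kite's own featuremap code, and the `name-k` entry names and two-column tab-separated t2g layout are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BASES = "ACGT"


def hamming1_variants(seq: str) -> List[str]:
    """Return seq followed by every sequence exactly one substitution away."""
    variants = [seq]
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                variants.append(seq[:i] + base + seq[i + 1:])
    return variants


def build_feature_map(features: Dict[str, str]) -> Tuple[str, str]:
    """Return (FASTA text, t2g text) for a {feature_name: barcode} dict.

    Every barcode contributes itself plus its one-mismatch variants; any
    variant reachable from more than one feature is dropped as ambiguous.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for name, barcode in features.items():
        for variant in hamming1_variants(barcode.upper()):
            owners[variant].add(name)

    fasta_lines: List[str] = []
    t2g_lines: List[str] = []
    for name, barcode in features.items():
        for k, variant in enumerate(hamming1_variants(barcode.upper())):
            if len(owners[variant]) > 1:
                continue  # shared with another feature: skip it
            entry = f"{name}-{k}"  # hypothetical naming scheme
            fasta_lines += [f">{entry}", variant]
            t2g_lines.append(f"{entry}\t{name}")
    return "\n".join(fasta_lines) + "\n", "\n".join(t2g_lines) + "\n"


fasta, t2g = build_feature_map({"CD3": "ACGTACGTACGTACG", "CD4": "TTTTTTTTTTTTTTT"})
print(fasta.count(">"))  # 92: 46 entries (1 + 15 * 3) per 15 bp barcode
```

The FASTA would then be indexed with kallisto and the t2g file passed to bustools when counting; the kite README is the place to check the exact k-mer size and file conventions it recommends.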

(Replying by email to Diana Lin's comment of May 18, 2022, at 2:34 PM.)
