cxzhu / Paired-seq


Hi, if I want to change the BCs or linker sequence slightly, which file should I change? #4

Closed dmsalsgh97 closed 3 years ago

dmsalsgh97 commented 3 years ago

Hi, first, Paired-seq is a really good protocol for jointly profiling RNA and DNA (ATAC) in the same cell!

Now, I want to use the Paired-seq protocol with slight customizations: a slightly changed linker sequence and only 3 BC rounds.

In my case, which script should I change?

I see that the combine function in reachtools deals with BCs and UMIs, and that reachtools.h has a section for bc_library and for trimming linkers.

If I change reachtools.h's trim() function, can I modify BCs and linker sequence?

Thanks!

cxzhu commented 3 years ago

Hi @dmsalsgh97,

Thank you for your interest in Paired-seq. I am sorry the code is a bit messy as I did not remove the unused functions.

If you want to use only 3 BC rounds, e.g., ATAC/RT primers (1st) + R02 (1st ligation) + R03 (2nd ligation), you can refer to https://github.com/cxzhu/Paired-Tag, as we use 2-round ligation in our latest Paired-Tag protocol (which can be adapted to PE 2×100 sequencing). Also, please note that the function names differ between the 3-round and 4-round versions (I kept the old functions from the 4-round version).

To change the adaptor sequences, you can edit the sequences in class "read2_r2" in reachtools.h (this class is called by the "combine2" function, Line 1905 of "main.cpp", for 3-level barcoding). The sequence for the 1st BC (ATAC/RT primer) is in Line 385 of "reachtools.h", the 2nd BC (1st ligation) is in Line 406, and the 3rd BC (2nd ligation) is in Line 423. All these functions are in our latest https://github.com/cxzhu/Paired-Tag repo.

I hope this helps, and please let me know if anything is still unclear.

Best, Chenxu

dmsalsgh97 commented 3 years ago

Hi, cxzhu, First, thanks for your kindness!

Your comments are really detailed. I'll try this and, if I succeed, I'll post an update on this issue.

Thanks again :)

Minho

dmsalsgh97 commented 3 years ago

Hi @cxzhu, First, your detailed comments were really helpful!

As you suggested, I slightly changed class "read2_r2" in reachtools.h. Because the full BC combination length changed from 18 (7+7+4) to 19 (8+8+3), I also changed line 1913 of main.cpp.

I used BC sequences and adapter sequences from SPLiT-seq, so my expected Read2 sequences look like this:

RNA: NNNNNNNNNNNNNNNNNNGTGGCCGATGTTTCGCATCGGCGTACGACTNNNNNNNNATCCACGTGCTTGAGAGGCCAGAGCATTCGTCNNN
DNA: NNNNNNNNNNNNNNNNNNGTGGCCGATGTTTCGCATCGGCGTACGACTNNNNNNNNATCCACGTGCTTGAGAGGCCAGAGCATTCGAGNNN
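
For reference, the layout above can be checked read by read with a small script. The following is a minimal Python sketch, not part of reachtools: the function name, the group names, and the exact-match behaviour (no mismatches tolerated in the linkers) are assumptions made for illustration. It only splits a read into its three variable segments and reports the RNA/DNA tag (TC or AG) shown in the two sequences; it does not assume which bases inside the first 18-nt segment are UMI and which are barcode.

```python
import re

# Linker sequences copied from the expected Read2 layout above.
LINKER1 = "GTGGCCGATGTTTCGCATCGGCGTACGACT"
LINKER2 = "ATCCACGTGCTTGAGAGGCCAGAGCATTCG"

# seg1/seg2/seg3 are hypothetical labels for the three N-runs (18, 8 and 3 nt).
READ2_LAYOUT = re.compile(
    r"^(?P<seg1>[ACGTN]{18})" + LINKER1
    + r"(?P<seg2>[ACGTN]{8})" + LINKER2
    + r"(?P<tag>TC|AG)(?P<seg3>[ACGTN]{3})"
)

# The two bases before the last segment distinguish the libraries in the thread.
MODALITY = {"TC": "RNA", "AG": "DNA"}


def parse_read2(seq):
    """Return the variable segments of a Read2, or None if the layout does not match.

    Matching is exact: a single mismatch in either linker rejects the read.
    Bases after the final 3-nt segment are ignored.
    """
    m = READ2_LAYOUT.match(seq.upper())
    if m is None:
        return None
    return {
        "modality": MODALITY[m.group("tag")],
        "segments": (m.group("seg1"), m.group("seg2"), m.group("seg3")),
    }
```

Reads that return None here would be a hint that the linker sequences or segment lengths entered in reachtools.h do not match the actual library.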

Now I can generate the ***_combined.fq.gz file with the correct BC combinations (BC1+BC2+BC4)!

However, I used the original barcode reference file (cell_id_full.fa) for bowtie indexing, so 0.00% of the BC sequences aligned as valid.

==================================================
Paired-seq/Tag Barcode Locator Report: Paired-3T3-DNA_doubletrimmed
# total raw reads:      53631420
# of full barcoded reads:   30606673
% of full barcode reads:    0.2%
==================================================

Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 30606673
# reads with at least one alignment: 0 (0.00%)
# reads that failed to align: 30606673 (100.00%)
No alignments

So I changed the cell_id_full.fa file to use my BCs from SPLiT-seq, as shown here:

#Original cell_id_full.fa file
>01:01:01
AAACAACAAACAACCATC
BC1(7) | BC2(7)  | BC4(4)

#Edited cell_id_full.fa file
>01:01:01
AACGTGATAACGTGATATC
BC1(8) | BC2(8) | BC4(3)
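
A file in this format could be generated rather than edited by hand. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, the header is assumed to list the barcode indices in the same order as the concatenated sequence (as in the `>01:01:01` example), and the barcode lists are placeholders apart from the first entry of each, which reproduces the record shown above. The real lists would come from the barcode set actually used.

```python
from itertools import product


def build_cell_id_fasta(bc1, bc2, bc3):
    """Yield FASTA lines, one record per combination of the three barcode lists.

    Headers use 1-based, zero-padded indices joined by ':' (assumed to follow
    the >01:01:01 convention); the sequence is the plain concatenation.
    """
    for (i, a), (j, b), (k, c) in product(
        enumerate(bc1, 1), enumerate(bc2, 1), enumerate(bc3, 1)
    ):
        yield f">{i:02d}:{j:02d}:{k:02d}"
        yield a + b + c


# Placeholder barcode lists: only the first entry of each is taken from the thread.
bc1 = ["AACGTGAT", "AAACATCG"]
bc2 = ["AACGTGAT", "AAACATCG"]
bc3 = ["ATC", "GCA"]

lines = list(build_cell_id_fasta(bc1, bc2, bc3))
print("\n".join(lines[:2]))
```

Writing `lines` to a file, one per line, would give a reference in the same shape as the edited cell_id_full.fa.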

Is this enough to change the barcode sequences for downstream analysis?

Thanks, Minho

cxzhu commented 3 years ago

Hi @dmsalsgh97,

Having 30.6M of 53.6M reads with full barcodes looks reasonable to me (please forgive the weird 0.2% number and just ignore it).
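
For reference, the fraction implied by the two read counts in the report can be recomputed directly. This is a small Python sketch using only the numbers printed in the report above; the variable names are illustrative.

```python
# Counts copied from the Barcode Locator Report in this thread.
total_raw_reads = 53_631_420
full_barcoded_reads = 30_606_673

pct_full_barcoded = 100 * full_barcoded_reads / total_raw_reads
print(f"{pct_full_barcoded:.1f}% of raw reads carry a full barcode")
```

This comes out at roughly 57%, which is the figure to look at instead of the 0.2% line.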

If your barcode sequences have different lengths, e.g., 8+8+3 instead of 7+7+4, please make sure you also change Lines 396, 419 and 437 of "reachtools.h", because these numbers set how many bases are retrieved from read2. (I suppose you have already changed them, as you also changed Line 1913 of "main.cpp".)

And yes, the last step is to change cell_id_full.fa and rebuild the index for bowtie mapping.
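
Before rebuilding the index, it may be worth confirming that every record in the edited cell_id_full.fa has the combined length the code now expects (19 for 8+8+3) and that no sequence appears twice. The following is a minimal Python sketch of such a check; the function name and message format are hypothetical, and it assumes one sequence line per record as in the examples in this thread.

```python
def check_cell_id_fasta(lines, lengths=(8, 8, 3)):
    """Return a list of problems found in FASTA lines; empty if none.

    Checks that each sequence has length sum(lengths) and is not duplicated.
    Assumes each record is a header line followed by one sequence line.
    """
    expected = sum(lengths)
    problems = []
    seen = set()
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            continue
        if len(line) != expected:
            problems.append(f"{header}: length {len(line)}, expected {expected}")
        if line in seen:
            problems.append(f"{header}: duplicate sequence {line}")
        seen.add(line)
    return problems


# The edited record from this thread passes; the original 18-nt record does not.
print(check_cell_id_fasta([">01:01:01", "AACGTGATAACGTGATATC"]))
print(check_cell_id_fasta([">01:01:01", "AAACAACAAACAACCATC"]))
```

To check a real file, the lines can be read with `open("cell_id_full.fa")` and passed to the same function.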

Best, Chenxu

dmsalsgh97 commented 3 years ago

Hi cxzhu!

Thanks for the detailed comments! Your comments were really helpful.

I'll close this issue.

Best, Minho