Open zhou-ran opened 5 years ago
Hi Ran,
at the moment, you would need to preprocess the reads (say, using the python regex package, as UMI-tools does) to create two reads similar to Drop-seq/10X: one cDNA read, and one barcode read containing all parts of the cell barcode concatenated together, followed by the UMI. Only constant barcode lengths are supported.
I am gearing up to implement complex barcode configurations, so if you can tell me exactly what the microwell-seq barcodes look like, I will try to include them.
Cheers Alex
Hi Alex, it's great that you are planning to include configurations for complex barcodes.
In microwell-seq, the barcode parts are joined by two linker sequences. The actual barcodes are located at 1-6:22-27:43-48, and the UMI at 49-54. You could test on the SRR6954503 file.
Thanks Ran
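For anyone who needs the preprocessing step in the meantime, here is a minimal sketch of it. It assumes the ranges quoted above are 1-based and inclusive (the convention I believe Drop-seq tools uses), and it uses plain fixed-position slicing rather than regex, since the positions are constant. The names `CB_RANGES`, `UMI_RANGE`, `take` and `reformat_barcode_read` are made up for illustration and are not part of STAR or UMI-tools.

```python
# Ranges quoted in the thread, read here as 1-based inclusive (an assumption).
CB_RANGES = [(1, 6), (22, 27), (43, 48)]
UMI_RANGE = (49, 54)


def take(text, ranges):
    """Concatenate 1-based inclusive ranges of text."""
    return "".join(text[start - 1:end] for start, end in ranges)


def reformat_barcode_read(seq, qual):
    """Return (sequence, quality) holding the concatenated CB followed by the UMI."""
    needed = UMI_RANGE[1]
    if len(seq) < needed or len(qual) < needed:
        # Raise instead of dropping the read, so that the barcode and cDNA
        # files cannot silently go out of sync.
        raise ValueError(f"barcode read shorter than {needed} bases")
    ranges = CB_RANGES + [UMI_RANGE]
    return take(seq, ranges), take(qual, ranges)


if __name__ == "__main__":
    # Synthetic read: three 6 bp barcode parts, two 15 bp linkers, 6 bp UMI.
    seq = "AAAAAA" + "C" * 15 + "GGGGGG" + "C" * 15 + "TTTTTT" + "ACGTAC"
    qual = "I" * len(seq)
    new_seq, new_qual = reformat_barcode_read(seq, qual)
    print(new_seq)
```

The reformatted read then carries an 18-base cell barcode followed by a 6-base UMI, which is the constant-length layout described above. The header and "+" lines of each FASTQ record would be passed through unchanged.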
Hi Ran,
thanks! So the CB and UMI can be extracted from the fixed positions, no need to search for linker sequences? This makes it a bit easier.
Cheers Alex
Hi Alex,
Yep! Actually, these are the parameters used in Drop-seq tools.
Thanks Ran
Hi Alex, in STARsolo mode, why do we need to set --soloBarcodeReadLength? For example, STARsolo does not work well with 10X fastq files after QC, because not all R1 reads are 150 bp long. How about only requiring that the combined length of the CB and UMI be smaller than the read length? Thanks Ran
Hi Ran,
in --readFilesIn, the cDNA fastq needs to be supplied first, and the barcode read second. E.g. for 10X fastqs, where the cDNA read is R2 and the barcode read is R1, the files need to be supplied with --readFilesIn R2.fq R1.fq. STAR will map them as single-end reads. The cDNA read can be trimmed and have variable length; its length is not checked.
The barcode read should not be trimmed at all. However, you can specify --soloBarcodeReadLength 0 to prevent STAR from checking its length.
Cheers Alex
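To make the file ordering concrete, here is a small sketch that assembles only the two options discussed in this reply. `starsolo_read_args` is a hypothetical helper, not part of STAR, and the result is a fragment: a real run needs further options (genome index, --soloType, whitelist and so on) that are left out here.

```python
def starsolo_read_args(cdna_fastq, barcode_fastq, check_barcode_length=True):
    """Build the read-related STAR arguments: cDNA read first, barcode read second."""
    args = ["--readFilesIn", cdna_fastq, barcode_fastq]
    if not check_barcode_length:
        # 0 tells STAR not to check the barcode read length (per the reply above).
        args += ["--soloBarcodeReadLength", "0"]
    return args


if __name__ == "__main__":
    # For 10X data the cDNA read is R2 and the barcode read is R1.
    print(" ".join(starsolo_read_args("R2.fq", "R1.fq", check_barcode_length=False)))
```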
Hi Alex: Sorry, I do not understand the "Barcode geometry" for complex barcodes. In --soloAdapterSequence, it seems that we can only input one adapter sequence to anchor the barcodes. What if the barcode layout is BC1+Adaptor1+BC2+Adaptor2+BC3+Adaptor3+BC4+UMI+PolyT? Thanks Tao
Hi Tao,
what are the lengths of the BC1-3 and Adaptor1-3?
Cheers Alex
Hi Alex:
BC1=8bp BC2=8bp BC3=8bp BC4=8bp UMI=12bp Adaptor1=30bp Adaptor2=25bp Adaptor3=20bp
Thanks Tao
Hi @lin-zhongbao
since the adapters and barcodes all have constant lengths, you should be able to process it with:
--soloType CB_UMI_Complex
--soloCBposition 0_0_0_7 0_38_0_45 0_71_0_78 0_99_0_106 (barcode coordinates)
--soloUMIposition 0_107_0_118 (umi position)
--soloCBwhitelist WL1.txt WL2.txt WL3.txt WL4.txt (these whitelists for the 4 barcodes).
In each x1_x2_x3_x4 tuple, x1=x3=0 indicates that you are measuring distances from the read start, x2=barcode start, x4=barcode end, both zero-based - please check my calculations.
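The coordinates can be double-checked with a short script. This is only a sketch: `LAYOUT` restates the segment lengths Tao gave, and `segment_spans` and `read_start_tuple` are hypothetical helper names. It accumulates the lengths from the read start and prints zero-based, inclusive start and end positions in the anchor_start_anchor_end form used above.

```python
# Segment lengths as given in the thread, in read order.
LAYOUT = [
    ("BC1", 8), ("Adaptor1", 30),
    ("BC2", 8), ("Adaptor2", 25),
    ("BC3", 8), ("Adaptor3", 20),
    ("BC4", 8), ("UMI", 12),
]


def segment_spans(layout):
    """Map each segment name to its zero-based inclusive (start, end) in the read."""
    spans = {}
    pos = 0
    for name, length in layout:
        spans[name] = (pos, pos + length - 1)
        pos += length
    return spans


def read_start_tuple(span):
    """Format a span as a tuple anchored at the read start (anchor 0)."""
    start, end = span
    return f"0_{start}_0_{end}"


if __name__ == "__main__":
    spans = segment_spans(LAYOUT)
    cb = [read_start_tuple(spans[name]) for name in ("BC1", "BC2", "BC3", "BC4")]
    print("--soloCBposition", " ".join(cb))
    print("--soloUMIposition", read_start_tuple(spans["UMI"]))
```

With the lengths above, this reproduces the four barcode tuples and the UMI tuple listed in the command.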
This approach should recover most of the barcodes. However, indels may screw up the distances of the farther barcodes. In this case, we can try to use the 2nd adaptor sequence to align the 3rd CB, 4th CB and UMI, something like
--soloType CB_UMI_Complex
--soloCBposition 0_0_0_7 0_38_0_45 3_1_3_8 3_29_3_36 (barcode coordinates)
--soloUMIposition 3_37_3_48
--soloAdapterSequence Adaptor2sequence
--soloCBwhitelist WL1.txt WL2.txt WL3.txt WL4.txt (these whitelists for the 4 barcodes).
x1=x3=3 indicates that we measure the distance from the adapter end.
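Here is a rough sketch of why anchoring to the adapter end helps. It derives the offsets relative to the last base of Adaptor2 and then extracts segments from a synthetic read with a one-base insertion in Adaptor1. All sequences are made up, the helper names are hypothetical, and the exact-match `str.find` search is a simplification of my own, not a description of how STAR locates the adapter.

```python
def anchored_tuple(span, adapter_end, anchor=3):
    """Format a zero-based (start, end) span relative to the adapter's last base."""
    start, end = span
    return f"{anchor}_{start - adapter_end}_{anchor}_{end - adapter_end}"


def extract_from_adapter_end(read, adapter, first, last):
    """Return read bases at offsets first..last (inclusive) past the adapter end."""
    idx = read.find(adapter)  # exact match only: a simplifying assumption
    if idx < 0:
        return None
    adapter_end = idx + len(adapter) - 1
    return read[adapter_end + first:adapter_end + last + 1]


# Made-up sequences with the segment lengths from the thread.
BC1, AD1, BC2 = "AAAAAAAA", "C" * 30, "ACACACAC"
AD2 = "GATTACA" * 3 + "GATT"  # 25 bp
BC3, AD3, BC4, UMI = "TGCATGCA", "C" * 20, "GGGGGGGG", "ACGTACGTACGT"

if __name__ == "__main__":
    # In the unshifted layout Adaptor2 ends at position 70.
    print(anchored_tuple((71, 78), 70))    # BC3
    print(anchored_tuple((99, 106), 70))   # BC4
    print(anchored_tuple((107, 118), 70))  # UMI

    # One extra base in Adaptor1 shifts everything downstream by one.
    shifted = BC1 + AD1 + "C" + BC2 + AD2 + BC3 + AD3 + BC4 + UMI
    print(shifted[71:79])                                   # fixed positions miss BC3
    print(extract_from_adapter_end(shifted, AD2, 1, 8))     # anchored: BC3
    print(extract_from_adapter_end(shifted, AD2, 29, 36))   # anchored: BC4
    print(extract_from_adapter_end(shifted, AD2, 37, 48))   # anchored: UMI
```

In this toy case the read-start coordinates pick up the wrong bases for the third barcode, while the adapter-anchored offsets still recover BC3, BC4 and the UMI. BC1 and BC2 remain anchored to the read start, so an indel upstream of BC2 would still affect BC2.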
Please let me know how it goes. If it does not work, please send me a few hundred barcode reads (good ones, from the middle of the fastq), and also the whitelist files.
Cheers Alex
Hi Alex:
Yes, it works very well. Thank you very much.
So now I understand that only one adapter's start (or end) can be used as an anchor in the parameter "--soloCBposition".
In the multi-adapter situation, indels may exist in any adapter or barcode.
Could you please extend the parameter "--soloCBposition" if you have time?
like this: start(end)Anchor defines the anchor base for the CB: 0: read start; 1: read end; 2: adapter1 start; 3: adapter1 end; 4: adapter2 start; 5: adapter2 end; 6: adapter3 start; 7: adapter3 end;
so for the complex barcode mode BC1+Adaptor1+BC2+Adaptor2+BC3+Adaptor3+BC4+UMI+PolyT,
then we can try to use all the three adaptor's sequence to align, something like
--soloType CB_UMI_Complex
--soloCBposition 0_0_2_-1 3_1_4_-1 5_1_6_-1 7_1_7_8 (barcode coordinates)
--soloUMIposition 7_9_7_20
--soloAdapterSequence Adaptor1sequence Adaptor2sequence Adaptor3sequence
--soloCBwhitelist WL1.txt WL2.txt WL3.txt WL4.txt (these whitelists for the 4 barcodes).
Hi Tao,
at the moment only one anchor adapter can be used, so you need to choose which of the three adapters you will use as the anchor.
Note that no indels are allowed in the barcodes or the anchor adapter, so by using one of the adapters as an anchor, the only additional reads you will recover are those with indels in the other adapters. I am not sure if it's going to give many additional reads compared to the simple scheme where you anchor barcodes to the beginning of the read.
Cheers Alex
Hi, Alex
Apologies if I have missed this somewhere. I am trying to process my 10X data with STARsolo, and it is very convenient. But what if the barcode read contains more than one barcode segment plus the UMI, as in microwell-seq, where the barcode read has two linker sequences and the barcode parts are not contiguous? Can STARsolo process such data?
Thank you for any reply
Ran