Closed: y9c closed this issue 7 years ago
I don't have any dual index data to test this on, but I believe for your barcode in the barcodes file you use something like:
sample_name TAATGCGC-GTACTGAC
The second barcode is likely reverse complemented.
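If it is unclear whether the second barcode has to be reverse complemented, both forms can be generated and tried against the data. A minimal Python sketch; the helper names are illustrative and are not part of fastq-multx or ea-utils:

```python
# Hypothetical helpers for building a "BC1-BC2" barcode string.
# Not part of fastq-multx or ea-utils.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dual_barcode(bc1: str, bc2: str, revcomp_second: bool = False) -> str:
    """Join two barcodes as 'bc1-bc2', optionally reverse complementing bc2."""
    second = reverse_complement(bc2) if revcomp_second else bc2
    return f"{bc1}-{second}"


if __name__ == "__main__":
    # Barcodes taken from the example line above.
    print(dual_barcode("TAATGCGC", "GTACTGAC"))
    print(dual_barcode("TAATGCGC", "GTACTGAC", revcomp_second=True))
```

Running a small subset of reads with each variant and comparing the matched counts is one way to tell which orientation applies.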
Thank you @brwnj.
If the reads are in separate files, such as seq_R1.fq and seq_R2.fq, how do I set up the command?
BTW, what is the relationship between this repo and ea-utils? Is the fastq-multx in ea-utils up to date?
I don't know the command for sure. Re: "I don't have any dual index data to test this on".
This code is taken directly from ea-utils, with slightly different versioning. The only changes are fixes to typos in the help message.
Hi @brwnj, here is some test data. Would you please show me the code? Thank you very much.
Fix the barcodes as stated above:
awk 'BEGIN{FS=" ";OFS="\t"}!/^#/{print $1,$2"-"$3}' barcode.txt > fixed_barcodes.txt
Then:
fastq-multx -B fixed_barcodes.txt test.1.fq.gz test.2.fq.gz -o %_R1.fastq -o %_R2.fastq
The top bit of the output includes counts of:
Id | Count | File(s) | |
---|---|---|---|
F111 | 36 | F111_R1.fastq | F111_R2.fastq |
F114 | 9 | F114_R1.fastq | F114_R2.fastq |
F121 | 10 | F121_R1.fastq | F121_R2.fastq |
F124 | 16 | F124_R1.fastq | F124_R2.fastq |
F131 | 14 | F131_R1.fastq | F131_R2.fastq |
F134 | 21 | F134_R1.fastq | F134_R2.fastq |
F141 | 31 | F141_R1.fastq | F141_R2.fastq |
F144 | 16 | F144_R1.fastq | F144_R2.fastq |
The second barcode is not reverse complemented.
There is a problem, but that's not it. fastq-multx is matching barcodes in the sequence line only and not the header. Using -H, which should use the header, causes a seg fault.
I would recommend trying out Brian Bushnell's demuxbyname.sh method outlined here: https://www.biostars.org/p/139395/.
Some notes:
If the sequence orientation is undetermined, use this barcode list to demultiplex the file.
awk '!/^#/{print $1"\t"$2"-"$3"\n"$1"\t"$3"-"$2}' barcode.txt > fixed_barcodes.txt
Dual barcodes should be in the format barcode1-barcode2.
Write each barcode sequence in its original orientation; barcode2 should not be reverse complemented.
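For anyone who prefers Python to awk, the two one-liners above can be sketched as a single function. This assumes barcode.txt has whitespace-separated columns (sample name, barcode1, barcode2) and comment lines starting with #, which is what the awk commands imply; the function name and the example barcodes are made up for illustration:

```python
# Hypothetical Python equivalent of the awk one-liners in this thread.
# Assumed input layout: "<name> <barcode1> <barcode2>", '#' lines are comments.
def barcode_lines(rows, both_orientations=False):
    """Return fastq-multx style lines 'name<TAB>bc1-bc2'.

    With both_orientations=True, also emit 'name<TAB>bc2-bc1' for each sample,
    for the case where the read orientation is undetermined.
    """
    out = []
    for line in rows:
        # Skip comments; unlike the awk version, blank lines are skipped too.
        if not line.strip() or line.startswith("#"):
            continue
        name, bc1, bc2 = line.split()[:3]
        out.append(f"{name}\t{bc1}-{bc2}")
        if both_orientations:
            out.append(f"{name}\t{bc2}-{bc1}")
    return out


if __name__ == "__main__":
    # Made-up barcodes, for illustration only.
    example = ["#id bc1 bc2", "F111 AAAA CCCC", "F114 GGGG TTTT"]
    print("\n".join(barcode_lines(example, both_orientations=True)))
```

To use it on a real file, pass an open file handle (for example open("barcode.txt")) as rows and write the returned lines to fixed_barcodes.txt.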
@brwnj
The second read is not trimmed.
@brwnj
Any progress on this?
Progress? Prove to me that these reads are dual-indexed.
You can clearly see the reads coming off the sequencer have the same index per sequence:
@HWI-D00523:240:HF3WGBCXX:1:1116:1699:4861 1:N:0:CCTCCT
@HWI-D00523:240:HF3WGBCXX:2:2212:6141:20342 1:N:0:CCGTGA
@HWI-D00523:240:HF3WGBCXX:1:2101:18265:67898 1:N:0:CCTCCT
@HWI-D00523:240:HF3WGBCXX:1:1116:1699:4861 2:N:0:CCTCCT
@HWI-D00523:240:HF3WGBCXX:2:2212:6141:20342 2:N:0:CCGTGA
@HWI-D00523:240:HF3WGBCXX:1:2101:18265:67898 2:N:0:CCTCCT
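The index reported in each header can also be tallied programmatically instead of read by eye. A rough Python sketch, assuming headers shaped like the ones above (a space, then colon-separated fields with the index last); the function name is illustrative:

```python
from collections import Counter


# Hypothetical helper: pull the index field out of a header such as
# "@HWI-D00523:240:HF3WGBCXX:1:1116:1699:4861 1:N:0:CCTCCT".
def header_index(header: str) -> str:
    """Return the index field (4th colon-separated field after the space)."""
    _, sep, comment = header.strip().partition(" ")
    fields = comment.split(":")
    if not sep or len(fields) < 4:
        raise ValueError(f"no index field in header: {header!r}")
    # Join the remainder in case the index field itself contains a colon.
    return ":".join(fields[3:])


if __name__ == "__main__":
    # Headers copied from this thread.
    headers = [
        "@HWI-D00523:240:HF3WGBCXX:1:1116:1699:4861 1:N:0:CCTCCT",
        "@HWI-D00523:240:HF3WGBCXX:2:2212:6141:20342 1:N:0:CCGTGA",
        "@HWI-D00523:240:HF3WGBCXX:1:2101:18265:67898 1:N:0:CCTCCT",
    ]
    print(Counter(header_index(h) for h in headers))
```

Each header here carries a single index sequence, which is the point being made about these reads.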
@brwnj I mean the bug where the barcode in read 2 is not trimmed.
So let me see if I'm inferring correctly from this issue thread... Dual barcodes in separate index files can be demuxed by concatenating the sequences in the 2 index files and then supplying the barcodes in the barcode file as "ID\tBC1-BC2\n"?
index1-read1 --- read2-index2
sample1 index1-a index2-b
sample2 index1-a index2-c
sample3 index1-d index2-c
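The layout above amounts to a lookup keyed on the pair of indexes rather than on either index alone. A small Python sketch of that idea, using the placeholder names from the example; this is only an illustration of combinatorial assignment with exact matching, not a description of how fastq-multx works internally:

```python
# Placeholder index names from the example above; real entries would be
# barcode sequences. The function name is hypothetical.
SAMPLES = {
    ("index1-a", "index2-b"): "sample1",
    ("index1-a", "index2-c"): "sample2",
    ("index1-d", "index2-c"): "sample3",
}


def assign_sample(index1: str, index2: str, samples=SAMPLES) -> str:
    """Return the sample for an (index1, index2) pair, or 'unmatched'.

    Exact matches only; real demultiplexers usually tolerate mismatches.
    """
    return samples.get((index1, index2), "unmatched")


def barcode_file_lines(samples=SAMPLES):
    """Render the table as 'ID<TAB>BC1-BC2' lines, as asked about above."""
    return [f"{name}\t{i1}-{i2}" for (i1, i2), name in samples.items()]


if __name__ == "__main__":
    print(assign_sample("index1-a", "index2-c"))
    print("\n".join(barcode_file_lines()))
```

Note that index1-a alone cannot separate sample1 from sample2, and index2-c alone cannot separate sample2 from sample3, so both indexes are needed for every read.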