Error: One sample with pair-end reads as an input

cissizhang commented 1 year ago

Hi, I have ONE sample with paired-end reads, but I get an error when I run the PoolQ command line. Could you help me figure it out?

The command line: poolq3.sh --row-reference row-ref.txt --col-reference col-ref.txt --row-reads R1_10K.fastq.gz --rev-row-reads R2_10K.fastq.gz --col-barcode-policy FIXED:0 --row-barcode-policy PREFIX:TCTTGTGGAA:20 --rev-row-barcode-policy PREFIX:TTGTCGAC:20

The error message is "Row barcode file specified but no column barcodes file specified". But I don't have sample/condition barcode reads. The col-ref.txt is the dummy file suggested in the "Using PoolQ on Demultiplexed Data" section.
If I have paired-end reads, should I put the reverse-complement barcodes in the row-reference file? For example, something like: ATCG;TAGC,barcode1 TCAG;AGTC,barcode2

Thank you, Quan

mtomko commented 1 year ago

To answer your first question, those instructions were not written with paired-end sequencing in mind. The problem is that in that mode, you're "tricking" PoolQ into reading the first base of the reads file as a dummy barcode. In those instructions, the command-line parameters given are for the non-paired-end case, where PoolQ reads all its data out of a single file, so it can just use the first base of that file as the dummy barcode.

We currently have support in PoolQ for reading files in any of three modes:

All the sequencing data in 1 file
1 file containing index reads, 1 file containing the rest of the reads
1 file containing index reads, 1 file containing the rest of the forward read, 1 file containing the reverse read

Your case, where the index read comes from the forward (or reverse) read file, isn't really handled directly. The short-term solution is to pass one of your reads files twice, once as the forward read and once as the index read:

Snippet:

--col-reads R1_10K.fastq.gz \
--row-reads R1_10K.fastq.gz \
--rev-row-reads R2_10K.fastq.gz \

The flaw here is that it'll read that first file twice (in parallel) so it's not as efficient as it could be. But then again, PoolQ wasn't really written with the demultiplexed case in mind and it's not a priority for us.
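
For reference, here is a rough sketch of what the full command might look like with that workaround applied. It assumes the remaining options from the original command stay unchanged, which hasn't been confirmed above. The helper name poolq_workaround_cmd is made up for illustration, and it only prints the command rather than running it:

```shell
# Sketch only: print (do not run) the original command with the R1 file
# passed twice, once as --col-reads and once as --row-reads.
# Assumption: all other options are carried over unchanged from the
# original command in this issue.
poolq_workaround_cmd() {
  r1="$1"   # forward reads file, also reused as the dummy index reads
  r2="$2"   # reverse reads file
  printf '%s ' poolq3.sh \
    --row-reference row-ref.txt \
    --col-reference col-ref.txt \
    --col-reads "$r1" \
    --row-reads "$r1" \
    --rev-row-reads "$r2" \
    --col-barcode-policy FIXED:0 \
    --row-barcode-policy PREFIX:TCTTGTGGAA:20 \
    --rev-row-barcode-policy PREFIX:TTGTCGAC:20
  printf '\n'
}

poolq_workaround_cmd R1_10K.fastq.gz R2_10K.fastq.gz
```

Printing the command first makes it easy to check that the same file appears under both --col-reads and --row-reads before actually running it.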

mtomko commented 1 year ago

To answer your second question, the reference file should include the barcodes in the orientation in which they will be found in the sequencing (FASTQ) files. PoolQ will not reverse complement them; it simply extracts the data in the orientation it is given and matches it against the reference data.
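
If you're not sure which orientation ends up in the reverse read file, one way to check is to count how often a barcode and its reverse complement appear in the reads. This is just a sketch using standard tools (gzip, awk, tr, grep), not anything PoolQ provides; the helper names revcomp and count_hits are made up for illustration:

```shell
# Reverse-complement a DNA sequence: reverse it with awk, then swap bases with tr.
revcomp() {
  printf '%s\n' "$1" \
    | awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' \
    | tr 'ACGTacgt' 'TGCAtgca'
}

# Count reads in a gzipped FASTQ whose sequence contains the given string.
# Assumes standard 4-line FASTQ records, where the sequence is line 2 of each record.
count_hits() {
  gzip -dc "$1" | awk 'NR % 4 == 2' | grep -cF -- "$2" || true
}

# Hypothetical usage (replace ATCG with one of your real barcodes):
#   count_hits R2_10K.fastq.gz ATCG
#   count_hits R2_10K.fastq.gz "$(revcomp ATCG)"
```

Whichever form gives the clearly larger count is the orientation present in that file, and so the one to put in the reference file. Note that very short sequences will match by chance, so this is only meaningful with full-length barcodes.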

broadinstitute / poolq

Error: One sample with pair-end reads as an input #21