abishara / athena_meta

read cloud assembler
MIT License

NOT in barcode sorted order #14

Closed nick-youngblut closed 5 years ago

nick-youngblut commented 5 years ago

I'm running athena_meta 1.1, and I got the following error:

AssertionError: fastq R1-2.fq NOT in barcode sorted order. Ensure reads that share barcodes are in a block together

The README.rst states that the reads have to be interleaved, but I don't see anything in the documentation about having to sort reads by barcodes. Given that this is a requirement for athena_meta, do you have a helper script for sorting the reads by barcode?

abishara commented 5 years ago

Hi,

You are correct that this requirement is not currently in the README and this will be added.

The longranger basic software, at least in its latest versions, does output interleaved fastqs in barcode-sorted order (that is, all reads assigned a particular barcode are in a single contiguous block).
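For reference, the "single contiguous block" condition can be sketched in Python as below. This is only an illustration and not athena_meta's own check. It assumes the barcode is the second whitespace-separated field of each header line (for example `@read1 BX:Z:ACGT-1`); the function names `read_barcodes` and `is_barcode_sorted` are hypothetical.

```python
from itertools import islice


def read_barcodes(handle):
    """Yield the barcode of each 4-line FASTQ record.

    Assumption: the barcode is the second whitespace-separated field of the
    header line. Records without such a field yield None.
    """
    lines = iter(handle)
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ: expected 4 lines per record")
        fields = record[0].rstrip("\n").split(None, 1)
        yield fields[1] if len(fields) > 1 else None


def is_barcode_sorted(handle):
    """Return True if every barcode occupies one contiguous block of reads."""
    seen = set()
    previous = object()  # sentinel that never equals a real barcode
    for barcode in read_barcodes(handle):
        if barcode != previous:
            # A new block starts here; the barcode must not have had an
            # earlier block of its own.
            if barcode in seen:
                return False
            seen.add(barcode)
            previous = barcode
    return True
```

Passing an open file handle, e.g. `is_barcode_sorted(open("R1-2.fq"))`, should report whether the file meets the condition under the header assumption above.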

Are you generating your input reads through some means other than the 10X genomics pipeline?

Thanks! alex

nick-youngblut commented 5 years ago

Yes, we are generating our libraries through a custom pipeline. 10X genomics takes too much time, money, etc.

abishara commented 5 years ago

Meaning the reads are not generated through their 10x machines either? Sounds great!

I don't have a script at the moment, but I could look into adding one once I take care of the other outstanding issues. Have you produced one in the meantime? In the past I used a solution like the one posted at https://www.biostars.org/p/15011/, which uses Unix paste and sort, together with a temporary hack that prepended the barcodes to the query names so that the reads ended up in barcode-sorted order as well.

Best, alex

abishara commented 5 years ago

Updated the README in commit be4923364853 to specify input fastq must be in barcode-sorted order.

nick-youngblut commented 5 years ago

Sorry for the slow reply. To sort the reads by barcode, I moved the barcode to the front of the sequence header, sorted with fastq-sort and then flipped the barcode back to the end. An example:

cat read1.fq read2.fq | gunzip -c | perl -pe "s/\@(.+) (.+)/\@\$2 \$1/" > TMP.fq && fastq-sort --id TMP.fq | perl -pe "s/\@(.+) (.+)/\@\$2 \$1-1/" > sorted.fq && rm -f TMP.fq
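
The same idea (group reads by barcode, then write them back out) can also be sketched in Python. This is a rough in-memory sketch, not a tested replacement for the command above. It assumes the input is already interleaved (R1 then R2, 8 lines per pair), that the barcode is the second whitespace-separated field of the R1 header, and that the file fits in memory; `read_pairs` and `sort_by_barcode` are hypothetical names.

```python
from itertools import islice


def read_pairs(handle):
    """Yield (barcode, lines) for each interleaved read pair (8 lines).

    Assumption: the barcode is the second whitespace-separated field of the
    R1 header; pairs without one get an empty-string barcode.
    """
    lines = iter(handle)
    while True:
        pair = list(islice(lines, 8))
        if not pair:
            return
        if len(pair) != 8:
            raise ValueError("truncated interleaved FASTQ: expected 8 lines per pair")
        fields = pair[0].split(None, 1)
        barcode = fields[1].strip() if len(fields) > 1 else ""
        yield barcode, pair


def sort_by_barcode(handle):
    """Return all lines with read pairs grouped into barcode-sorted blocks.

    sorted() is stable, so pairs sharing a barcode keep their input order.
    """
    pairs = sorted(read_pairs(handle), key=lambda item: item[0])
    return [line for _, pair in pairs for line in pair]
```

Usage would be along the lines of `open("sorted.fq", "w").writelines(sort_by_barcode(open("interleaved.fq")))`. For files too large for memory, an external sort such as the paste+sort approach mentioned earlier is likely the better fit.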

davidvilanova commented 5 years ago

Do you have the exact command line that works with an interleaved fastq file output by longranger basic?

abishara commented 5 years ago

@davidvilanova can you clarify what you mean? You should be able to provide a config.json file as described in the README if your outputs are produced from longranger. What issues are you having?