Open hussius opened 12 years ago
@costeapaul, can you give me some pointers to this script @hussius is referring to?
For PhiX removal on paired-end reads (which will almost always be the case) rather than single-end reads, an example command would be:
bowtie --solexa1.3-quals --un sample_nophiX.fastq /bubo/nobackup/uppnex/reference/biodata/genomes/phiX174/phix/bowtie/phix -1 sample_1.fastq -2 sample_2.fastq /dev/null
Note! When doing it on already demultiplexed samples (which will have Illumina index 3), this is fine, but if it is done on the non-demultiplexed, whole-lane FASTQ files, where the intention is to deliver just those whole-lane files, the last 7 nucleotides (the barcode) should be removed before running this command. This can be done with a trivial script that removes 7 bases from the end of the sequence and quality lines in the FASTQ files.
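A minimal sketch of such a trimming script is given here. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function name `trim_fastq_tail` and the file names in the usage note are hypothetical and not part of the pipeline.

```python
def trim_fastq_tail(in_handle, out_handle, n_bases=7):
    """Drop the last n_bases from the sequence and quality lines of each record.

    Assumes four-line FASTQ records: header, sequence, '+' line, quality.
    """
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = in_handle.readline().rstrip("\r\n")
        plus = in_handle.readline().rstrip("\r\n")
        qual = in_handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in %s" % header.strip())
        # Reads shorter than n_bases end up empty rather than raising an error.
        keep = max(len(seq) - n_bases, 0)
        out_handle.write(header.rstrip("\r\n") + "\n")
        out_handle.write(seq[:keep] + "\n")
        out_handle.write(plus + "\n")
        out_handle.write(qual[:keep] + "\n")
```

It could be used along these lines, with placeholder file names:

```python
with open("lane_1.fastq") as src, open("lane_1_trimmed.fastq", "w") as dst:
    trim_fastq_tail(src, dst, n_bases=7)
```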
I'm thinking of adding a field to run_info.yaml, "filter_out_gnomes", on a per-sample basis, such as:
multiplex:
  - barcode_id: '1'
    barcode_type: Illumina
    name: BAC11
    sequence: ATCACG
    filter_out_genomes: ecoli, phix
That way we get rid of the second scenario (whole lane) and apply it only to the samples that matter. @chapmanb, we spike in phiX on sample 3, which is why this feature is needed, but I think it's better generalized this way. In addition, it can eventually be exposed as an additional field in the ngLIMS.
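As a rough illustration of how the proposed per-sample field might be consumed, the sketch below assumes run_info.yaml has already been loaded into plain Python dicts and lists, that each lane entry carries a `multiplex` list like the one shown, and that the field holds a comma-separated string. The helper names `genomes_to_filter` and `samples_needing_filter` are hypothetical.

```python
def genomes_to_filter(sample):
    """Return the genomes listed in a sample's filter_out_genomes field."""
    raw = sample.get("filter_out_genomes") or ""
    if isinstance(raw, str):
        # Assumed format: a comma-separated string such as "ecoli, phix".
        raw = raw.split(",")
    return [g.strip() for g in raw if g and g.strip()]


def samples_needing_filter(lane):
    """Map sample name to the genomes to screen out, skipping samples without the field."""
    wanted = {}
    for sample in lane.get("multiplex", []):
        genomes = genomes_to_filter(sample)
        if genomes:
            wanted[sample["name"]] = genomes
    return wanted
```

Samples that do not set the field are simply skipped, so existing run_info.yaml files would keep working unchanged under this scheme.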
Roman; That's a nice idea. Thanks for looking at this; let me know how I can help.
"filter_out_gnomes" - was that typo inspired by the approaching Christmas mood? :-)
LOL X"D
Well, you never know what you might find in those magical FastQ files ;P
A better option for mate pair linker removal would be to use a modified version of Deloxer (http://genomes.sdsc.edu/downloads/deloxer/). The modified version is written by Ino DeBruijn and handles more cases than the original Deloxer. I'll ask him to put it on GitHub.
From Brad:
I'm actively trying to move into using other external programs upstream of bcbio-nextgen instead of coding this directly.
Need to include Paul's script for mate-pair linker removal - ONLY for mate-pair runs. Perhaps as part of general screening module together with PhiX, contamination + adapter screeners.