SciLifeLab / bcbb

Useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Mate pair linker removal #56

Open hussius opened 12 years ago

hussius commented 12 years ago

Need to include Paul's script for mate-pair linker removal - ONLY for mate-pair runs. Perhaps as part of general screening module together with PhiX, contamination + adapter screeners.

brainstorm commented 12 years ago

@costeapaul, can you give me some pointers to this script @hussius is referring to ?

brainstorm commented 12 years ago

For doing PhiX removal on paired-end reads (which will almost always be the case) rather than single-end, an example command would be

  bowtie --solexa1.3-quals --un sample_nophiX.fastq /bubo/nobackup/uppnex/reference/biodata/genomes/phiX174/phix/bowtie/phix -1 sample_1.fastq -2 sample_2.fastq /dev/null

Note! When doing it on already demultiplexed samples (which will have Illumina index 3), this is fine, but if done on the non-demultiplexed, whole-lane FASTQ files, where the intention is to deliver just those whole-lane files, the last 7 nucleotides (the barcode) should be removed before running this command. This can be done by writing a trivial script to remove 7 bases off the end of the sequence and quality lines in the FASTQ files.
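
A minimal sketch of such a trimming script, assuming plain four-line FASTQ records (no wrapped sequence lines) with the 7-base barcode at the very end of each read. The function name `trim_fastq_lines` and the command-line handling are hypothetical and not part of bcbb:

```python
import sys


def trim_fastq_lines(lines, n_bases=7):
    """Yield FASTQ lines with the last n_bases cut from sequence and quality lines.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each record, line 1 is the sequence and line 3 the quality string.
        if i % 4 in (1, 3) and n_bases > 0:
            line = line[:-n_bases]
        yield line + "\n"


def trim_fastq_file(in_path, out_path, n_bases=7):
    """Write a trimmed copy of in_path to out_path."""
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        out_handle.writelines(trim_fastq_lines(in_handle, n_bases))


# Hypothetical usage: python trim_barcode.py lane.fastq lane_trimmed.fastq
if __name__ == "__main__" and len(sys.argv) == 3:
    trim_fastq_file(sys.argv[1], sys.argv[2])
```

Both mates of a pair would need the same treatment only if both carry the barcode; that depends on how the whole-lane files were produced, so check before trimming.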

brainstorm commented 12 years ago

I'm thinking of adding a field to run_info.yaml, "filter_out_gnomes", on a per-sample basis, such as:

  multiplex:
  - barcode_id: '1'
    barcode_type: Illumina
    name: BAC11
    sequence: ATCACG
    filter_out_genomes: ecoli, phix

That way we get rid of the second scenario (whole lane) and apply it only to the samples that matter. @chapmanb, we spike in phiX on sample 3, that's why this feature is needed, but I think this way it's better generalized. In addition, it can eventually be exposed as an additional field in the ngLIMS.
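
To make the proposal concrete, here is a rough sketch of how the per-sample field could be read once run_info.yaml has been parsed into Python dictionaries. It assumes the value arrives as a comma-separated string (or a YAML list), and the helper names `genomes_to_filter` and `filters_by_barcode` are made up for illustration; nothing like this exists in the pipeline yet:

```python
def genomes_to_filter(sample):
    """Return the genome names listed in a sample's filter_out_genomes field."""
    raw = sample.get("filter_out_genomes") or ""
    # "ecoli, phix" is parsed by YAML as a single string, so split it here.
    if isinstance(raw, str):
        raw = raw.split(",")
    return [name.strip() for name in raw if name and name.strip()]


def filters_by_barcode(lane):
    """Map each barcode_id in a lane's multiplex section to its genomes to screen."""
    return {
        sample["barcode_id"]: genomes_to_filter(sample)
        for sample in lane.get("multiplex", [])
    }


# The lane entry from the example above, as it would look after YAML parsing.
lane = {
    "multiplex": [
        {
            "barcode_id": "1",
            "barcode_type": "Illumina",
            "name": "BAC11",
            "sequence": "ATCACG",
            "filter_out_genomes": "ecoli, phix",
        }
    ]
}

print(filters_by_barcode(lane))
```

Samples without the field get an empty list, so screening stays opt-in per sample.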

chapmanb commented 12 years ago

Roman; that's a nice idea. Thanks for looking at this; let me know how I can help.

hussius commented 12 years ago

"filter_out_gnomes" - was that typo inspired by the approaching Christmas mood? :-)

brainstorm commented 12 years ago

LOL X"D

Well, you never know what you might find in those magical FastQ files ;P

hussius commented 12 years ago

A better option for mate pair linker removal would be to use a modified version of Deloxer (http://genomes.sdsc.edu/downloads/deloxer/). The modified version is written by Ino DeBruijn and handles more cases than the original Deloxer. I'll ask him to put it on GitHub.

brainstorm commented 11 years ago

From Brad:

I'm actively trying to move into using other external programs upstream of bcbio-nextgen instead of coding this directly.