aryeelab / guideseq

Analysis pipeline for the GUIDE-seq assay.
GNU Affero General Public License v3.0

demultiplex failure #49

Open Salvobioinfo opened 5 years ago

Salvobioinfo commented 5 years ago

I ran the command line: python guideseq.py all -m manifest_1.yaml. Shortly after demultiplexing begins, I receive: IOError: [Errno 24] Too many open files: ...

When I inspect the demultiplexed output directory, there is one file for every barcode. I have seen other issues like mine, but the solutions suggested there don't work for me. My manifest file is:

reference_genome: /solid/Reference/GRCh38/GRCh38.d1.vd1.fa
output_folder: /tank/home/salvo/KATIA/OUTPUT

bwa: /solid/Programs/bwa-0.7.17/bwa
bedtools: /solid/Programs/bedtools2/bin/bedtools

demultiplex_min_reads: 1000

undemultiplexed:
    forward: /tank/USB/Undetermined_S0_L001_R1_001.fastq.gz
    reverse: /tank/USB/Undetermined_S0_L001_R2_001.fastq.gz
    index1: /tank/USB/Undetermined_S0_L001_I1_001.fastq.gz
    index2: /tank/USB/Undetermined_S0_L001_I2_001.fastq.gz

samples:
    control_a:
        target:  
        barcode1: TTCTGCCT
        barcode2: CTCTCTAT
        description: Hek-MECP2_CTRLpos

    Hek-MECP2pos:
        target: GATTTTGACTTCACGGTAACTGG
        barcode1: TCGCCTTA
        barcode2: TAGATCGC
        description: Hek-MECP2pos

    control_b:
        target:  
        barcode1: GCTCAGGA
        barcode2: CTCTCTAT
        description: Hek-Hek-MECP2_CTRLneg

    Hek-MECP2neg:
        target: GATTTTGACTTCACGGTAACTGG
        barcode1: CTAGTACG
        barcode2: TAGATCGC
        description: Hek-MECP2neg

Any idea what is going on here? Salvatore
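
Errno 24 means the process hit its limit on open file descriptors, which fits a run that writes one output file per barcode. A minimal diagnostic sketch, assuming a Unix-like system where Python's standard resource module is available. The helper names open_file_limits and raise_soft_limit are made up for illustration and are not part of guideseq, and raising the limit is not confirmed in this thread to resolve the failure:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft descriptor limit toward `target`, never above the hard limit.

    Hypothetical helper: returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise.
        return soft
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    new_soft = max(soft, target)  # never lower the current limit
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit: %s, hard limit: %s" % (soft, hard))
```

If the soft limit is far below the number of files in the demultiplexed output directory, that is consistent with the error above. The change made by raise_soft_limit applies only to the current Python process and its children.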

Salvobioinfo commented 5 years ago

I changed the manifest file to:

reference_genome: /solid/Reference/GRCh38/GRCh38.d1.vd1.fa
output_folder: /tank/home/salvo/KATIA/OUTPUT

bwa: bwa
bedtools: bedtools

demultiplex_min_reads: 100000

undemultiplexed:
    forward: /tank/USB/Undetermined_S0_L001_R1_001.fastq.gz
    reverse: /tank/USB/Undetermined_S0_L001_R2_001.fastq.gz
    index1: /tank/USB/Undetermined_S0_L001_I1_001.fastq.gz
    index2: /tank/USB/Undetermined_S0_L001_I2_001.fastq.gz

samples:
    control:
        target:  
        barcode1: GCTCAGGA
        barcode2: CTCTCTAT
        description: Control

    HekMECP2neg:
        target: GATTTTGACTTCACGGTAACTGG
        barcode1: CTAGTACG
        barcode2: TAGATCGC
        description: Hek-MECP2ne

But I still get the same result.

staciawyman commented 5 years ago

What if you increase "demultiplex_min_reads" even more? I set mine to 50000.
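
To illustrate why the threshold could matter, here is a rough sketch. It assumes that the setting is meant to skip barcode combinations with fewer reads than the threshold. This is not guideseq's actual code, and the function name barcodes_to_write and the read counts are hypothetical:

```python
from collections import Counter


def barcodes_to_write(barcode_counts, min_reads):
    """Keep only barcode pairs that have at least `min_reads` reads."""
    return {bc: n for bc, n in barcode_counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Made-up read counts per barcode pair, for illustration only.
    counts = Counter({
        ("CTAGTACG", "TAGATCGC"): 150000,
        ("GCTCAGGA", "CTCTCTAT"): 120000,
        ("AAAAAAAA", "CCCCCCCC"): 37,
    })
    kept = barcodes_to_write(counts, min_reads=100000)
    print("%d of %d barcode pairs pass the threshold" % (len(kept), len(counts)))
```

Under that assumption, a higher threshold means fewer barcode pairs get an output file. Whether the files are opened before or after such a filter is applied in the pipeline is not something this sketch shows.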

Zethson commented 5 years ago

@martinaryee @vedtopkar It's unfortunately clearly broken. It would be highly appreciated if you could take a look at the demultiplexing step again.

I have the same errors as @Salvobioinfo.

Salvobioinfo commented 5 years ago

@Zethson In our first attempt the issue was caused by sequencing problems. I tried to demultiplex our data with different tools, and only one sample had a good number of reads. Did you try that? To check whether the problem was linked to our sequencing step, I ran guideseq on good samples obtained from other people, and the tool worked well.
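
One way to run this kind of check without a separate demultiplexing tool is to count reads per barcode pair directly from the two index files. A minimal sketch in Python 3, assuming gzipped four-line FASTQ records with the same read order in both files. The function names and the barcode_length default of 8 (matching the 8-nt barcodes in the manifests above) are illustrative and not taken from guideseq. Depending on the sequencer, the second index may need to be reverse-complemented before it matches the manifest, which this sketch does not handle:

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(fastq_gz_path):
    """Yield the sequence line of every record in a gzipped FASTQ file."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Each FASTQ record is four lines; the sequence is the second one.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def count_barcode_pairs(index1_path, index2_path, barcode_length=8):
    """Count reads per (index1, index2) barcode pair."""
    pairs = zip(read_sequences(index1_path), read_sequences(index2_path))
    return Counter(
        (i1[:barcode_length], i2[:barcode_length]) for i1, i2 in pairs
    )


if __name__ == "__main__":
    counts = count_barcode_pairs(
        "/tank/USB/Undetermined_S0_L001_I1_001.fastq.gz",
        "/tank/USB/Undetermined_S0_L001_I2_001.fastq.gz",
    )
    # Show the most frequent barcode pairs and their read counts.
    for (bc1, bc2), n in counts.most_common(10):
        print(bc1, bc2, n)
```

If the barcode pairs listed in the manifest are missing from the top of this list, or have very few reads, the problem is more likely in the sequencing run than in the pipeline.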