Chimera-tools / ChimPipe

ChimPipe: Accurate detection of fusion genes and transcription-induced chimeras from RNA-seq data
https://chimpipe.readthedocs.org/
GNU General Public License v3.0

Error using ChimSim #9

Closed jenzopr closed 6 years ago

jenzopr commented 6 years ago

Dear all, dear Sarah,

I'm running into an error using ChimPipe: The stage [CHIMSIM] fails with [ERROR] Error running ChimSim.

The full output looks like

../ChimPipe/ChimPipe.sh --fastq_1 ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz --fastq_2 ../raw/sample_Leg_Bulk_Tumor_R2.fastq.gz -g GRCm38.p5.genome_whitelist.gem -a /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf -t gencode.vM16.annotation.gtf.junctions.gem -k gencode.vM16.annotation.gtf.junctions.keys --sample-id sample_Leg_Bulk --threads 16 --tmp-dir tmp

CHIMPIPE CONFIGURATION FOR sample_Leg_Bulk
------------------------------------------

  ChimPipe Version v0.9.5            

  ***** MANDATORY ARGUMENTS *****    
  fastq_1:                           ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz
  fastq_2:                           ../raw/sample_Leg_Bulk_Tumor_R2.fastq.gz
  genome-index:                      GRCm38.p5.genome_whitelist.gem
  annotation:                        /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf
  transcriptome-index:               gencode.vM16.annotation.gtf.junctions.gem
  transcriptome-keys:                gencode.vM16.annotation.gtf.junctions.keys
  sample-id:                         sample_Leg_Bulk

  ** Reads information **            
  seq-library:                       UNKNOWN
  max-read-length:                   150

  ***** MAPPING PHASE *****          
  ** 1st mapping **                  
  consensus-ss-fm:                   GT+AG,GC+AG,ATATC+A.,GTATC+AT
  min-split-size-fm:                 15
  refinement-step-size-fm (0:disabled): 2
  stats:                             TRUE

  ** 2nd mapping **                  
  consensus-ss-fm:                   GT+AG
  min-split-size-fm:                 15
  refinement-step-size-fm (0:disabled): 2

  ***** CHIMERA DETECTION PHASE ***** 
  ** Classification **               
  readthrough-max-dist:              100000

  ** Filters **                      
  total-support:                     3
  spanning-reads:                    1
  consistent-pairs:                  1
  total-support-novel-ss:            6
  spanning-reads-novel-ss:           3
  consistent-pairs-novel-ss:         3
  perc-staggered (disabled:0):       0
  perc-multimappings (disabled:100): 100
  perc-inconsistent-pairs (disabled:100): 100
  similarity:                        30+90
  biotype:                           pseudogene,polymorphic_pseudogene,IG_C_pseudogene,IG_J_pseudogene,IG_V_pseudogene,TR_J_pseudogene,TR_V_pseudogene

  ** Files **                        
  similarity-gene-pairs:             NOT_PROVIDED

  ***** GENERAL *****                
  output-dir:                        /data/exp-tumor/chimpipe
  tmp-dir:                           tmp
  threads:                           16
  log:                               warn
  cleanup:                           TRUE

Executing ChimPipe v0.9.5 for sample_Leg_Bulk
---------------------------------------------

[PRELIM] Determining the offset quality of the reads for sample_Leg_Bulk...
 quality=`/data/exp-tumor/ChimPipe/src/bash/detect.fq.qual.sh ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz | awk '{print $2}'`
 The read quality is 33
done
Tue Jun  5 08:51:18 CEST 2018 ***** First mapping BAM file already exists... skipping first mapping step *****
Tue Jun  5 08:51:18 CEST 2018 ***** FASTQ file with reads to remap already exists... skipping extracting reads to remap step *****
Tue Jun  5 08:51:18 CEST 2018 ***** Second mapping GEM file already exists... skipping extracting second mapping step *****
Tue Jun  5 08:51:18 CEST 2018 ***** Executing infer library type step *****
[INFER-LIBRARY] Infering the sequencing library protocol from a random subset with 1 percent of the mapped reads...done
[INFER-LIBRARY] Fraction of reads explained by 1++,1--,2+-,2-+: 50.0484
[INFER-LIBRARY] Fraction of reads explained by 1+-,1-+,2++,2--: 49.9516
[INFER-LIBRARY] Fraction of reads explained by other combinations: 0
[INFER-LIBRARY] Sequencing library type: UNSTRANDED
[INFER-LIBRARY] Strand aware protocol (1: yes, 0: no): 0
Tue Jun  5 08:52:09 CEST 2018 ***** Sequencing library inference for sample_Leg_Bulk completed in 0.85 min *****
Tue Jun  5 08:52:09 CEST 2018 ***** Chimeric Junctions file already exists... skipping step *****
Tue Jun  5 08:52:09 CEST 2018 ***** Discordant paired-end file already exists... skipping step *****
Tue Jun  5 08:52:09 CEST 2018 ***** ChimIntegrate output file already exists... skipping step *****
Tue Jun  5 08:52:09 CEST 2018 ***** Executing ChimSimilarity *****
[CHIMSIM] Computing similarity between annotated genes...
 /data/exp-tumor/ChimPipe/src/bash/similarity_bt_gnpairs.sh /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf GRCm38.p5.genome_whitelist.gem 1> /data/exp-tumor/chimpipe/GnSimilarity/sim.out 2> /data/exp-tumor/chimpipe/GnSimilarity/sim.err
[ERROR] Error running ChimSim

The files GnSimilarity/sim.out and GnSimilarity/sim.err contain

    Usage:    similarity_bt_gnpairs.sh annot genome_GEM

    Example:  similarity_bt_gnpairs.sh gen10.long.exon.gtf hg19.gem

    Takes an annotation in gtf or gff2 format (with exons rows identified by gene_id and then transcript_id as first keys in 9th field),
    the gem index of the corresponding genome and computes the similarity between each gene pair of the annotation, as the maximum 
    similarity of their transcript pairs.
    Note: it is important the annotation does not include chromosomes that are not part of the genome
    exit 0

and

    ERROR:Please specify a valid genome gem index file

Best and thanks for a quick hint, Jens

sdjebali commented 6 years ago

Thanks for your message and for using ChimPipe.

Have you made sure that the gene annotation file gencode.vM16.annotation.gtf does not contain any chromosome that is absent from the genome index GRCm38.p5.genome_whitelist.gem?

If that turns out not to be the problem, I would need those two files, the genome in FASTA format, and the command you used to produce the genome index.
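One way to run this check is sketched below. It compares the sequence names in the first column of the GTF against the headers of the genome FASTA, on the assumption that the GEM index was built from that FASTA without renaming sequences. The function name `chroms_missing_from_genome` is made up for this illustration and is not part of ChimPipe.

```shell
# Print every sequence name used in the annotation that has no matching
# header in the genome FASTA. Empty output means the annotation is consistent.
# Hypothetical helper, not part of ChimPipe.
chroms_missing_from_genome() {
    gtf=$1
    fasta=$2
    tmp=$(mktemp -d) || return 1
    # Column 1 of every non-comment GTF line is the sequence name.
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$tmp/annot"
    # FASTA headers: drop the leading '>' and anything after the first blank.
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' | sort -u > "$tmp/genome"
    # Lines present only in the annotation list.
    comm -23 "$tmp/annot" "$tmp/genome"
    rm -rf "$tmp"
}
```

Calling it as `chroms_missing_from_genome gencode.vM16.annotation.gtf genome.fa` (where `genome.fa` stands for whichever FASTA the index was built from) should print nothing if every annotated chromosome is in the genome.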

Thanks, Sarah

brguez commented 6 years ago

Hi Jens, to add to Sarah's comment: I suspect this is a problem with paths. Could you please rerun ChimPipe, specifying the full path to every input file?
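As a minimal sketch of how relative arguments could be turned into full paths before the call, the helper below resolves the directory part of a path and re-attaches the file name. The name `abspath` is chosen for this example only. It assumes the file's directory exists, and it does not resolve a symlink in the final path component.

```shell
# Print the absolute path of a file given a relative (or absolute) path.
# Hypothetical helper for illustration; plain POSIX sh, no realpath needed.
abspath() {
    # Resolve the containing directory in a subshell so the caller's
    # working directory is left untouched.
    dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
    # Avoid a doubled slash for files that sit directly under "/".
    [ "$dir" = / ] && dir=
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}
```

Each relative argument of the original command (`--fastq_1`, `--fastq_2`, `-g`, `-t`, `-k`) could then be wrapped, for example `-g "$(abspath GRCm38.p5.genome_whitelist.gem)"`.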

If that does not work, I would suggest running this step separately and then using the generated matrix as input for ChimPipe. This is explained under "Gene pair similarity file (Optional)" in the documentation (https://chimpipe.readthedocs.io/en/latest/manual.html#execute-chimpipe).

Best, Bernardo

jenzopr commented 6 years ago

Hi Sarah and Bernardo,

I made sure that the gene annotation file and the genome index file contained the same set of chromosomes. When executing

/data/exp-tumor/ChimPipe/src/bash/similarity_bt_gnpairs.sh /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf GRCm38.p5.genome_whitelist.gem

separately, the program finishes without errors but doesn't write the $simGnPairs file into the GnSimilarity folder:

I am extracting the cdna sequence of each transcript in the annotation
I am making the list of distinct exon coordinates
done
I am retrieving the exon sequences
Tue Jun  5 09:37:38 2018 -- Loading index (likely to take long)... done.
Tue Jun  5 09:37:41 2018 -- Inverting locations... done.
done
I am making a file that both has the exon coordinates and sequence
done
For each transcript I am making a list of exon coordinates from 5' to 3'
done
For each transcript I am making its sequence by concatenating the sequences of its exons from 5' to 3'
done
I am cleaning
done
done
I am making a BLAST database out of the transcript sequences

Building a new DB, current time: 06/05/2018 09:37:57
New DB name:   /mnt/data/exp-tumor/chimpipe/gencode.vM16.annotation_tr.fasta
New DB title:  gencode.vM16.annotation_tr.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 77282 sequences in 3.25385 seconds.
done
I am running Blast on all against all to detect local similarity between transcripts
done
I am making a gene pair file with % similarity, alignment length and other information
done
I am cleaning
done

However, the $simGnPairs file is present in the working directory. I will use it via --similarity-gene-pairs as input now. Thanks for your help!

jenzopr commented 6 years ago

Summary: by default, similarity_bt_gnpairs.sh writes its output to the working directory, not to the GnSimilarity folder. Passing the resulting gencode.vM16.annotation.similarity.txt file to ChimPipe via --similarity-gene-pairs works just fine, and the pipeline finishes without further errors. Thanks again for your help and quick replies!
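For anyone who wants the similarity file to land in a specific folder, a small wrapper along these lines might help. It is only a sketch: `run_in_dir` is a hypothetical name, and it relies on nothing more than the behaviour described above, namely that the script writes into whatever directory it is started from.

```shell
# Run a command from inside a target directory, so that files the command
# writes to its working directory end up there.
# Hypothetical helper for illustration, not part of ChimPipe.
run_in_dir() {
    target=$1
    shift
    mkdir -p "$target" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$target" && "$@" )
}
```

For example, `run_in_dir /data/exp-tumor/chimpipe/GnSimilarity /data/exp-tumor/ChimPipe/src/bash/similarity_bt_gnpairs.sh <annotation.gtf> <genome.gem>` would start the script inside GnSimilarity. Because the working directory changes, both input files have to be given as full paths. The resulting file can then be handed to ChimPipe with --similarity-gene-pairs.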

brguez commented 6 years ago

Hi, glad to hear it's working now. Yes, the script writes its output to the working directory.

To avoid any path-related issues, please always give the full path to the input files when running ChimPipe.

Best