Closed jenzopr closed 6 years ago
Thanks for your message and for using ChimPipe.
Have you made sure that the gene annotation file gencode.vM16.annotation.gtf does not contain any chromosome that is absent from the genome index GRCm38.p5.genome_whitelist.gem?
If that is not the cause, I would need those two files, the genome in FASTA format, and the command you used to produce the genome index.
Thanks, Sarah
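For readers who want to run the check Sarah describes: a GEM index is a binary file, so one way to approximate the check is to compare the sequence names in the GTF against the headers of the FASTA file the index was built from. The following is a minimal sketch and not part of ChimPipe; the function name `chroms_missing_from_genome` and the FASTA file name in the usage comment are placeholders.

```shell
# Sketch only: chroms_missing_from_genome is a made-up helper, not a ChimPipe tool.
# Prints every sequence name used in the annotation (arg 1, GTF)
# that has no matching header in the genome (arg 2, FASTA).
chroms_missing_from_genome() {
    gtf_names=$(mktemp) || return 1
    fasta_names=$(mktemp) || return 1
    # Column 1 of every non-comment GTF line is the sequence name.
    grep -v '^#' "$1" | cut -f1 | LC_ALL=C sort -u > "$gtf_names"
    # A FASTA sequence name is the text between '>' and the first whitespace.
    grep '^>' "$2" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$fasta_names"
    # comm -23 keeps the lines that occur only in the first (GTF) list.
    LC_ALL=C comm -23 "$gtf_names" "$fasta_names"
    rm -f "$gtf_names" "$fasta_names"
}

# Usage (the FASTA name is a placeholder for whatever file the index was built from):
#   chroms_missing_from_genome gencode.vM16.annotation.gtf genome.fa
```

An empty result means every chromosome named in the annotation also appears in the FASTA file.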
On Tue, Jun 5, 2018 at 8:57 AM, Jens Preußner notifications@github.com wrote:
Dear all, dear Sarah,
I'm running into an error using ChimPipe: The stage [CHIMSIM] fails with [ERROR] Error running ChimSim.
The full output looks like
../ChimPipe/ChimPipe.sh --fastq_1 ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz --fastq_2 ../raw/sample_Leg_Bulk_Tumor_R2.fastq.gz -g GRCm38.p5.genome_whitelist.gem -a /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf -t gencode.vM16.annotation.gtf.junctions.gem -k gencode.vM16.annotation.gtf.junctions.keys --sample-id sample_Leg_Bulk --threads 16 --tmp-dir tmp
CHIMPIPE CONFIGURATION FOR sample_Leg_Bulk
ChimPipe Version v0.9.5
MANDATORY ARGUMENTS
  fastq_1: ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz
  fastq_2: ../raw/sample_Leg_Bulk_Tumor_R2.fastq.gz
  genome-index: GRCm38.p5.genome_whitelist.gem
  annotation: /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf
  transcriptome-index: gencode.vM16.annotation.gtf.junctions.gem
  transcriptome-keys: gencode.vM16.annotation.gtf.junctions.keys
  sample-id: sample_Leg_Bulk
Reads information
  seq-library: UNKNOWN
  max-read-length: 150
MAPPING PHASE
1st mapping
  consensus-ss-fm: GT+AG,GC+AG,ATATC+A.,GTATC+AT
  min-split-size-fm: 15
  refinement-step-size-fm (0:disabled): 2
  stats: TRUE
2nd mapping
  consensus-ss-fm: GT+AG
  min-split-size-fm: 15
  refinement-step-size-fm (0:disabled): 2
CHIMERA DETECTION PHASE
Classification
  readthrough-max-dist: 100000
Filters
  total-support: 3
  spanning-reads: 1
  consistent-pairs: 1
  total-support-novel-ss: 6
  spanning-reads-novel-ss: 3
  consistent-pairs-novel-ss: 3
  perc-staggered (disabled:0): 0
  perc-multimappings (disabled:100): 100
  perc-inconsistent-pairs (disabled:100): 100
  similarity: 30+90
  biotype: pseudogene,polymorphic_pseudogene,IG_C_pseudogene,IG_J_pseudogene,IG_V_pseudogene,TR_J_pseudogene,TR_V_pseudogene
Files
  similarity-gene-pairs: NOT_PROVIDED
GENERAL
  output-dir: /data/exp-tumor/chimpipe
  tmp-dir: tmp
  threads: 16
  log: warn
  cleanup: TRUE
Executing ChimPipe v0.9.5 for sample_Leg_Bulk
[PRELIM] Determining the offset quality of the reads for sample_Leg_Bulk... quality=
/data/exp-tumor/ChimPipe/src/bash/detect.fq.qual.sh ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz | awk '{print $2}'
The read quality is 33 done
Tue Jun 5 08:51:18 CEST 2018 First mapping BAM file already exists... skipping first mapping step
Tue Jun 5 08:51:18 CEST 2018 FASTQ file with reads to remap already exists... skipping extracting reads to remap step
Tue Jun 5 08:51:18 CEST 2018 Second mapping GEM file already exists... skipping extracting second mapping step
Tue Jun 5 08:51:18 CEST 2018 Executing infer library type step
[INFER-LIBRARY] Infering the sequencing library protocol from a random subset with 1 percent of the mapped reads...done
[INFER-LIBRARY] Fraction of reads explained by 1++,1--,2+-,2-+: 50.0484
[INFER-LIBRARY] Fraction of reads explained by 1+-,1-+,2++,2--: 49.9516
[INFER-LIBRARY] Fraction of reads explained by other combinations: 0
[INFER-LIBRARY] Sequencing library type: UNSTRANDED
[INFER-LIBRARY] Strand aware protocol (1: yes, 0: no): 0
Tue Jun 5 08:52:09 CEST 2018 Sequencing library inference for sample_Leg_Bulk completed in 0.85 min
Tue Jun 5 08:52:09 CEST 2018 Chimeric Junctions file already exists... skipping step
Tue Jun 5 08:52:09 CEST 2018 Discordant paired-end file already exists... skipping step
Tue Jun 5 08:52:09 CEST 2018 ChimIntegrate output file already exists... skipping step
Tue Jun 5 08:52:09 CEST 2018 Executing ChimSimilarity
[CHIMSIM] Computing similarity between annotated genes...
/data/exp-tumor/ChimPipe/src/bash/similarity_bt_gnpairs.sh /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf GRCm38.p5.genome_whitelist.gem 1> /data/exp-tumor/chimpipe/GnSimilarity/sim.out 2> /data/exp-tumor/chimpipe/GnSimilarity/sim.err
[ERROR] Error running ChimSim

The files GnSimilarity/sim.out and GnSimilarity/sim.err contain
Usage: similarity_bt_gnpairs.sh annot genome_GEM
Example: similarity_bt_gnpairs.sh gen10.long.exon.gtf hg19.gem
Takes an annotation in gtf or gff2 format (with exons rows identified by gene_id and then transcript_id as first keys in 9th field), the gem index of the corresponding genome and computes the similarity between each gene pair of the annotation, as the maximum similarity of their transcript pairs.
Note: it is important the annotation does not include chromosomes that are not part of the genome
exit 0
and
ERROR:Please specify a valid genome gem index file
Best and thanks for a quick hint, Jens
Hi Jens, I would like to add to Sarah's comment that I suspect this is a problem with paths. Could you please rerun ChimPipe, specifying the full path to all the input files?
If that does not work, I would suggest running this step separately and then using the generated matrix as input for ChimPipe. This is explained under "Gene pair similarity file (Optional)" in the documentation (https://chimpipe.readthedocs.io/en/latest/manual.html#execute-chimpipe).
Best, Bernardo
Hi Sarah and Bernardo,
I made sure that the gene annotation file and the genome index file contained the same set of chromosomes. When executing
/data/exp-tumor/ChimPipe/src/bash/similarity_bt_gnpairs.sh /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf GRCm38.p5.genome_whitelist.gem
separately, the program finishes without errors but doesn't write the $simGnPairs file into the GnSimilarity folder:
I am extracting the cdna sequence of each transcript in the annotation
I am making the list of distinct exon coordinates
done
I am retrieving the exon sequences
Tue Jun 5 09:37:38 2018 -- Loading index (likely to take long)... done.
Tue Jun 5 09:37:41 2018 -- Inverting locations... done.
done
I am making a file that both has the exon coordinates and sequence
done
For each transcript I am making a list of exon coordinates from 5' to 3'
done
For each transcript I am making its sequence by concatenating the sequences of its exons from 5' to 3'
done
I am cleaning
done
done
I am making a BLAST database out of the transcript sequences
Building a new DB, current time: 06/05/2018 09:37:57
New DB name: /mnt/data/exp-tumor/chimpipe/gencode.vM16.annotation_tr.fasta
New DB title: gencode.vM16.annotation_tr.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 77282 sequences in 3.25385 seconds.
done
I am running Blast on all against all to detect local similarity between transcripts
done
I am making a gene pair file with % similarity, alignment length and other information
done
I am cleaning
done
However, the $simGnPairs file is present in the working directory. I will use it via --similarity-gene-pairs as input now.
Thanks for your help!
Summary: By default, similarity_bt_gnpairs.sh writes its output to the working directory rather than the GnSimilarity folder. Passing the resulting gencode.vM16.annotation.similarity.txt file to ChimPipe via --similarity-gene-pairs works just fine, and the pipeline finishes without further errors.
Thanks again for your help and quick replies!
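The workaround summarized above could be scripted roughly as follows: run the similarity script from inside a chosen directory so that its output lands there, then hand the resulting file to ChimPipe. This is only a sketch under assumptions. The function names are made up, and the output naming pattern (annotation basename without .gtf, plus .similarity.txt) is inferred from the single file name reported in this thread, not from the script's documentation. Because the function changes directory, the annotation and genome index must be given as absolute paths.

```shell
# Sketch only: both function names are hypothetical helpers.
# Guess the similarity file name from the annotation name. The pattern
# "<annotation without .gtf>.similarity.txt" is an assumption based on
# the one example in this thread.
expected_similarity_file() {
    printf '%s/%s.similarity.txt\n' "$2" "$(basename "$1" .gtf)"
}

# Run the similarity script (arg 1) on an annotation (arg 2) and a genome
# index (arg 3) from inside an output directory (arg 4), then print the
# path of the file it is expected to have written.
run_similarity_in_dir() {
    script=$1 annot=$2 genome=$3 out_dir=$4
    mkdir -p "$out_dir" || return 1
    # The script writes into its working directory, so cd there first.
    # Its own messages go to stderr to keep this function's stdout clean.
    ( cd "$out_dir" && "$script" "$annot" "$genome" >&2 ) || return 1
    result=$(expected_similarity_file "$annot" "$out_dir")
    [ -f "$result" ] || { echo "expected output not found: $result" >&2; return 1; }
    printf '%s\n' "$result"
}

# Usage, with absolute paths for the annotation and the genome index:
#   sim=$(run_similarity_in_dir /data/exp-tumor/ChimPipe/src/bash/similarity_bt_gnpairs.sh \
#         /mnt/flatfiles/organisms/mouse/mm10_GRCm38/annotation/gencode/gencode.vM16.annotation.gtf \
#         "$(pwd)/GRCm38.p5.genome_whitelist.gem" GnSimilarity)
#   and then add: --similarity-gene-pairs "$sim" to the ChimPipe.sh call
```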
Hi, glad to hear it's working now. Yes, the script writes its output to the working directory.
To avoid any path-related issues, please always give the full path to the input files when running ChimPipe.
Best
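One way to follow the full-path advice without typing long paths by hand is a small helper that expands relative paths before they are passed to ChimPipe. This is a minimal sketch; `abs_path` is a name made up for this example, and it assumes the file's parent directory exists. It does not resolve symlinks.

```shell
# Sketch only: abs_path is a hypothetical helper, not part of ChimPipe.
# Turn a possibly relative file path into an absolute one.
abs_path() {
    # cd runs inside the command substitution, so the caller's
    # working directory is left unchanged.
    dir=$(cd "$(dirname "$1")" >/dev/null && pwd) || return 1
    # Avoid a double slash for files directly under the root directory.
    [ "$dir" = / ] && dir=
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Usage with the options from the command earlier in this thread
# (the remaining options are given the same way):
#   ../ChimPipe/ChimPipe.sh \
#       --fastq_1 "$(abs_path ../raw/sample_Leg_Bulk_Tumor_R1.fastq.gz)" \
#       --fastq_2 "$(abs_path ../raw/sample_Leg_Bulk_Tumor_R2.fastq.gz)" \
#       -g "$(abs_path GRCm38.p5.genome_whitelist.gem)"
```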