Open CrystalShann opened 1 week ago
Hi, I am on a longer holiday right now, so I can only look at your issue in more detail in mid-October.
However, I noticed that you use both eCLIP reads and try to combine them with the experiment groups. Please note that this will lead to false crosslinks being detected from R1 (if you used a standard eCLIP protocol), and I would strongly advise using only R2 from eCLIP data. With the standard eCLIP protocol, R2 preserves the exact crosslink position but R1 does not. However, I guess the path problem is something else. Could you send me the command line that you use to start racoon_clip and the racoon_clip version number that you are using?
A quick thing you could test would be to delete the whole results folder in the wdir location and then rerun; sometimes that solves the fastqc problems.
I am also guessing from your file names that this data is from the ENCODE database (but maybe I am guessing wrong). If it is from ENCODE, you should use the experiment type that contains "encode" and also set the encode parameter to true; otherwise you will get problems in the deduplication later.
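To make the R2-only advice concrete, here is a minimal sketch of how a config could be adjusted before running. It is not part of racoon_clip: the helper `keep_r2_only`, the example paths, and the assumption that read 2 files carry an `R2_` prefix (as the file names in this thread do) are all hypothetical. The experiment type is deliberately left out; the variant containing "encode" should be picked from the racoon_clip documentation.

```python
from pathlib import PurePosixPath


def keep_r2_only(infiles: str) -> str:
    """Return the space-separated infiles string with only R2 FASTQ files kept.

    Assumption: read 2 files are named with an "R2_" prefix.
    """
    paths = [PurePosixPath(p) for p in infiles.split()]
    r2_paths = [p for p in paths if p.name.startswith("R2_")]
    return " ".join(str(p) for p in r2_paths)


# Hypothetical config values, shortened for illustration.
config = {
    "infiles": "/data/rep1/R1_A.fastq /data/rep1/R2_B.fastq /data/rep2/R2_C.fastq",
    "encode": False,
}

# Drop R1 so that only the read preserving the crosslink position is used.
config["infiles"] = keep_r2_only(config["infiles"])
# ENCODE data: switch the encode parameter on, as advised above.
config["encode"] = True

print(config["infiles"])
```

The `samples` and `experiment_groups` entries would need the same filtering so that they stay in step with `infiles`.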
Hi, thank you so much for the suggestions. Yes, this is ENCODE data. However, I don't have a results folder in the wdir location; the pipeline failed right at the start because of the file path issue. I am using racoon_clip v1.1.3, and the command I used was `racoon_clip run --configfile eclip_test.yaml --cores 10`
Here is the log:
`racoon_clip run --configfile eclip_test.yaml --cores 10
[2024:09:19 10:02:09] commandline values {'log': 'racoon_clip.log', 'snakebase': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow', 'samples': '', 'barcodeLength': 0, 'umi1_len': 0, 'umi2_len': 0, 'experimental_barcode_len': 0, 'barcodes_fasta': None, 'quality_filter_barcodes': 'True', 'demultiplex': 'False', 'adapter_file': None, 'adapter_trimming': 'True', 'gtf': None, 'genome_fasta': None, 'deduplicate': 'True'}
[2024:09:19 10:02:09] default values {'wdir': './racoon_clip_out', 'infiles': '', 'experiment_groups': '', 'experiment_group_file': '', 'seq_format': '-Q33', 'barcodeLength': '', 'minBaseQuality': 10, 'umi1_len': '', 'umi2_len': '', 'experimental_barcode_len': '', 'encode': 'False', 'encode_umi_length': 10, 'experiment_type': 'other', 'barcodes_fasta': '', 'quality_filter_barcodes': True, 'demultiplex': False, 'adapter_file': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/params.dir/adapter.fa', 'min_read_length': 15, 'adapter_cycles': 1, 'adapter_trimming': True, 'gtf': '', 'genome_fasta': '', 'read_length': 150, 'outFilterMismatchNoverReadLmax': 0.04, 'outFilterMismatchNmax': 999, 'outFilterMultimapNmax': 1, 'outReadsUnmapped': 'Fastx', 'outSJfilterReads': 'Unique', 'moreSTARParameters': '', 'deduplicate': True, 'mir_genome_fasta': '', 'mir_starts_allowed': '1 2 3 4'}
[2024:09:19 10:02:09] Updating config file with commandline values {'wdir': '/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip', 'infiles': '/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq', 'samples': 'R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY', 'seq_format': '-Q33', 'barcodeLength': 0, 'minBaseQuality': 10, 'umi1_len': 5, 'umi2_len': 0, 'exp_barcode_len': 0, 'encode': False, 'experiment_type': 'eCLIP_5ntUMI', 'barcodes_fasta': '', 'quality_filter_barcodes': True, 'demultiplex': False, 'min_read_length': 15, 'adapter_file': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/params.dir/adapter_R2.fa', 'adapter_cycles': 2, 'adapter_trimming': True, 'gtf': '/projects/marralab/cshan_prj/clip-seq/genomes/human/gencode.v46.primary_assembly.annotation.gtf', 'genome_fasta': '/projects/marralab/cshan_prj/clip-seq/genomes/human/GRCh38.primary_assembly.genome.fa', 'read_length': 45, 'outFilterMismatchNoverReadLmax': 0.04, 'outFilterMismatchNmax': 999, 'outFilterMultimapNmax': 1, 'outReadsUnmapped': 'Fastx', 'outSJfilterReads': 'Unique', 'moreSTARParameters': '', 'deduplicate': True, 'log': 'racoon_clip.log', 'snakebase': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow', 'experimental_barcode_len': 0, 'experiment_groups': '', 'experiment_group_file': '', 'encode_umi_length': 10, 'mir_genome_fasta': '', 'mir_starts_allowed': '1 2 3 4'}
[2024:09:19 10:02:09] Writing config file to eclip_test_updated.yaml
[2024:09:19 10:02:09] ---------------------
[2024:09:19 10:02:09] | Snakemake command |
[2024:09:19 10:02:09] ---------------------
snakemake -s /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/Snakefile --configfile eclip_test_updated.yaml --use-conda --conda-frontend mamba --jobs 1 --rerun-incomplete --printshellcmds --nolock --show-failed-logs --cores 10 string
6.0.2 1.0 json
/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip False
Building DAG of jobs...
MissingInputException in rule fastqc_raw_multi in file /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/Snakefile, line 1074:
Missing input files for rule fastqc_raw_multi:
output: /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip/results/tmp/.fastqc.R1_ENCFF647ULQ.raw.chkpnt
wildcards: wdir=/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip, sample=R1_ENCFF647ULQ
affected files:
/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF647ULQ.fastq
[2024:09:19 10:02:11] ERROR: Snakemake failed`
I got the error
/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip False
Building DAG of jobs...
MissingInputException in rule fastqc_raw_multi in file /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/Snakefile, line 1074:
Missing input files for rule fastqc_raw_multi:
output: /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip/results/tmp/.fastqc.R1_ENCFF647ULQ.raw.chkpnt
wildcards: wdir=/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip, sample=R1_ENCFF647ULQ
affected files:
/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF647ULQ.fastq
when trying to use the racoon_clip pipeline. However, the file is in a different directory (rep2_ENCLB498TQP, not rep1_ENCLB105HGF), as specified in the config file.
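The error shows the pipeline looking for R1_ENCFF647ULQ.fastq in the rep1 directory although the config lists it under rep2. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the inputs are expected to share one directory or that `samples` must be listed in the same order as `infiles`. The sketch below is not racoon_clip code; `check_inputs` is a hypothetical helper that only flags those two conditions for the values from this issue.

```python
from pathlib import PurePosixPath


def check_inputs(infiles: str, samples: str) -> list:
    """Return human-readable warnings about an infiles/samples pair."""
    paths = [PurePosixPath(p) for p in infiles.split()]
    names = samples.split()
    warnings = []

    # Flag inputs that are spread over more than one directory.
    dirs = sorted({str(p.parent) for p in paths})
    if len(dirs) > 1:
        warnings.append(f"infiles span {len(dirs)} directories: {dirs}")

    # Compare file names (without the .fastq suffix) to the sample names.
    stems = [p.stem for p in paths]
    if sorted(stems) != sorted(names):
        warnings.append("sample names do not match the infile basenames")
    elif stems != names:
        warnings.append("samples are listed in a different order than infiles")
    return warnings


BASE = "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX"
infiles = " ".join([
    f"{BASE}/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq",
    f"{BASE}/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq",
    f"{BASE}/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq",
    f"{BASE}/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq",
])
samples = "R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY"

warnings = check_inputs(infiles, samples)
for message in warnings:
    print("WARNING:", message)
```

For this config both warnings fire. If the assumption holds, placing or symlinking the FASTQ files into one directory and listing `samples` in the same order as `infiles` may be worth trying.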
Here is my config file and experiment group file for reference. Thanks!
`wdir: "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip" # Output directory
# Input files
infiles: "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq"
samples: "R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY"
experiment_groups: "rep1_ENCLB105HGF rep1_ENCLB105HGF rep2_ENCLB498TQP rep2_ENCLB498TQP"
experiment_group_file: "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip/experiment_design.txt" # Path to experiment design file
seq_format: "-Q33" # Illumina format
# Barcodes
barcodeLength: 0 # No barcode in the reads
minBaseQuality: 10
umi1_len: 5 # UMI length is 5nt
umi2_len: 0
exp_barcode_len: 0
encode: False
# Experiment type
experiment_type: "eCLIP_5ntUMI" # Set for ENCODE eCLIP with 5nt UMI
# Barcodes file
barcodes_fasta: "" # Not needed as it's already demultiplexed
quality_filter_barcodes: True
# Demultiplexing
demultiplex: False # No demultiplexing necessary
min_read_length: 15
# Adapter trimming
adapter_file: "/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/params.dir/adapter_R2.fa" # Path to your adapter file
adapter_cycles: 2 # Two cycles of adapter trimming for eCLIP
adapter_trimming: True
# STAR alignment
gtf: "/projects/marralab/cshan_prj/clip-seq/genomes/human/gencode.v46.primary_assembly.annotation.gtf" # Path to GTF file
genome_fasta: "/projects/marralab/cshan_prj/clip-seq/genomes/human/GRCh38.primary_assembly.genome.fa" # Path to FASTA file
read_length: 45 # Updated read length
outFilterMismatchNoverReadLmax: 0.04
outFilterMismatchNmax: 999
outFilterMultimapNmax: 1
outReadsUnmapped: "Fastx"
outSJfilterReads: "Unique"
moreSTARParameters: ""
# Deduplication
deduplicate: True # Perform deduplication as UMIs are present`
Experiment group file:
rep1_ENCLB105HGF R1_ENCFF319MIE
rep1_ENCLB105HGF R2_ENCFF217LJP
rep2_ENCLB498TQP R1_ENCFF647ULQ
rep2_ENCLB498TQP R2_ENCFF111LIY
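As a quick sanity check, the experiment group file can be compared against the `samples` and `experiment_groups` entries of the config. This is only a sketch under two assumptions not stated in the thread: that the file holds whitespace-separated "group sample" pairs as shown above, and that `experiment_groups` lines up position by position with `samples`. `parse_design` is a hypothetical helper, not part of racoon_clip.

```python
def parse_design(text: str) -> dict:
    """Map sample name -> group from whitespace-separated 'group sample' pairs."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (group/sample pairs)")
    return {sample: group for group, sample in zip(tokens[0::2], tokens[1::2])}


design_text = """
rep1_ENCLB105HGF R1_ENCFF319MIE
rep1_ENCLB105HGF R2_ENCFF217LJP
rep2_ENCLB498TQP R1_ENCFF647ULQ
rep2_ENCLB498TQP R2_ENCFF111LIY
"""
samples = "R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY"
experiment_groups = "rep1_ENCLB105HGF rep1_ENCLB105HGF rep2_ENCLB498TQP rep2_ENCLB498TQP"

from_design = parse_design(design_text)
# Assumption: the n-th group in experiment_groups belongs to the n-th sample.
from_config = dict(zip(samples.split(), experiment_groups.split()))

# Samples whose group differs between the two sources, or that appear in only one.
mismatches = {
    s for s in set(from_design) | set(from_config)
    if from_design.get(s) != from_config.get(s)
}
print(sorted(mismatches))
```

Under these assumptions the two sources agree for the values posted here, so the grouping itself does not look like the cause of the path error.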