ZarnackGroup / racoon_clip

racoon_clip processes your iCLIP and eCLIP data from raw files to single-nucleotide crosslinks in a single step.
https://racoon-clip.readthedocs.io/en/latest/index.html

file path error #7

Open CrystalShann opened 1 week ago

CrystalShann commented 1 week ago

I got the following error when trying to use the racoon_clip pipeline:

    /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip False
    Building DAG of jobs...
    MissingInputException in rule fastqc_raw_multi in file /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/Snakefile, line 1074:
    Missing input files for rule fastqc_raw_multi:
    output: /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip/results/tmp/.fastqc.R1_ENCFF647ULQ.raw.chkpnt
    wildcards: wdir=/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip, sample=R1_ENCFF647ULQ
    affected files:
    /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF647ULQ.fastq

However, the file is in a different directory, as specified in the config file.

Here is my config file and experiment group file for reference. Thanks!

`wdir: "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip" # Output directory

# Input files
infiles: "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq"
samples: "R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY"

experiment_groups: "rep1_ENCLB105HGF rep1_ENCLB105HGF rep2_ENCLB498TQP rep2_ENCLB498TQP"
experiment_group_file: "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip/experiment_design.txt" # Path to experiment design file
seq_format: "-Q33" # Illumina format

# Barcodes
barcodeLength: 0 # No barcode in the reads
minBaseQuality: 10
umi1_len: 5 # UMI length is 5nt
umi2_len: 0
exp_barcode_len: 0
encode: False

# Experiment type
experiment_type: "eCLIP_5ntUMI" # Set for ENCODE eCLIP with 5nt UMI

# Barcodes file
barcodes_fasta: "" # Not needed as it's already demultiplexed
quality_filter_barcodes: True

# Demultiplexing
demultiplex: False # No demultiplexing necessary
min_read_length: 15

# Adapter trimming
adapter_file: "/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/params.dir/adapter_R2.fa" # Path to your adapter file
adapter_cycles: 2 # Two cycles of adapter trimming for eCLIP
adapter_trimming: True

# STAR alignment
gtf: "/projects/marralab/cshan_prj/clip-seq/genomes/human/gencode.v46.primary_assembly.annotation.gtf" # Path to GTF file
genome_fasta: "/projects/marralab/cshan_prj/clip-seq/genomes/human/GRCh38.primary_assembly.genome.fa" # Path to FASTA file
read_length: 45 # Updated read length
outFilterMismatchNoverReadLmax: 0.04
outFilterMismatchNmax: 999
outFilterMultimapNmax: 1
outReadsUnmapped: "Fastx"
outSJfilterReads: "Unique"
moreSTARParameters: ""

# Deduplication
deduplicate: True # Perform deduplication as UMIs are present`

experiment group
rep1_ENCLB105HGF R1_ENCFF319MIE
rep1_ENCLB105HGF R2_ENCFF217LJP
rep2_ENCLB498TQP R1_ENCFF647ULQ
rep2_ENCLB498TQP R2_ENCFF111LIY
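As a quick sanity check on the config above, the sketch below compares the `infiles` and `samples` strings. It reports sample names that have no matching file, whether the two lists are in the same order, and which directories the input files live in. This is a minimal Python sketch and not part of racoon_clip: the function name `check_inputs` is hypothetical, and whether racoon_clip needs matching order or a single input directory is an assumption that this thread does not confirm.

```python
from pathlib import PurePosixPath

# Values copied from the config in this thread.
BASE = "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX"
INFILES = " ".join([
    f"{BASE}/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq",
    f"{BASE}/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq",
    f"{BASE}/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq",
    f"{BASE}/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq",
])
SAMPLES = "R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY"


def check_inputs(infiles: str, samples: str) -> dict:
    """Compare space-separated infiles and samples strings (hypothetical helper)."""
    paths = [PurePosixPath(p) for p in infiles.split()]
    names = samples.split()
    # Sample name as used in this thread: file name up to the first dot.
    stems = [p.name.split(".")[0] for p in paths]
    return {
        "missing_samples": [n for n in names if n not in stems],
        "same_order": stems == names,
        "directories": sorted({str(p.parent) for p in paths}),
    }


report = check_inputs(INFILES, SAMPLES)
for key, value in report.items():
    print(key, value)
```

For the values in this thread, every sample has a matching file, but the files are spread over two directories and the two lists are ordered differently. The path in the error combines the rep1_ENCLB105HGF directory with a sample whose file sits in rep2_ENCLB498TQP.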

MelinaKlostermann commented 1 week ago

Hi, I am on a longer holiday at the moment, so I can only look at your issue in more detail in mid-October.

However, I noticed that you use both eCLIP reads (R1 and R2) and try to combine them via the experiment groups. Please note that this will lead to false crosslinks being detected from R1 (if you used a standard eCLIP protocol), and I would strongly advise using only R2 from eCLIP data. With the standard eCLIP protocol, R2 preserves the exact crosslink position but R1 does not. However, I guess the path problem is something else. Could you send me the command line that you use to start racoon_clip and the racoon_clip version number that you are using?
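To restrict a run to R2 as suggested, the `infiles` and `samples` entries could be derived together so that they stay consistent with each other. The following is a minimal Python sketch, assuming that read 2 files carry an `R2_` file name prefix as they do in this thread; `r2_only` is a hypothetical helper, not a racoon_clip function.

```python
from pathlib import PurePosixPath
from typing import Tuple


def r2_only(infiles: str) -> Tuple[str, str]:
    """Return (infiles, samples) strings restricted to R2 files.

    Assumes R2 files are named 'R2_<id>.fastq', as in this thread.
    """
    paths = [p for p in infiles.split()
             if PurePosixPath(p).name.startswith("R2_")]
    # Derive sample names from the kept files so the order always matches.
    samples = [PurePosixPath(p).name.split(".")[0] for p in paths]
    return " ".join(paths), " ".join(samples)


base = "/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX"
all_files = " ".join([
    f"{base}/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq",
    f"{base}/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq",
    f"{base}/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq",
    f"{base}/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq",
])
r2_files, r2_samples = r2_only(all_files)
print("infiles:", r2_files)
print("samples:", r2_samples)
```

The two printed strings could then be pasted into the `infiles` and `samples` entries of the config file.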

MelinaKlostermann commented 1 week ago

A quick thing you could test would be to delete the whole results folder in the wdir location and then rerun; sometimes that solves the fastqc problems.
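The cleanup step could be scripted roughly as follows. This is a hedged Python sketch: it assumes the folder is called `results` and sits directly under `wdir`, which matches the output path in the error message, and `remove_results` is a hypothetical helper. By default it only reports what it would delete.

```python
import shutil
from pathlib import Path


def remove_results(wdir: str, dry_run: bool = True) -> bool:
    """Delete <wdir>/results if it exists; return True if it was found."""
    results = Path(wdir) / "results"
    if not results.is_dir():
        return False
    if dry_run:
        # Only report; nothing is deleted in dry-run mode.
        print(f"Would delete {results}")
    else:
        shutil.rmtree(results)
    return True
```

Calling `remove_results(wdir)` first and then `remove_results(wdir, dry_run=False)` gives a chance to confirm the path before anything is removed.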

MelinaKlostermann commented 1 week ago

I am also guessing from your file names that this data is from the ENCODE database (but maybe I am guessing wrong). If it is from ENCODE, you should use the experiment type that contains "encode" and also set the encode parameter to True; otherwise you will run into problems in the deduplication later.
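The two ENCODE-related settings mentioned here can be checked on an already loaded config. The sketch below is a minimal Python example that works on a plain dict; `encode_config_issues` is a hypothetical helper, and it only tests that `experiment_type` contains the word "encode" without assuming any particular option name.

```python
def encode_config_issues(config: dict) -> list:
    """List ENCODE-related settings that look inconsistent (hypothetical check)."""
    issues = []
    experiment_type = str(config.get("experiment_type", ""))
    if "encode" not in experiment_type.lower():
        issues.append("experiment_type does not contain 'encode'")
    # Only the boolean True counts; strings such as 'False' are flagged too.
    if config.get("encode") is not True:
        issues.append("encode is not set to True")
    return issues


# Settings as posted in this thread.
posted = {"experiment_type": "eCLIP_5ntUMI", "encode": False}
for issue in encode_config_issues(posted):
    print(issue)
```

For the config posted in this thread, both settings are reported.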

CrystalShann commented 1 week ago

Hi, thank you so much for the suggestions. Yes, this is ENCODE data. However, I don't have a results folder in the wdir location; the pipeline failed right at the start because of the file path issue. I am using racoon_clip version v1.1.3, and the command I used was `racoon_clip run --configfile eclip_test.yaml --cores 10`.

Here is the log:

`racoon_clip run --configfile eclip_test.yaml --cores 10
[2024:09:19 10:02:09] commandline values {'log': 'racoon_clip.log', 'snakebase': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow', 'samples': '', 'barcodeLength': 0, 'umi1_len': 0, 'umi2_len': 0, 'experimental_barcode_len': 0, 'barcodes_fasta': None, 'quality_filter_barcodes': 'True', 'demultiplex': 'False', 'adapter_file': None, 'adapter_trimming': 'True', 'gtf': None, 'genome_fasta': None, 'deduplicate': 'True'}
[2024:09:19 10:02:09] default values {'wdir': './racoon_clip_out', 'infiles': '', 'experiment_groups': '', 'experiment_group_file': '', 'seq_format': '-Q33', 'barcodeLength': '', 'minBaseQuality': 10, 'umi1_len': '', 'umi2_len': '', 'experimental_barcode_len': '', 'encode': 'False', 'encode_umi_length': 10, 'experiment_type': 'other', 'barcodes_fasta': '', 'quality_filter_barcodes': True, 'demultiplex': False, 'adapter_file': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/params.dir/adapter.fa', 'min_read_length': 15, 'adapter_cycles': 1, 'adapter_trimming': True, 'gtf': '', 'genome_fasta': '', 'read_length': 150, 'outFilterMismatchNoverReadLmax': 0.04, 'outFilterMismatchNmax': 999, 'outFilterMultimapNmax': 1, 'outReadsUnmapped': 'Fastx', 'outSJfilterReads': 'Unique', 'moreSTARParameters': '', 'deduplicate': True, 'mir_genome_fasta': '', 'mir_starts_allowed': '1 2 3 4'}
[2024:09:19 10:02:09] Updating config file with commandline values {'wdir': '/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip', 'infiles': '/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF319MIE.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R1_ENCFF647ULQ.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep2_ENCLB498TQP/R2_ENCFF111LIY.fastq /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R2_ENCFF217LJP.fastq', 'samples': 'R1_ENCFF319MIE R2_ENCFF217LJP R1_ENCFF647ULQ R2_ENCFF111LIY', 'seq_format': '-Q33', 'barcodeLength': 0, 'minBaseQuality': 10, 'umi1_len': 5, 'umi2_len': 0, 'exp_barcode_len': 0, 'encode': False, 'experiment_type': 'eCLIP_5ntUMI', 'barcodes_fasta': '', 'quality_filter_barcodes': True, 'demultiplex': False, 'min_read_length': 15, 'adapter_file': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/params.dir/adapter_R2.fa', 'adapter_cycles': 2, 'adapter_trimming': True, 'gtf': '/projects/marralab/cshan_prj/clip-seq/genomes/human/gencode.v46.primary_assembly.annotation.gtf', 'genome_fasta': '/projects/marralab/cshan_prj/clip-seq/genomes/human/GRCh38.primary_assembly.genome.fa', 'read_length': 45, 'outFilterMismatchNoverReadLmax': 0.04, 'outFilterMismatchNmax': 999, 'outFilterMultimapNmax': 1, 'outReadsUnmapped': 'Fastx', 'outSJfilterReads': 'Unique', 'moreSTARParameters': '', 'deduplicate': True, 'log': 'racoon_clip.log', 'snakebase': '/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow', 'experimental_barcode_len': 0, 'experiment_groups': '', 'experiment_group_file': '', 'encode_umi_length': 10, 'mir_genome_fasta': '', 'mir_starts_allowed': '1 2 3 4'}
[2024:09:19 10:02:09] Writing config file to eclip_test_updated.yaml
[2024:09:19 10:02:09] ---------------------
[2024:09:19 10:02:09] | Snakemake command |
[2024:09:19 10:02:09] ---------------------

snakemake -s /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/Snakefile --configfile eclip_test_updated.yaml --use-conda --conda-frontend mamba --jobs 1 --rerun-incomplete --printshellcmds --nolock --show-failed-logs --cores 10 string

6.0.2 1.0 json

/projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip False
Building DAG of jobs...
MissingInputException in rule fastqc_raw_multi in file /projects/marralab/cshan_prj/clip-seq/racoon_clip-1.1.3/racoon_clip/workflow/Snakefile, line 1074:
Missing input files for rule fastqc_raw_multi:
output: /projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip/results/tmp/.fastqc.R1_ENCFF647ULQ.raw.chkpnt
wildcards: wdir=/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/racoon_clip, sample=R1_ENCFF647ULQ
affected files:
/projects/marralab/cshan_prj/clip-seq/test_data/ENCSR331VNX/rep1_ENCLB105HGF/R1_ENCFF647ULQ.fastq
[2024:09:19 10:02:11] ERROR: Snakemake failed`