YeoLab / skipper

Skip the peaks and expose RNA-binding in CLIP data

How to understand parameter INFORMATIVE_READ? #11

Closed: FionaMoon closed this issue 1 year ago

FionaMoon commented 1 year ago

Hello, Skipper team! I am trying to use Skipper on GSE177848, which is a paired-end eCLIP dataset. The comment for INFORMATIVE_READ in Skipper_config.py says:

Single-end: enter 1. Paired-end: enter read (1 or 2) corresponding to crosslink site

I don't understand which value (1 or 2) to choose, or why. Can you explain this for me? Thank you so much!

augustboyle commented 1 year ago

Hello,

Thanks for your interest. For paired-end eCLIP data on the ENCODE Project website, the informative read is read 2, so please enter 2. Also note that all such data has been processed with Skipper, and the called-site output is available on the corresponding FigShare page: Skipper RNA-protein interaction profiles (figshare.com).

Best,
Evan
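
For paired-end eCLIP data laid out like the ENCODE data, the setting would presumably be a single line in Skipper_config.py like the one below. This is a sketch: it assumes the config uses a plain Python assignment to the INFORMATIVE_READ name quoted above, so check it against your copy of the file.

```python
# Skipper_config.py (excerpt, sketch; assumes a plain Python assignment)
# Single-end: enter 1. Paired-end: enter read (1 or 2) corresponding to crosslink site
# Read 2 for paired-end eCLIP data from the ENCODE portal, per the reply above.
INFORMATIVE_READ = 2
```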

byee4 commented 1 year ago

Is this the FigShare page that should be linked?

https://figshare.com/articles/dataset/Skipper_RNA-protein_interaction_profiles/21206009?file=37612190

FionaMoon commented 1 year ago

Thank you for your answer!

byee4 commented 1 year ago

FYI, I recently hit an issue with incomplete FASTQs, and I think it stems from this rule, which runs two processes. Since the first one is run in the background (with &) and nothing waits for it, I'm not sure what happens when the second process finishes before the first, but my guess is that the exit code ends up being 0 when it shouldn't be. Evan, do you know what the behavior is?

rule copy_with_umi:
    input:
        fq_1 = lambda wildcards: replicate_label_to_fastq_1[wildcards.replicate_label],
        fq_2 = lambda wildcards: replicate_label_to_fastq_2[wildcards.replicate_label],
    output:
        fq_1 = temp("output/fastqs/copy/{replicate_label}-1.fastq.gz"), #SORT OUT!!
        fq_2 = temp("output/fastqs/copy/{replicate_label}-2.fastq.gz"), #SORT OUT!!        
    threads: 2
    params:
        run_time = "6:00:00",
        error_file = "stderr/{replicate_label}.copy_with_umi.err",
        out_file = "stdout/{replicate_label}.copy_with_umi.out",
        job_name = "copy_with_umi"
    benchmark: "benchmarks/umi/unassigned_experiment.{replicate_label}.copy_with_umi.txt"
    shell:
        "zcat {input.fq_1} | awk 'NR % 4 != 1 {{print}} NR % 4 == 1 {{split($1,header,\":\"); print $1 \":\" substr(header[1],2,length(header[1]) - 1) }}' | gzip > {output.fq_1} &"
        "zcat {input.fq_2} | awk 'NR % 4 != 1 {{print}} NR % 4 == 1 {{split($1,header,\":\"); print $1 \":\" substr(header[1],2,length(header[1]) - 1) }}' | gzip > {output.fq_2};"
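
To check that guess, here is a small Python sketch that runs bash snippets shaped like the rule's command and reports the shell's exit status. It assumes bash is on the PATH and prefixes each snippet with set -euo pipefail, roughly mimicking Snakemake's default strict mode; the helper name exit_status is hypothetical, and false stands in for a failing zcat | awk | gzip pipeline.

```python
import subprocess

# Roughly mimics Snakemake's default bash strict mode for shell commands.
STRICT = "set -euo pipefail; "


def exit_status(script: str) -> int:
    """Run a bash snippet under strict mode and return the shell's exit status.

    Output is not captured, so subprocess.run returns as soon as bash itself
    exits, without waiting for any background jobs it left behind.
    """
    return subprocess.run(["bash", "-c", STRICT + script]).returncode


# 1. The pattern in the rule: failure of the backgrounded job is ignored.
print(exit_status("false & true"))                          # 0
# 2. Waiting on the specific PID propagates the background job's failure.
print(exit_status('false & pid=$!; true; wait "$pid"'))     # 1
# 3. A bare `wait` waits for all jobs but always returns 0.
print(exit_status("false & true; wait"))                    # 0
```

Under these assumptions the guess holds: with cmd1 & cmd2, the exit status is that of cmd2 alone, and bash also exits without waiting for cmd1, so the first output can still be partially written when the job is reported as done. One possible fix would be to record pid=$! after the first pipeline and end the command with wait "$pid"; a bare wait is not enough, because it always returns 0.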

augustboyle commented 1 year ago

That appears to be code I wrote specifically for processing ENCODE 3 data downloaded from the ENCODE portal. It's not part of Skipper and isn't meant to be run on anything else, but you can certainly adapt it if it's useful.

That step reformats the UMI encoded in the FASTQ headers, so whether it runs successfully depends on whether the header uses the expected delimiter.
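
For reference, the awk program in the copy_with_umi rule can be transcribed into Python as follows. On every header line (awk NR % 4 == 1), it keeps only the first whitespace-delimited field, takes the part before the first ':' minus the leading '@' as the UMI, and appends ':' plus that UMI to the read name. This is a sketch for illustration, not Skipper code; the function names and the example headers are made up. The second example shows the delimiter issue: a header with no ':' does not fail, it just gets its own name appended as a bogus UMI.

```python
from typing import Iterable, Iterator


def reformat_header(header: str) -> str:
    """Mirror the awk header rewrite: print $1 ":" <first ':' field minus '@'>."""
    name = header.split()[0]        # awk $1: first whitespace-delimited field
    umi = name.split(":")[0][1:]    # header[1] with its leading '@' removed
    return f"{name}:{umi}"


def copy_umi_to_end(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite every 4th line (the FASTQ header) and pass the others through."""
    for i, line in enumerate(lines):
        if i % 4 == 0:              # awk NR % 4 == 1, since NR is 1-based
            yield reformat_header(line)
        else:
            yield line


# Hypothetical record whose read name starts with a UMI field.
record = ["@ACGTACGTAC:READ1 1:N:0", "ACGT", "+", "IIII"]
print(list(copy_umi_to_end(record))[0])  # @ACGTACGTAC:READ1:ACGTACGTAC

# No ':' in the read name: the whole name is treated as the UMI.
print(reformat_header("@READ1 1:N:0"))   # @READ1:READ1
```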