CRG-CNAG / CalliNGS-NF

GATK RNA-Seq Variant Calling in Nextflow
Mozilla Public License 2.0

Using Single end data #11

Closed benkraj closed 4 years ago

benkraj commented 4 years ago

Hi-

I have a single-end RNA-seq data set that I would like to run the pipeline on. I've tried, but it only completes processes 1A-1D and never starts any of the others. I'm guessing this is because there is only one fastq per sample, but I'm not 100% sure that's the issue.

Is there a way to specify single-end data, or could you point me in the right direction to update the pipeline for this purpose?

Any help would be appreciated.

Thanks, Ben

benkraj commented 4 years ago

Sorry, I was wrong: the problem isn't single-end data (though that may turn out to be a separate problem).

I attempted it again with paired end reads I downloaded online and it still stops after the first 4 steps.

krajacichbj@cn0983 pipeline$ bash rna.variant.test.mosq.sh
Loading singularity 3.5.2 on cn83
N E X T F L O W ~ version 19.10.0
Pulling CRG-CNAG/CalliNGS-NF ...
downloaded from https://github.com/CRG-CNAG/CalliNGS-NF.git
Launching CRG-CNAG/CalliNGS-NF astonishing_fermat - revision: 841638691a [master]
C A L L I N G S - N F v 1.0
genome : rna.variant/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa
reads : rna.variants/test.mosq/{1,2}.fastq.gz
variants : rna.variant/ag1000g.phase2.ar1.variants.pass.X.vcf.gz
blacklist: rna.variant/Ag.sorted.bed
results : rna.variant/results/
gatk : rna.variant/gatk-3.7.0/GenomeAnalysisTK.jar
executor > slurm (4)
[ac/60f52d] process > 1A_prepare_genome_samtools (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[30/b914a0] process > 1B_prepare_genome_picard (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[eb/ef49b2] process > 1C_prepare_star_genome_index (Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4) [100%] 1 of 1 ✔
[91/7d38e3] process > 1D_prepare_vcf_file (ag1000g.phase2.ar1.variants.pass.X.vcf) [100%] 1 of 1 ✔
[- ] process > 2_rnaseq_mapping_star -
[- ] process > 3_rnaseq_gatk_splitNcigar -
[- ] process > 4_rnaseq_gatk_recalibrate -
[- ] process > 5_rnaseq_call_variants -
[- ] process > 6A_post_process_vcf -
[- ] process > 6B_prepare_vcf_for_ase -
[- ] process > 6C_ASE_knownSNPs -
Completed at: 15-Jan-2020 16:04:15
Duration : 10m 1s
CPU hours : 0.1
Succeeded : 4

I am initializing the pipeline with:

rna.variant/nextflow run CRG-CNAG/CalliNGS-NF \
    -c rna.variant/pipeline/CalliNGS-NF/biowulf.config2 \
    --reads 'rna.variants/test.mosq/{1,2}.fastq.gz' \
    --genome rna.variant/Anopheles-gambiae-PEST_CHROMOSOMES_AgamP4.fa \
    --variants rna.variant/ag1000g.phase2.ar1.variants.pass.X.vcf.gz \
    --blacklist rna.variant/Ag.sorted.bed \
    --results rna.variant/results/ \
    --gatk rna.variant/gatk-3.7.0/GenomeAnalysisTK.jar \
    --max_memory '128.GB'

Any thoughts as to why it stops after the first 4 processes would be helpful. I don't get any errors that I see.

Thanks, Ben

maltesemike commented 4 years ago

I am having exactly the same problem here... the first 4 processes run but the remainder of the pipeline 'fails' ...

maltesemike commented 4 years ago

> Any thoughts as to why it stops after the first 4 processes would be helpful. I don't get any errors that I see.

OK, I've solved it on my end. I had forgotten to add an asterisk (*) to the reads parameter, so Nextflow was looking in that directory and not finding any files to add to the channel.

So in your case you have reads : rna.variants/test.mosq/{1,2}.fastq.gz

Try this instead reads : rna.variants/test.mosq/*{1,2}.fastq.gz
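To see why the asterisk matters, a quick check from the shell can help before launching a run. The sketch below is only an approximation: it relies on bash brace expansion and globbing, which behave similarly to, but are not the same code as, Nextflow's own pattern matching. The directory and the file names (sampleA_1.fastq.gz, sampleA_2.fastq.gz) are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical read files in a scratch directory; real file names will differ.
tmp=$(mktemp -d)
touch "$tmp/sampleA_1.fastq.gz" "$tmp/sampleA_2.fastq.gz"

# Count how many of the given paths actually exist on disk.
# A pattern that matched nothing is passed through literally and is not counted.
count_existing() {
  n=0
  for f in "$@"; do
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Without the asterisk, only files named exactly 1.fastq.gz or 2.fastq.gz qualify.
without_star=$(count_existing "$tmp"/{1,2}.fastq.gz)

# With the asterisk, any file name ending in 1.fastq.gz or 2.fastq.gz qualifies.
with_star=$(count_existing "$tmp"/*{1,2}.fastq.gz)

echo "without asterisk: $without_star file(s)"
echo "with asterisk:    $with_star file(s)"

rm -rf "$tmp"
```

Run under bash, the pattern without the asterisk finds none of the two sample files, while the pattern with it finds both. When the pattern is passed to the pipeline, keeping it in single quotes (as in the command earlier in this thread) stops the shell from expanding it, so Nextflow receives the pattern itself.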

benkraj commented 4 years ago

Ah yes, now it continues for my test paired-end reads. Thanks a lot for catching that, @maltesemike!

So, back to the original problem: is there a way to adapt the pipeline for single-end RNA-seq? @pditommaso, do you have any suggestions?

benkraj commented 4 years ago

I'm still struggling with GATK issues, but correcting my filenames allowed the run to proceed, even with single-end data. So I will close this issue.