JinfengChen / RelocaTE2

MIT License

NO FULLREADS FILES PRODUCED #17

Open cecilelorrain opened 5 years ago

cecilelorrain commented 5 years ago

Hi,

I am struggling to run step 4 of the pipeline. The flankingReads files are produced, but the fullreads files are not:

/home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2 fastq /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R1.te_repeat.flankingReads.fq /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R2.te_repeat.flankingReads.fq
all unpaired: /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R1.te_repeat.flankingReads.fq
all unpaired: /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R2.te_repeat.flankingReads.fq
testing if bam exists: /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/bwa_aln/Zt09_assembly.structural_variation.2822_AB-R1.te_repeat.flankingReads.fq_1.te_repeat.flankingReads.bwa.mates.bam
pre: /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R1.te_repeat.flankingReads.fq
pre: /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R2.te_repeat.flankingReads.fq
bam not exists, preceed with bwa to map the reads
[bwa_aln_core] convert to sequence coordinate... 0.92 sec
[bwa_aln_core] refine gapped alignments... 0.78 sec
[bwa_aln_core] print alignments... 2.29 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] convert to sequence coordinate... 0.94 sec
[bwa_aln_core] refine gapped alignments... 0.74 sec
[bwa_aln_core] print alignments... 2.31 sec
[bwa_aln_core] 524288 sequences have been processed.
[bwa_aln_core] convert to sequence coordinate... 0.63 sec
[bwa_aln_core] refine gapped alignments... 0.44 sec
[bwa_aln_core] print alignments...
[bwa_aln_core] convert to sequence coordinate... 0.80 sec
[bwa_aln_core] refine gapped alignments... 0.77 sec
[bwa_aln_core] print alignments... 1.47 sec
[bwa_aln_core] 691602 sequences have been processed.
[main] Version: 0.6.2-r126
[main] CMD: /home/cecile/RelocaTE2/bin/bwa samse /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/Zt09_assembly.structural_variation.fasta /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/bwa_aln/Zt09_assembly.structural_variation.2822_AB-R1.te_repeat.flankingReads.bwa.single.sai /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R1.te_repeat.flankingReads.fq
[main] Real time: 31.539 sec; CPU: 13.452 sec
2.27 sec
[bwa_aln_core] 262144 sequences have been processed.
[bwa_aln_core] convert to sequence coordinate... 0.75 sec
[bwa_aln_core] refine gapped alignments... 0.70 sec
[bwa_aln_core] print alignments... 1.69 sec
[bwa_aln_core] 524288 sequences have been processed.
[bwa_aln_core] convert to sequence coordinate... 0.39 sec
[bwa_aln_core] refine gapped alignments... 0.34 sec
[bwa_aln_core] print alignments... 0.56 sec
[bwa_aln_core] 658180 sequences have been processed.
[main] Version: 0.6.2-r126
[main] CMD: /home/cecile/RelocaTE2/bin/bwa samse /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/Zt09_assembly.structural_variation.fasta /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/bwa_aln/Zt09_assembly.structural_variation.2822_AB-R2.te_repeat.flankingReads.bwa.single.sai /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/repeat/flanking_seq/2822_AB-R2.te_repeat.flankingReads.fq
[main] Real time: 30.302 sec; CPU: 10.876 sec
mergeing bam file: 2/2 files
[W::bam_merge_core2] No @HD tag found.
mergeing fullread bam file: 0/0 files
job: sh /home/cecile/Desktop/Postdoc_cecile/2_Zymo/4-Mutation_accumulation_analysis_TE_dynamics/1-TESTS/test_RelocalTE2/2822_AB_test_nosplit/shellscripts/step_4/step_4.Zt09_assembly.structural_variation.repeat.align.sh

Here are the files in flanking_seq/:
-rw-rw-r-- 1 cecile cecile 362M Apr 12 09:31 2822_AB-R1.te_repeat.flankingReads.fq
-rw-rw-r-- 1 cecile cecile 345M Apr 12 09:36 2822_AB-R2.te_repeat.flankingReads.fq

And in bwa_aln/*:
-rw-rw-r-- 1 cecile cecile 2,6K Apr 12 09:52 bwa.stderr
-rw-rw-r-- 1 cecile cecile 106M Apr 12 09:52 Zt09_assembly.structural_variation.2822_AB-R1.te_repeat.flankingReads.bwa.single.bam
-rw-rw-r-- 1 cecile cecile 108M Apr 12 09:52 Zt09_assembly.structural_variation.2822_AB-R2.te_repeat.flankingReads.bwa.single.bam
-rw-rw-r-- 1 cecile cecile 214M Apr 12 09:53 Zt09_assembly.structural_variation.repeat.bwa.bam
-rw-rw-r-- 1 cecile cecile 1,4K Apr 12 09:52 Zt09_assembly.structural_variation.repeat.bwa.bam.sh
-rw-rw-r-- 1 cecile cecile 119M Apr 12 09:54 Zt09_assembly.structural_variation.repeat.bwa.sorted.bam
-rw-rw-r-- 1 cecile cecile 45K Apr 12 09:54 Zt09_assembly.structural_variation.repeat.bwa.sorted.bam.bai

Thank you in advance for your help, Cécile

jonathan-wells commented 5 years ago

Hi, I am getting the same problems as Cécile with fullread bamfiles not being produced. The relevant error report is:

job: sh /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/shellscripts/step_2/1.fq2fa.sh
job: sh /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/shellscripts/step_3/0.te_repeat.blat.sh
job: sh /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/shellscripts/step_3/1.te_repeat.blat.sh
testing if bam exists: /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/bwa_aln/chr25.SRR7081528.chr25_2.te_repeat.flankingReads.fq_1.te_repeat.flankingReads.bwa.mates.bam
bam not exists, preceed with bwa to map the reads
[main] Version: 0.6.2-r126
[main] CMD: /programs/miniconda2/envs/RelocaTE2/bin/bwa samse /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/chr25.fa /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/bwa_aln/chr25.SRR7081528.chr25_1.te_repeat.flankingReads.bwa.single.sai /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/flanking_seq/SRR7081528.chr25_1.te_repeat.flankingReads.fq
[main] Real time: 0.001 sec; CPU: 0.003 sec
[main] Version: 0.6.2-r126
[main] CMD: /programs/miniconda2/envs/RelocaTE2/bin/bwa samse /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/chr25.fa /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/bwa_aln/chr25.SRR7081528.chr25_2.te_repeat.flankingReads.bwa.single.sai /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/flanking_seq/SRR7081528.chr25_2.te_repeat.flankingReads.fq
[main] Real time: 0.001 sec; CPU: 0.002 sec
mergeing bam file: 2/2 files
[W::bam_merge_core2] No @HD tag found.
mergeing fullread bam file: 0/0 files
job: sh /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/shellscripts/step_4/step_4.chr25.repeat.align.sh
Step5: Find non-reference insertions
find insertions on chr25
fullread bam: /local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/bwa_aln/chr25.repeat.fullreads.bwa.sorted.bam
Traceback (most recent call last):
  File "/programs/miniconda2/envs/RelocaTE2/scripts/relocaTE_insertionFinder.py", line 1825, in <module>
    main()
  File "/programs/miniconda2/envs/RelocaTE2/scripts/relocaTE_insertionFinder.py", line 1809, in main
    read_junction_reads_align(align_file_f, read_repeat, teJunctionReads)
  File "/programs/miniconda2/envs/RelocaTE2/scripts/relocaTE_insertionFinder.py", line 1648, in read_junction_reads_align
    fsam = pysam.AlignmentFile(align_file_f, 'rb')
  File "pysam/calignmentfile.pyx", line 333, in pysam.calignmentfile.AlignmentFile.__cinit__ (pysam/calignmentfile.c:4808)
  File "pysam/calignmentfile.pyx", line 533, in pysam.calignmentfile.AlignmentFile._open (pysam/calignmentfile.c:7027)
IOError: file `/local/workdir/jnw72/Projects/drerio-tes/scripts/danio_test/RelocaTE2_outdir/repeat/bwa_aln/chr25.repeat.fullreads.bwa.sorted.bam` not found

I haven't been able to get this up and running yet, and the problem has always been some variant of a Python IOError caused by a file that the script expects to find but that does not exist. The error shown above comes from running on a test dataset whose format I tried to match as closely as possible to that of the test_data/ provided with the package.

Thanks in advance, Jon

davidecarlson commented 4 years ago

I had the same issue as Jon and Cécile, but I managed to figure out what the problem was. In my case, at least, RelocaTE2 was unable to figure out that my input fastq files were actually paired, and so it treated all data as unpaired. After some tests I determined that the problem was the structure of the input fastq file names. My fastq files were:

sample_name.1.fastq sample_name.2.fastq

After changing the filenames to:

sample_name_1.fastq sample_name_2.fastq

RelocaTE2 finished successfully.

I just wanted to make note of this in case others continue to experience this problem. Thanks, Dave
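For anyone with many files to rename, the change described above can be scripted. This is only a rough sketch: it assumes file names of the form `<sample>.1.fastq` / `<sample>.2.fastq` (or `.fq`), and the helper names `underscore_name` and `rename_in_dir` are made up for illustration; they are not part of RelocaTE2.

```python
import re
from pathlib import Path

# Matches names like "sample_name.1.fastq" or "sample_name.2.fq".
# Assumption: the mate number is the last dot-separated field before the extension.
MATE_DOT = re.compile(r"^(?P<sample>.+)\.(?P<mate>[12])\.(?P<ext>fastq|fq)$")


def underscore_name(name):
    """Return the name with ".1"/".2" turned into "_1"/"_2"; other names are unchanged."""
    m = MATE_DOT.match(name)
    if m is None:
        return name
    return f"{m['sample']}_{m['mate']}.{m['ext']}"


def rename_in_dir(directory, dry_run=True):
    """List (old, new) renames for a directory; only touch files when dry_run is False."""
    planned = []
    for path in sorted(Path(directory).iterdir()):
        new_name = underscore_name(path.name)
        if new_name != path.name:
            planned.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return planned
```

Calling `rename_in_dir("reads/")` with the default `dry_run=True` only reports what would change, which is probably worth checking before renaming anything for real.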

JinfengChen commented 4 years ago

Hi David,

Thank you so much for resolving the issue. Just a reminder that there are options for this that you can explore. If they work for you, they can save some time in the future. Thanks.

Jinfeng

-1 MATE_1_ID, --mate_1_id MATE_1_ID    string define paired-end read1, default = "_1"
-2 MATE_2_ID, --mate_2_id MATE_2_ID    string define paired-end read2, default = "_2"
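
To see why the default mate identifiers fail on names such as `sample_name.1.fastq`, here is a small illustrative check. It is not RelocaTE2's actual matching code (which has not been verified here); it simply assumes that the mate identifier is expected directly before the `.fastq`/`.fq` extension, and `pair_by_mate_id` is a hypothetical helper.

```python
import re

# Assumption: reads end in .fastq or .fq, optionally gzipped.
FASTQ_EXT = re.compile(r"\.(fastq|fq)(\.gz)?$")


def pair_by_mate_id(names, mate_1_id="_1", mate_2_id="_2"):
    """Group file names into (read1, read2) pairs using the given mate identifiers.

    Returns (pairs, unpaired): a dict of sample prefix -> (read1, read2),
    and a sorted list of names that could not be paired.
    """
    mates = {}
    unpaired = []
    for name in names:
        stem = FASTQ_EXT.sub("", name)
        if stem.endswith(mate_1_id):
            mates.setdefault(stem[: -len(mate_1_id)], {})[1] = name
        elif stem.endswith(mate_2_id):
            mates.setdefault(stem[: -len(mate_2_id)], {})[2] = name
        else:
            unpaired.append(name)
    pairs = {}
    for sample, found in mates.items():
        if 1 in found and 2 in found:
            pairs[sample] = (found[1], found[2])
        else:
            # Only one mate was found, so treat it as unpaired.
            unpaired.extend(found.values())
    return pairs, sorted(unpaired)


files = ["sample_name.1.fastq", "sample_name.2.fastq"]
print(pair_by_mate_id(files))              # default "_1"/"_2": nothing pairs
print(pair_by_mate_id(files, ".1", ".2"))  # identifiers matching the file names
```

Under that assumption, passing `--mate_1_id .1 --mate_2_id .2` should have the same effect as renaming the files, but it is worth confirming on a small test run first.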