tomaszjacek closed this issue 3 years ago.
Hi @tomaszjacek,
Can you post the contents of the TEFLoN-specific log? That should make it easier for me to determine what is going wrong. Based on the paths in the error you posted, the TEFLoN log should be at: /data/mcclintock/test/output/log/*/teflon.log
Thanks, Preston
I'm sorry, I don't know how to attach the file. Is it possible here? If not, I have to paste it. The teflon.log file is 1135 lines long, with "Processed 990100 reads..." repeated many times, but it ends with an error:
Thank you, tj
[M::mem_process_seqs] Processed 990100 reads in 83.325 CPU sec, 8.546 real sec
[M::process] read 990100 sequences (100000100 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (217, 401435, 74, 95)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (61, 132, 672)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1894)
[M::mem_pestat] mean and std.dev: (313.10, 375.83)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2505)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (276, 301, 320)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (188, 408)
[M::mem_pestat] mean and std.dev: (298.18, 33.74)
[M::mem_pestat] low and high boundaries for proper pairs: (144, 452)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (257, 3703, 9499)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 27983)
[M::mem_pestat] mean and std.dev: (4134.85, 3903.56)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 37225)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (495, 753, 1247)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2751)
[M::mem_pestat] mean and std.dev: (747.34, 386.19)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 3503)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 990100 reads in 86.595 CPU sec, 8.862 real sec
[M::process] read 990100 sequences (100000100 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (205, 396794, 77, 95)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (62, 140, 510)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1406)
[M::mem_pestat] mean and std.dev: (314.52, 367.90)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1854)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (275, 301, 319)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (187, 407)
[M::mem_pestat] mean and std.dev: (297.60, 34.11)
[M::mem_pestat] low and high boundaries for proper pairs: (143, 451)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (271, 4322, 8277)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 24289)
[M::mem_pestat] mean and std.dev: (3993.38, 3576.53)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 32295)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (449, 703, 1217)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2753)
[M::mem_pestat] mean and std.dev: (687.53, 371.81)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 3521)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 990100 reads in 92.206 CPU sec, 9.404 real sec
[M::process] read 918116 sequences (92729716 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (211, 394908, 65, 89)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (71, 135, 446)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1196)
[M::mem_pestat] mean and std.dev: (211.70, 216.78)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1571)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (274, 300, 319)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (184, 409)
[M::mem_pestat] mean and std.dev: (296.91, 34.69)
[M::mem_pestat] low and high boundaries for proper pairs: (139, 454)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (285, 2584, 9521)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 27993)
[M::mem_pestat] mean and std.dev: (3933.18, 3790.37)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 37229)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (404, 643, 1227)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2873)
[M::mem_pestat] mean and std.dev: (683.83, 464.21)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 3696)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 990100 reads in 92.694 CPU sec, 9.479 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (174, 337492, 61, 93)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (69, 131, 548)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1506)
[M::mem_pestat] mean and std.dev: (310.03, 353.80)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1985)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (271, 298, 317)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (179, 409)
[M::mem_pestat] mean and std.dev: (294.40, 35.79)
[M::mem_pestat] low and high boundaries for proper pairs: (133, 455)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (308, 2984, 9472)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 27800)
[M::mem_pestat] mean and std.dev: (4027.59, 3658.31)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 36964)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (513, 721, 809)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1401)
[M::mem_pestat] mean and std.dev: (719.77, 315.99)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1984)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 918116 reads in 97.453 CPU sec, 9.857 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 10 -Y /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered//teflon.prep_MP/teflon.mappingRef.fa /data/mcclintock/test/output/SRR800842_1/intermediate/fastq/SRR800842_1_1.fq /data/mcclintock/test/output/SRR800842_1/intermediate/fastq/SRR800842_1_2.fq
[main] Real time: 389.589 sec; CPU: 3788.028 sec
bwa mem -t 10 -Y /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered//teflon.prep_MP/teflon.mappingRef.fa /data/mcclintock/test/output/SRR800842_1/intermediate/fastq/SRR800842_1_1.fq /data/mcclintock/test/output/SRR800842_1/intermediate/fastq/SRR800842_1_2.fq > /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sam
samtools view -Sb /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sam > /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.bam
[bam_sort_core] merging from 20 files...
samtools sort -@ 10 -o /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sorted.bam /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.bam
samtools index /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sorted.bam
awk: line 1: syntax error at or near *
Calculating alignment statistics
cmd: samtools stats -t /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.prep_TF/teflon.genomeSize.txt /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sorted.bam
cmd: samtools depth -Q 20 /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sorted.bam | awk '{sum+=$3; sumsq+=$3*$3} END {print "Average = ",sum/NR; print "Stdev = ",sqrt(sumsq/NR - (sum/NR)**2)}' > /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.sorted.cov.txt
Insert size standard deviation estimated as 45. Use the override option if you suspect this is incorrect!
Warning: coverage could not be estimated, enter coverage manually
python /work/mcclintock/install/tools/teflon/ -wd /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/ -d /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.prep_TF/ -s /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/samples.tsv -i sample -l1 family -l2 family -t 10 -q 20
Traceback (most recent call last):
File "/work/mcclintock/install/tools/teflon/", line 165, in <module>
File "/work/mcclintock/install/tools/teflon/", line 103, in main
samples.append([line.split()[0], line.split()[1], [readLen, insz, sd, total_n,cov,cov_sd]])
UnboundLocalError: local variable 'cov' referenced before assignment
python /work/mcclintock/install/tools/teflon/ -wd /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/ -d /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.prep_TF/ -s /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/samples.tsv -t 10 -n1 1 -n2 1 -q 20
python /work/mcclintock/install/tools/teflon/ -wd /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/ -d /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/teflon.prep_TF/ -s /data/mcclintock/test/output/SRR800842_1/results/teflon/unfiltered/samples.tsv -t 10 -n1 1 -n2 1 -q 20
-bash-4.2$ wc -l teflon.log
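In the log above, the awk step fails, TEFLoN warns that coverage could not be estimated, and the script then stops with UnboundLocalError because cov was never assigned. The sketch below is a minimal defensive parser that fails with a readable message instead. It assumes the teflon.sorted.cov.txt format written by the awk command in the log (an "Average = X" line and a "Stdev = Y" line). parse_coverage is a hypothetical helper, not TEFLoN's own parser.

```python
def parse_coverage(text):
    """Return (mean, stdev) parsed from 'Average = X' / 'Stdev = Y' lines.

    Hypothetical helper: raises ValueError instead of leaving variables
    unassigned when the coverage file is empty or malformed.
    """
    cov = None     # initialised up front, so a missing line cannot cause UnboundLocalError
    cov_sd = None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without '='
        key = key.strip()
        if key == "Average":
            cov = float(value)     # float() tolerates surrounding whitespace
        elif key == "Stdev":
            cov_sd = float(value)
    if cov is None or cov_sd is None:
        raise ValueError("coverage could not be estimated; "
                         "check the awk step or enter coverage manually")
    return cov, cov_sd


if __name__ == "__main__":
    print(parse_coverage("Average =  12.5\nStdev =  3.25\n"))  # (12.5, 3.25)
    try:
        parse_coverage("")  # roughly what a failed awk step leaves behind
    except ValueError as err:
        print("error:", err)
```

Initialising both values to None and checking them once after the loop keeps the failure in one place, rather than letting it surface later as an unrelated-looking traceback.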
@tomaszjacek: thanks for your feedback on running McClintock. You can attach files by clicking on the bottom bar of the comment box and navigating in your finder/explorer and uploading. Alternatively, you can drag and drop files of select types into the comment box and it will upload automatically. See more here:
Hi, when I run McClintock as follows:
python3 ${MCK}/ --reference ../10-reference/HaSCD2.fa \
--consensus ../10-reference/Hadb-families_rename.fa \
--first ../20-NGS/${K}/${K}_1.fastq \
--second ../20-NGS/${K}/${K}_2.fastq \
--proc 48 \
--out ${K} \
--locations ./TE_annotations/HaSCD2/reference_te_locations/unaugmented_inrefTEs.gff \
--taxonomy ./TE_annotations/HaSCD2/te_taxonomy/unaugmented_taxonomy.tsv
I got the following errors related to TEFLoN:
Error in rule teflon_run:
jobid: 20
output: /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/genotypes/sample.genotypes.txt
conda-env: /home/dell/biosoft/mcclintock/install/envs/conda/54b8d4d7
CalledProcessError in line 49 of /home/dell/biosoft/mcclintock/snakefiles/teflon.snakefile:
Command 'source /home/dell/miniconda3/envs/mcclintock/bin/activate '/home/dell/biosoft/mcclintock/install/envs/conda/54b8d4d7'; set -euo pipefail; /home/dell/miniconda3/envs/mcclintock/bin/python3.7 /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/snakemake/1571076/.snakemake/scripts/' returned non-zero exit status 1.
File "/home/dell/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 2189, in run_wrapper
File "/home/dell/biosoft/mcclintock/snakefiles/teflon.snakefile", line 49, in __rule_teflon_run
File "/home/dell/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 529, in _callback
File "/home/dell/miniconda3/envs/mcclintock/lib/python3.7/concurrent/futures/", line 57, in run
File "/home/dell/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 515, in cached_or_run
File "/home/dell/miniconda3/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 2201, in run_wrapper
teflon.log is as follows:
writing TE bed files...
writing TE bed files completed!
reducing search space...
cmd: samtools view -@ 4 -L /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.bed_files/mega_complete.bed /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/teflon.sorted.bam -b
search space succesfully reduced...
new reduced bam file: /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.sam_files/mega_complete.bam
clustering TE positions...
[ ================================================== ] 100.00%
clustering TE positions completed!
final reduction of search space...
cmd: samtools view -@ 4 -q 20 -L /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.bed_files/mega_clustered.bed /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/teflon.sorted.bam -b
Error running samtools: p.returncode = 1
python /home/dell/biosoft/mcclintock/install/tools/teflon/ -wd /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/ -d /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/teflon.prep_TF/ -s /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/samples.tsv -i sample -l1 family -l2 family -t 4 -q 20
python /home/dell/biosoft/mcclintock/install/tools/teflon/ -wd /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/ -d /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/teflon.prep_TF/ -s /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/samples.tsv -i sample -l1 family -l2 family -t 4 -q 20
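The log above ends with only "Error running samtools: p.returncode = 1", which hides samtools' own error message. The sketch below shows one way to keep that message when a pipeline shells out to samtools: a wrapper around Python's subprocess module that includes the child's stderr in the raised error. run_checked is a hypothetical helper, not part of TEFLoN or McClintock.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd (a list of arguments) and return its stdout as bytes.

    On a non-zero exit status, raise RuntimeError that includes the
    command's stderr, so a samtools message such as '[bed_read] Parse error'
    is not reduced to a bare return code.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        stderr = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(
            f"{' '.join(cmd)} failed with exit status {proc.returncode}: {stderr}")
    return proc.stdout  # bytes, because 'samtools view -b' writes binary BAM


if __name__ == "__main__":
    # Stand-in for a failing samtools call: a child Python process that
    # writes to stderr and exits with status 1.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('Could not read BED file'); sys.exit(1)"]
    try:
        run_checked(failing)
    except RuntimeError as err:
        print(err)
```

With samtools installed, the same call would look like run_checked(["samtools", "view", "-@", "4", "-q", "20", "-L", bed_path, bam_path, "-b"]), where bed_path and bam_path are placeholders for the files in the log.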
When I ran the samtools view command manually:
samtools view -@ 4 -q 20 -L /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.bed_files/mega_clustered.bed /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/teflon.sorted.bam -b
I got the following error:
[bed_read] Parse error reading "/home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.bed_files/mega_clustered.bed" at line 63797
samtools view: Could not read file "/home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.bed_files/mega_clustered.bed"
Therefore, I extracted line 63797 of /home/newsdc/zhang_20201215/insertTE/30-mcclintock/Ac12/Ac12_1/results/teflon/unfiltered/sample.bed_files/mega_clustered.bed:
It contains only one position. Is that the start or the end?
Meanwhile, I found another potential error, as follows:
chr19 4007485 Unchr32 651720 651859
This appears to be a chimeric record.
So, could the error above be occurring while clustering TE positions?
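Records like the two described above (a line with a single position, and a line mixing chr19 and Unchr32) can be located without waiting for samtools to fail. The sketch below scans a BED file and reports the line numbers of records that are not valid intervals. It assumes the usual BED convention: the first three whitespace-separated fields are chrom, start and end, with 0 <= start <= end. It checks only that convention and does not reproduce samtools' own parser. find_bad_bed_lines is a hypothetical helper, and the first two example records below are made up.

```python
def find_bad_bed_lines(lines):
    """Return (line_number, reason) for records that are not valid BED intervals.

    Only the first three columns (chrom, start, end) are checked; extra
    columns are ignored. Line numbers are 1-based, matching samtools'
    '[bed_read] Parse error ... at line N' message.
    """
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # blank and header lines are not intervals
        fields = line.split()
        if len(fields) < 3:
            bad.append((lineno, "fewer than 3 columns"))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            # e.g. 'chr19 4007485 Unchr32 ...': a chromosome name where 'end' belongs
            bad.append((lineno, "non-integer start/end"))
            continue
        if start < 0 or start > end:
            bad.append((lineno, "start/end out of order"))
    return bad


if __name__ == "__main__":
    example = [
        "chr19\t4007000\t4007485\n",               # made-up valid record
        "chr19\t4007485\n",                        # made-up single-position record
        "chr19 4007485 Unchr32 651720 651859\n",   # chimeric record quoted above
    ]
    for lineno, reason in find_bad_bed_lines(example):
        print(lineno, reason)
```

To scan a real file, pass the open file handle, e.g. with open(path) as fh: print(find_bad_bed_lines(fh)).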
awk: line 1: syntax error at or near *
GNU Awk 4.2.1
and haven't had issues
$ awk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 3.1.6-p2, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2018 Free Software Foundation.
is used.
$ grep "awk" *.py
cmd="""%s depth -Q %s %s | awk '{sum+=$3; sumsq+=$3*$3} END {print "Average = ",sum/NR; print "Stdev = ",sqrt(sumsq/NR - (sum/NR)**2)}' > %s""" %(exeSAM, str(qual), bam, covFILE)
$ grep "awk" teflon_scripts/*.py
teflon_scripts/ cmd="""%s depth -Q %s %s | awk '{sum+=$3; sumsq+=$3*$3} END {print "Average = ",sum/NR; print "Stdev = ",sqrt(sumsq/NR - (sum/NR)**2)}' > %s""" %(exePATH, str(qual), bamFILE, covFILE)
to denote an exponent instead of ^. After some googling, I found that this is apparently not compatible with all awk interpreters and may cause issues with mawk, which is used by some Linux distributions.
interpreter to mawk
in the TEFLoN conda environment to ensure that users are using the same awk
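For reference, the awk one-liner computes the mean depth and the population standard deviation, sqrt(sumsq/NR - mean^2), over all positions reported by samtools depth. POSIX awk spells exponentiation as ^; ** is a gawk extension, which is why mawk rejects it. The minimal Python sketch below performs the same arithmetic. It assumes samtools depth output with the depth in the third column; depth_stats is a hypothetical name. It also keeps a ^-based spelling of the awk program as a string for comparison. That spelling is a suggested portable form, not a change taken from TEFLoN.

```python
import math

# The awk program from the log with '**' replaced by the POSIX '^' operator
# (a suggested portable spelling, not TEFLoN's own code).
PORTABLE_AWK = ('{sum+=$3; sumsq+=$3*$3} END {print "Average = ",sum/NR; '
                'print "Stdev = ",sqrt(sumsq/NR - (sum/NR)^2)}')


def depth_stats(lines):
    """Mean and population standard deviation of column 3 of `samtools depth` output."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated lines
        depth = float(fields[2])
        n += 1
        total += depth
        total_sq += depth * depth
    if n == 0:
        # awk would divide by NR == 0 here; fail loudly instead
        raise ValueError("no depth records")
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # clamp tiny negative rounding error
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    print(depth_stats(["chr1\t1\t2", "chr1\t2\t4"]))  # (3.0, 1.0)
```

Because ^ is the POSIX operator, the string above should behave the same under gawk and mawk. Shipping gawk in the conda environment is the other way to avoid the mismatch.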
- @zhjpeng (#76 (comment)) I have seen this issue before as well. It seems to be sample-dependent: most of my McClintock runs with TEFLoN do not have this issue, but it occurs with some specific samples where the BED file is malformed.
- I am fairly certain this is a bug in TEFLoN and not related to McClintock, so I am going to try to replicate it outside of McClintock with TEFLoN alone. Then I'll open an issue on the actual TEFLoN repository to see if its developers know what is going on.
- I'll let you know when I've posted the issue
Thanks for your reply. I am running McClintock on more samples to check whether they show similar errors.
- @tomaszjacek I've updated the mcclintock master branch (b61563e) with a change to the TEFLoN environment that now includes gawk. You should be able to update your mcclintock repository with a git pull. Then do a clean install with --install, which will install TEFLoN with the updated conda environment.
- Let me know if this resolves the bug you were experiencing earlier.
It works. Thank you, tj
Unfortunately, git pull && --install didn't help me.
Is there any way to verify that TEFLoN was updated, and/or to get the version of a component being used?
Hi @yuryfunikov,
cd /path/to/mcclintock
git rev-parse HEAD
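The fix depends on gawk being present in the TEFLoN conda environment, so a further check is which awk comes first on PATH once that environment is activated. The sketch below reports this. It assumes that gawk's awk --version banner starts with "GNU Awk", as in the output quoted earlier in this thread, and that other implementations either print something else or reject the flag. awk_flavor and is_gnu_awk are hypothetical helpers.

```python
import shutil
import subprocess


def is_gnu_awk(version_text):
    """True if an `awk --version` banner identifies GNU Awk (gawk)."""
    return version_text.lstrip().startswith("GNU Awk")


def awk_flavor():
    """Return 'gawk', 'other', or 'missing' for the awk found on PATH."""
    awk = shutil.which("awk")
    if awk is None:
        return "missing"
    try:
        proc = subprocess.run([awk, "--version"],
                              stdin=subprocess.DEVNULL,   # never wait on stdin
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,   # an 'unknown option' error also counts
                              text=True, errors="replace", timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return "other"
    return "gawk" if is_gnu_awk(proc.stdout) else "other"


if __name__ == "__main__":
    print(shutil.which("awk"), awk_flavor())
```

Run this inside the activated TEFLoN environment. Run from the base environment, it reports the system awk instead.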
Hi, and thanks for the answer.
This is what I got:
mcclintock$ git rev-parse HEAD
python3 ./../mcclintock/ -r dvir-all-chromosome-r.1.06.fasta -c asymmetric_TEs_v1.fasta -1 160JB_dna_seq_1_trimmed.fastq.gz -2 160JB_dna_seq_2_trimmed.fastq.gz -p 1 -m teflon -o mcclintock_out_assTEv1_160_refgen/ --resume --debug
That resulted in the following error:
CalledProcessError in line 49 of /path/to/file/mcclintock/snakefiles/teflon.snakefile:
Command 'source /opt/miniconda/envs/mcclintock/bin/activate '/path/to/file/mcclintock/install/envs/conda/cc1216b5'; set -euo pipefail; /opt/miniconda/envs/mcclintock/bin/python3.7 /path/to/file/mcclintock_out_assTEv1_160_refgen/snakemake/3370691/.snakemake/scripts/' returned non-zero exit status 1.
File "/opt/miniconda/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 2189, in run_wrapper
File "/path/to/file/mcclintock/snakefiles/teflon.snakefile", line 49, in __rule_teflon_run
File "/opt/miniconda/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 529, in _callback
File "/opt/miniconda/envs/mcclintock/lib/python3.7/concurrent/futures/", line 57, in run
File "/opt/miniconda/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 515, in cached_or_run
File "/opt/miniconda/envs/mcclintock/lib/python3.7/site-packages/snakemake/executors/", line 2201, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: //path/to/file/mcclintock_out_assTEv1_160_refgen/snakemake/3370691/.snakemake/log/2021-03-15T001010.010823.snakemake.log
-rw-rw-r-- 1 sergey sergey 2425 Mar 15 00:16 ./mcclintock_out_assTEv1_160_refgen/logs/20210315.001008.3370691/teflon.log
writing TE bed files...
writing TE bed files completed!
reducing search space...
cmd: samtools view -@ 1 -L /path/to/file/mcclintock_out_assTEv1_160_refgen/160JB_dna_seq_1_trimmed/results/teflon/unfiltered/sample.bed_files/mega_complete.bed /path/to/file/mcclintock_out_assTEv1_160_refgen/160JB_dna_seq_1_trimmed/results/teflon/unfiltered/teflon.sorted.bam -b
Error running samtools: p.returncode = 1
I should add that it looks like mega_complete.bed wasn't created at all:
/path/to/file/mcclintock_out_assTEv1_160_refgen/160JB_dna_seq_1_trimmed/results/teflon/unfiltered/sample.bed_files/mega_complete.bed: No such file or directory
I should also say that the pipeline used to work without problems, but then it started failing with this error from time to time, and now it fails every time we run the script.
Please let me know if you think I should file a new ticket about this.
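In this run samtools fails because mega_complete.bed was never written, yet the log again reports only the return code. The sketch below is a minimal pre-flight check that confirms the BED and BAM inputs exist and are non-empty before samtools view -L is invoked. missing_inputs is a hypothetical helper. It only makes the failure clearer and does not explain why TEFLoN skipped writing the file.

```python
import os
import tempfile


def missing_inputs(paths):
    """Return a message for each path that does not exist or is an empty file."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: no such file")
        elif os.path.getsize(path) == 0:
            problems.append(f"{path}: file is empty")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bed = os.path.join(tmp, "mega_complete.bed")  # deliberately not created
        bam = os.path.join(tmp, "teflon.sorted.bam")
        open(bam, "wb").close()                       # created but left empty
        for problem in missing_inputs([bed, bam]):
            print(problem)
```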
Thanks @yuryfunikov, this looks similar to the problem described in: We have contacted the TEFLoN developer, and I think the bug has been fixed (see: ), but I am currently testing the fix and integrating the changes into mcclintock. I'll let you know when these changes have been integrated.
Sorry to bother you, but have you had a chance to look into this?
@yuryfunikov Sorry for not replying earlier. I have integrated the most recent TEFLoN update into mcclintock, so I'd suggest reinstalling the newest version of mcclintock and trying TEFLoN again on your sample to see if the issue is resolved.
When I run the TEFLoN analysis with the command
I got the error
Is this a bug in the TEFLoN software, or should I use some extra option in the command?
Thank you, tj