Open craftor18 opened 7 months ago
In addition, I find that if I pass the TSV and GFF files directly as input and create a new folder for the sample's TE detection, mcclintock.py keeps running and never moves on to the next step.
-rw-rw-r-- 1 zengy zengy 1.2K May 7 00:06 bc17.log
(mcclintock) zengy@LiusServer:/data/zengy/reseq_C.auratus/work/mcclintock/logs$ cat bc17.log
SETUP checking fasta: /data/zengy/reseq_C.auratus/work/mcclintock/ref/C.auratus.chromosome_20210819.fasta
SETUP checking fastq: /data/zengy/reseq_C.auratus/work/mcclintock/input_fastq/bc17_1.fastq.gz
SETUP checking fastq: /data/zengy/reseq_C.auratus/work/mcclintock/input_fastq/bc17_2.fastq.gz
SETUP checking fasta: /data/zengy/reseq_C.auratus/work/mcclintock/ref/TElib.fa
SETUP McClintock Version: 702acb4baacf53c732df84b9678490b8ea199495
SETUP Checking config files to ensure previous intermediate files are compatible with this run
Job counts:
count jobs
1 index_reference_genome
1 make_ref_te_bed
1 make_reference_fasta
1 make_te_annotations
1 map_reads
1 median_insert_size
1 ngs_te_mapper_post
1 ngs_te_mapper_run
1 process_temp
1 reference_2bit
1 relocaTE_consensus
1 relocaTE_post
1 relocaTE_ref_gff
1 relocaTE_run
1 run_temp
1 sam_to_bam
1 setup_reads
1 summary_report
1 telocate_taxonomy
19
PROCESSING formatting the name of consensus TE fasta headers for compatibility with relocaTE
PROCESSING relocaTE consensus fasta created
(mcclintock) zengy@LiusServer:/data/zengy/reseq_C.auratus/work/mcclintock/logs$ cat make_annotation.log
SETUP checking fasta: /data/zengy/reseq_C.auratus/work/mcclintock/ref/C.auratus.chromosome_20210819.fasta
SETUP checking fasta: /data/zengy/reseq_C.auratus/work/mcclintock/ref/TElib.fa
SETUP McClintock Version: 702acb4baacf53c732df84b9678490b8ea199495
Job counts:
count jobs
1 make_consensus_fasta
1 make_reference_fasta
1 make_te_annotations
3
PROCESSING making consensus fasta
PROCESSING consensus fasta created
PROCESSING making reference fasta
PROCESSING reference fasta created
PROCESSING making reference TE annotations
PROCESSING no reference TEs provided... finding reference TEs with RepeatMasker &> /data/zengy/reseq_C.auratus/work/mcclintock/output_template_all/logs/20240506.163954.7526138/processing.log
PROCESSING reference TE annotations created
As shown above, the make-annotation step and the resume step for the sample run the same processing, even though the make-annotation step really did create all the files that I need.
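To make the overlap between the two logs explicit, the "Job counts" tables can be compared programmatically. The following is only a minimal sketch: it assumes the table layout shown in the logs above (a "Job counts:" line, a "count jobs" header, then one "<count> <job name>" line per job, ending with a bare total). The function names are hypothetical and are not part of McClintock or Snakemake.

```python
import re


def parse_job_counts(log_text):
    """Return {job_name: count} from the 'Job counts:' table(s) in a log."""
    jobs = {}
    in_table = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Job counts:"):
            in_table = True
            continue
        if not in_table or not stripped:
            continue
        if stripped.split() == ["count", "jobs"]:
            continue  # column header of the table
        match = re.fullmatch(r"(\d+)\s+(\S+)", stripped)
        if match:
            jobs[match.group(2)] = int(match.group(1))
        else:
            # A bare total such as "19", or any other line, ends the table.
            in_table = False
    return jobs


def rerun_jobs(annotation_log, resume_log):
    """Jobs listed by the annotation run that the resume run lists again."""
    first = parse_job_counts(annotation_log)
    second = parse_job_counts(resume_log)
    return sorted(set(first) & set(second))


# Excerpts copied from the two logs above (the resume table is abbreviated).
annotation_log = """Job counts:
count jobs
1 make_consensus_fasta
1 make_reference_fasta
1 make_te_annotations
3
PROCESSING making consensus fasta
"""
resume_log = """Job counts:
count jobs
1 index_reference_genome
1 make_ref_te_bed
1 make_reference_fasta
1 make_te_annotations
1 map_reads
19
"""

print(rerun_jobs(annotation_log, resume_log))
```

With these excerpts, the jobs reported as scheduled in both runs are make_reference_fasta and make_te_annotations, which matches the observation that the annotation work is being repeated.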
Hi @craftor18
Could you try simplifying your initial --make_annotations execution and using full paths to your directories, e.g.
nohup python3 ~/software/mcclintock/mcclintock.py -r /full/path/to/ref/C.auratus.chromosome_20210819.fasta -c /full/path/to/ref/TElib.fa -p 80 -o /full/path/to/output_template_all/ --make_annotations > /full/path/to/logs/make_annotation.log &
nohup python3 ~/software/mcclintock/mcclintock.py -r /full/path/to/ref/C.auratus.chromosome_20210819.fasta -c /full/path/to/ref/TElib.fa -1 /full/path/to/input_fastq/bc17_1.fastq.gz -2 /full/path/to/input_fastq/bc17_2.fastq.gz -p 40 -m relocate,TEMP,ngs_te_mapper -o /full/path/to/output_template_all/ --resume > /full/path/to/logs/bc17_template.log &
If this doesn't work, can you upload the complete make_annotation.log and bc17_template.log files?
Thanks, Casey
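One way to guarantee that both runs see identical full paths is to build the argument list in a small script instead of typing it twice. This is a hedged sketch, not part of McClintock: the helper name and its parameters are hypothetical, and it only uses the flags that appear in the commands above (-r, -c, -1, -2, -p, -m, -o, --make_annotations, --resume).

```python
import os


def build_mcclintock_args(script, reference, consensus, out_dir,
                          fastq1=None, fastq2=None, methods=None,
                          threads=1, make_annotations=False, resume=False):
    """Build an argument list in which every path is absolute."""

    def full(path):
        # Expand "~" and resolve relative paths against the current directory.
        return os.path.abspath(os.path.expanduser(path))

    args = ["python3", full(script), "-r", full(reference), "-c", full(consensus)]
    if fastq1:
        args += ["-1", full(fastq1)]
    if fastq2:
        args += ["-2", full(fastq2)]
    args += ["-p", str(threads)]
    if methods:
        args += ["-m", ",".join(methods)]
    args += ["-o", full(out_dir)]
    if make_annotations:
        args.append("--make_annotations")
    if resume:
        args.append("--resume")
    return args


# Example with placeholder relative paths; nothing is executed here.
annotation_args = build_mcclintock_args(
    "~/software/mcclintock/mcclintock.py",
    "ref/C.auratus.chromosome_20210819.fasta",
    "ref/TElib.fa",
    "output_template_all",
    threads=80,
    make_annotations=True,
)
print(" ".join(annotation_args))
```

The resulting list can be passed to subprocess.run with stdout redirected to a log file, so the annotation run and the later --resume run are guaranteed to point at the same output directory.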
Thanks for answering. I'll try full paths, but I do not think it is a path problem: I used --make_annotations to generate an output directory and then used that directory to resume a run for a sample, but it re-ran the RepeatMasker step to generate the annotation, and I deleted it. Maybe I should try another version of McClintock. Could you please tell me which version I should use: the release version, the master version, or the latency fix version? Right now I am on the master version, but I am using the mcclintock.py from the latency fix version. Best wishes
Hello, I've tried another way to prepare my GFF and TSV files. I used EDTA to make the GFF file, then used a few commands to produce my input GFF and TSV files. The GFF is:
(mcclintock) zengy@LiusServer:/data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome$ head test.gff
LG01 EDTA Mutator_TIR_transposon 10618026 10621335 . . . ID=TE_struc_145;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=1;Method=structural;TSD=TGCAGCTGCA_TGCAGCTGCA_100.0;TIR=GCAACTTGCG_CGCAAGTTGC
LG01 EDTA Mutator_TIR_transposon 15685590 15686152 4775 - . ID=TE_homo_272562;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.989;Method=homology
LG01 EDTA Mutator_TIR_transposon 15688968 15689443 3968 + . ID=TE_homo_272566;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.957;Method=homology
LG01 EDTA Mutator_TIR_transposon 15831079 15831563 3946 - . ID=TE_homo_272818;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.953;Method=homology
LG01 EDTA Mutator_TIR_transposon 15839967 15840529 4754 + . ID=TE_homo_272827;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.98;Method=homology
LG01 EDTA Mutator_TIR_transposon 20330106 20330666 4640 + . ID=TE_homo_279690;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.982;Method=homology
LG01 EDTA Mutator_TIR_transposon 20334906 20335461 4277 + . ID=TE_homo_279697;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.949;Method=homology
LG02 EDTA Mutator_TIR_transposon 11438649 11439221 4686 - . ID=TE_homo_390467;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.97;Method=homology
LG02 EDTA Mutator_TIR_transposon 11439222 11439368 1167 + . ID=TE_homo_390468;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.952;Method=homology
LG02 EDTA Mutator_TIR_transposon 23939864 23940375 3663 + . ID=TE_homo_408718;Name=TE_00000001;Classification=DNA/DTM;Sequence_ontology=SO:0002280;Identity=0.967;Method=homology
(mcclintock) zengy@LiusServer:/data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome$
and tsv is:
(mcclintock) zengy@LiusServer:/data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome$ head test.tsv
TE_struc_145 TE_00000001
TE_homo_272562 TE_00000001
TE_homo_272566 TE_00000001
TE_homo_272818 TE_00000001
TE_homo_272827 TE_00000001
TE_homo_279690 TE_00000001
TE_homo_279697 TE_00000001
TE_homo_390467 TE_00000001
TE_homo_390468 TE_00000001
TE_homo_408718 TE_00000001
and my run command is:
nohup python3 ~/software/mcclintock/mcclintock.py -r ./ref_genome/C.auratus.chromosome_20210819.fasta -c ./ref_genome/test.fa -1 ./00_input_fastq_datas/bc17_1.fastq.gz -2 ./00_input_fastq_datas/bc17_2.fastq.gz -p 8 -m relocate2,temp2,ngs_te_mapper2 -o ./06_mcclintock/ --sample_name bc17 -g ./ref_genome/test.gff -t ./ref_genome/test.tsv > mcclintock_bc17.log &
(wd: /data/zengy/reseq_C.auratus/work/non_ref_TE)
Why does its log look like this:
(mcclintock) zengy@LiusServer:/data/zengy/reseq_C.auratus/work/non_ref_TE$ cat mcclintock_bc17.log
SETUP checking fasta: /data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome/C.auratus.chromosome_20210819.fasta
SETUP checking fastq: /data/zengy/reseq_C.auratus/work/non_ref_TE/00_input_fastq_datas/bc17_1.fastq.gz
SETUP checking fastq: /data/zengy/reseq_C.auratus/work/non_ref_TE/00_input_fastq_datas/bc17_2.fastq.gz
SETUP checking fasta: /data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome/test.fa
SETUP checking locations gff: /data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome/test.gff
SETUP checking taxonomy TSV: /data/zengy/reseq_C.auratus/work/non_ref_TE/ref_genome/test.tsv
SETUP McClintock Version: 702acb4baacf53c732df84b9678490b8ea199495
Job counts:
count jobs
1 index_reference_genome
1 make_consensus_fasta
1 make_ref_te_bed
1 make_reference_fasta
1 make_te_annotations
1 map_reads
1 median_insert_size
1 ngs_te_mapper2_post
1 ngs_te_mapper2_pre
1 ngs_te_mapper2_run
1 process_temp2
1 reference_2bit
1 relocaTE2_post
1 relocaTE2_run
1 repeatmask
1 run_temp2
1 sam_to_bam
1 setup_reads
1 summary_report
1 telocate_taxonomy
20
PROCESSING making consensus fasta
PROCESSING consensus fasta created
PROCESSING making reference fasta
PROCESSING reference fasta created
PROCESSING creating 2bit file from reference genome fasta &> /data/zengy/reseq_C.auratus/work/non_ref_TE/06_mcclintock/logs/20240511.205850.2061338/processing.log
PROCESSING reference 2bit file created
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.
Admittedly this is not an error, and the program is still running. But I have supplied a GFF and a TSV file, so why does the program still run a RepeatMasker process to regenerate the annotation file? Below are the processes that are currently running:
top - 21:07:45 up 2 days, 10:09, 2 users, load average: 5.35, 4.50, 4.60
Tasks: 817 total, 6 running, 811 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 1.3 sy, 0.0 ni, 93.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 128393.1 total, 3840.5 free, 10551.6 used, 115048.4 buff/cache
MiB Swap: 8192.0 total, 7804.0 free, 388.0 used. 117841.5 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4037324 zengy 20 0 992472 581176 2308 R 100.0 0.4 8:03.51 bwa index /data/zengy/reseq_C.auratus/work/non_ref_TE/06_mcclintock/C.auratus.chromosome_20210819/genome_fasta/C.auratus.chromosome_20210819.fasta
4040162 zengy 20 0 3560 1812 1400 R 99.3 0.0 3:30.79 gzip -cd /data/zengy/reseq_C.auratus/work/non_ref_TE/00_input_fastq_datas/bc17_2.fastq.gz
4046688 zengy 20 0 514828 489860 2824 R 30.9 0.4 0:00.94 perl /home/zengy/software/mcclintock/install/envs/conda/46211ea1/share/RepeatMasker/RepeatMasker -pa 3 -lib /data/zengy/reseq_C.auratus/work/non_ref_TE/06_mcclintock/C.auratus.chromosome_2+
4046699 zengy 20 0 514828 490736 3696 R 18.8 0.4 0:00.57 perl /home/zengy/software/mcclintock/install/envs/conda/46211ea1/share/RepeatMasker/RepeatMasker -pa 3 -lib /data/zengy/reseq_C.auratus/work/non_ref_TE/06_mcclintock/C.auratus.chromosome_2+
4037240 zengy 20 0 527052 504164 4904 S 12.5 0.4 3:21.62 perl /home/zengy/software/mcclintock/install/envs/conda/46211ea1/share/RepeatMasker/RepeatMasker -pa 3 -lib /data/zengy/reseq_C.auratus/work/non_ref_TE/06_mcclintock/C.auratus.chromosome_2+
4046716 zengy 20 0 526512 502416 3696 R 4.6 0.4 0:00.14 perl /home/zengy/software/mcclintock/install/envs/conda/46211ea1/share/RepeatMasker/RepeatMasker -pa 3 -lib /data/zengy/reseq_C.auratus/work/non_ref_TE/06_mcclintock/C.auratus.chromosome_2+
Can you explain it? Thank you very much! Best wishes
It seems my input GFF and TSV files only work for the ngs_te_mapper2 method.
I also find that when the TSV and GFF contain too many TE family lines, parsing the parameters takes a very long time.
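Before a long run, it may help to confirm that the three annotation inputs agree with each other. The sketch below rests on an assumption about what McClintock expects, so please treat it as unverified: each ID= attribute in the GFF should appear in the first column of the TSV, and each family in the second column should be the name of a sequence in the consensus FASTA. It also assumes both files are tab-delimited (the excerpts above show the columns with spaces only because of how they were pasted). All function names are hypothetical.

```python
def gff_ids(gff_text):
    """Collect the ID= attribute from column 9 of each GFF feature line."""
    ids = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete GFF feature line
        for field in columns[8].split(";"):
            key, _, value = field.partition("=")
            if key.strip() == "ID":
                ids.append(value.strip())
    return ids


def taxonomy_map(tsv_text):
    """Map TE instance ID (column 1) to TE family (column 2)."""
    mapping = {}
    for line in tsv_text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2 and parts[0].strip():
            mapping[parts[0].strip()] = parts[1].strip()
    return mapping


def fasta_names(fasta_text):
    """Sequence names: the first whitespace-delimited token after '>'."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def check_inputs(gff_text, tsv_text, fasta_text):
    """Report IDs and families that do not line up across the three files."""
    ids = set(gff_ids(gff_text))
    taxonomy = taxonomy_map(tsv_text)
    names = fasta_names(fasta_text)
    return {
        "gff_ids_missing_from_tsv": sorted(ids - set(taxonomy)),
        "tsv_ids_missing_from_gff": sorted(set(taxonomy) - ids),
        "families_missing_from_fasta": sorted(set(taxonomy.values()) - names),
    }
```

To use it on real files, read test.gff, test.tsv, and test.fa into strings and pass them to check_inputs; three empty lists would mean the IDs and family names line up under the assumptions stated above.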
Hi, I am using the --make_annotations preprocessing to run TE detection on multiple samples. After the --make_annotations run finished, I used -1 and -2 to add a sample and found that the repeat annotation step is run again for relocate2 and other steps, which means RepeatMasker is run once for every sample. Could you help me? Below is my annotation step command:
Then I add a sample and resume:
I find that the RepeatMasker and bwa index steps were re-run. Could you please tell me why? Thanks. Best wishes!