jwang147 closed this issue 1 year ago
$ ksrates paralogs-ks config_Dini.txt --n-threads=20
INFO Paralog wgd analysis for species [Dini]
INFO Tue Apr 11 16:00:41 2023
INFO - - - - - - - - - - - - - - - - - - - -
INFO Checking if sequence data files exist and if sequence IDs are compatible with wgd pipeline...
INFO Completed
INFO Creating directory [paralog_distributions/]
INFO Running wgd paralog Ks pipeline...
INFO ---
INFO Checking external software...
INFO makeblastdb: 2.12.0+
INFO blastp: 2.12.0+
INFO mcl 14-137
INFO muscle 5.1.linux64 []
INFO AAML in paml version 4.9, March 2015
INFO FastTree Version 2.1.11 Double precision (No SSE3)
INFO Creating output directory /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini
INFO Translating CDS file Dini.cds...
WARNING Sequence length != multiple of 3 for Chr01G001412.mRNA1!
WARNING Invalid codon G in Chr01G001412.mRNA1
WARNING Sequence length != multiple of 3 for Chr02G004296.mRNA1!
WARNING Invalid codon G in Chr02G004296.mRNA1
WARNING Sequence length != multiple of 3 for Chr03G004056.mRNA1!
WARNING Invalid codon G in Chr03G004056.mRNA1
WARNING Sequence length != multiple of 3 for Chr05G004423.mRNA1!
WARNING Invalid codon TT in Chr05G004423.mRNA1
WARNING Sequence length != multiple of 3 for Chr05G004423.mRNA2!
WARNING Invalid codon TT in Chr05G004423.mRNA2
WARNING Sequence length != multiple of 3 for Chr07G000260.mRNA1!
WARNING Invalid codon C in Chr07G000260.mRNA1
WARNING There were 12 warnings during translation
INFO ---
INFO Running all versus all Blastp
INFO Writing protein Blastdb sequences to /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/...
INFO Writing protein query sequences to /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/...
INFO Performing all versus all Blastp (this might take a while)...
INFO Making Blastdb
INFO makeblastdb -in /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/Dini.db.fasta -dbtype prot
INFO makeblastdb output:
Building a new DB, current time: 04/11/2023 16:00:53
New DB name: /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/Dini.db.fasta
New DB title: /data/jwang/6-dini_genome/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/Dini.db.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 46825 sequences in 0.830938 seconds.
INFO Running Blastp
INFO blastp -db /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/Dini.db.fasta -query /data/jwang/6-dini_genome/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast_tmp/Dini.query.fasta -evalue 1e-10 -outfmt 6 -num_threads 20 -out /data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.blast.tsv
INFO All versus all Blastp done
INFO Removing tmp directory
INFO ---
INFO Running gene family construction (MCL clustering with inflation factor = 2.0)
INFO Started MCL clustering (mcl)
INFO ---
INFO Running whole paranome Ks analysis...
WARNING Filtered out the largest gene family because its size is > 200
WARNING If you want to analyse this large family anyhow, please raise the `max_gene_family_size` parameter
INFO Started analysis of 7449 gene families in parallel using 20 threads
INFO Performing analysis on gene family GF_000002 (size 194)
ERROR Unexpected internal error during analysis of gene family GF_000002:
Traceback (most recent call last):
File "/home/jwang/miniconda3/envs/wgd/lib/python3.7/site-packages/wgd_ksrates/ks_distribution.py", line 280, in analyse_family_try_except
n_families, is_last_family)
File "/home/jwang/miniconda3/envs/wgd/lib/python3.7/site-packages/wgd_ksrates/ks_distribution.py", line 371, in analyse_family
msa_path, stats, successful = prepare_aln(msa_path_protein, nucleotide)
File "/home/jwang/miniconda3/envs/wgd/lib/python3.7/site-packages/wgd_ksrates/alignment.py", line 43, in prepare_aln
with open(msa_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/jwang/1.2.ksrate/paralog_distributions/wgd_Dini/Dini.ks_tmp/GF_000002.fasta.msa'
ERROR Skipping gene family
INFO Performing analysis on gene family GF_000003 (size 120)
ERROR Unexpected internal error during analysis of gene family GF_000003:
Hi! Thanks for reaching out. It seems that there is a problem with running the multiple sequence alignment step, as the output files (e.g. "GF_000001.fasta.msa") don't exist:
File "/home/jwang/miniconda3/envs/wgd/lib/python3.7/site-packages/wgd_ksrates/alignment.py", line 43, in prepare_aln
with open(msa_file, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jwang/my_apps/ksrates/test/paralog_distributions/wgd_elaeis/elaeis.ks_tmp/GF_000001.fasta.msa'
ERROR Skipping gene family
I don't know exactly what causes it, but it might be a version issue: you're using MUSCLE version 5.1.linux64, while I tested with MUSCLE v3.8.31. Could you try that version instead? If installing the required software is a problem, you can use a Docker or Singularity container.
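As a rough way to spot this kind of mismatch, the MUSCLE line from the "Checking external software" part of the log can be compared against the tested version. This is only a sketch: `parse_muscle_version`, `same_major_version`, and `TESTED_MUSCLE` are hypothetical names that are not part of ksrates, and the regex assumes the log line format shown in this thread.

```python
import re

# MUSCLE version reported as tested in this thread (v3.8.31).
TESTED_MUSCLE = (3, 8, 31)


def parse_muscle_version(log_line):
    """Return the MUSCLE version found in a log line as a tuple of ints, or None.

    Assumes lines such as 'INFO muscle 5.1.linux64 []' or 'MUSCLE v3.8.31'.
    Trailing non-numeric parts (e.g. '.linux64') are ignored.
    """
    match = re.search(r"muscle\s+v?(\d+(?:\.\d+)*)", log_line, flags=re.IGNORECASE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def same_major_version(found, tested=TESTED_MUSCLE):
    """True if a version was found and its major number matches the tested one."""
    return found is not None and found[0] == tested[0]


if __name__ == "__main__":
    line = "INFO muscle 5.1.linux64 []"
    found = parse_muscle_version(line)
    if not same_major_version(found):
        shown = ".".join(map(str, found)) if found else "unknown"
        tested = ".".join(map(str, TESTED_MUSCLE))
        print(f"MUSCLE {shown} differs from the tested version {tested}")
```

Only the major version is compared, since that is the difference discussed here (5.x versus 3.x); a stricter check could compare the full tuple.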
To get ahead of other versioning issues, here are other version differences that might cause trouble:
@Cecilia-Sensalari Thank you both for your replies! I reinstalled ksrates and replaced BLAST with version 2.5.0. Now it's working.
No problem! Good to know :)
Hello, when I run ksrates on both the test data and my own data, I get an error like that at the paralogs-ks step. Could you give me some advice? Thank you very much.
ksrates paralogs-ks config_elaeis.txt --n-threads=20