Rinoahu / SwiftOrtho

A high performance tool to identify orthologs and paralogs across genomes.
GNU General Public License v3.0

No core or shared genes produced #8

000generic closed this issue 3 years ago

000generic commented 4 years ago

I'm using SwiftOrtho to cluster gene models from 4 genomes (human, octopus, oyster, and anemone). With the command line below I get zero core and zero shared genes, so I'm wondering what I might be getting wrong, or whether a dependency is failing.

Here is the command line:

python run_all.py -i four-genomes.aa -a 60 -A apc -s 1011111,11111 -u 0.95 -l 0.05 > log-swiftortho-stdout-stderr 2>&1

AND an example sequence header:

>human-MSI2|70065

AND the files produced are:

-rw-r--r-- 1 eedsinger mnlsc    2634981 Apr 24 19:55 four-genomes.aa.clsr
-rw-r--r-- 1 eedsinger mnlsc          0 Apr 24 21:02 four-genomes.aa.nwk
-rw-r--r-- 1 eedsinger mnlsc   46725165 Apr 24 19:55 four-genomes.aa.opc
-rw-r--r-- 1 eedsinger mnlsc 4090899422 Apr 24 20:41 four-genomes.aa.pan
-rw-r--r-- 1 eedsinger mnlsc 4826112637 Apr 24 19:51 four-genomes.aa.sc
-rw-r--r-- 1 eedsinger mnlsc         15 Apr 24 10:57 log

AND the following .pan summary:

statistic of core, shared and specific genes:
Feature  core  shared  specific  taxon
Number   0     0       100337    20372

ω(core size of pan-genome) and 95% confidence interval:
κc  0.0005322479499826706±0.016426296185872956
τc  0.24684946956898285±0.9401083888361351
ω   9.880204357880838e-08±3.034461244500993e-10

θ(new gene number for each new sequenced genome) and 95% confidence interval:
κs     0.0013791569662670866±0.0
τs     0.008753804867384085±0.0
tg(θ)  5.488663081012105±0.7739328979843694

κ(size and openess of pan-genome, open if γ > 0) and 95% confidence interval:
κ  1.6535389090093497±0.0338922355775147
γ  1.11496782984238±0.0021317893140741055

AND I got the following stdout/stderr log:

nohup: ignoring input
nohup: ignoring input
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:210: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  mat = np.asarray(fp, dtype='bool')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:211: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  mat = np.asarray(mat, dtype='int8')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:290: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  ys = np.asarray(x[:, [elem[0] for elem in idxs]] > 0, 'int32')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:307: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  yn = np.asarray(x[:, [elem[i] for elem in idxs]] > 0, 'int32')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:311: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  sp = np.asarray(evaluate('(ys <= Ts) & (yn > 0)'), dtype='int8')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:316: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  cr = np.asarray(evaluate('ys >= Tc'), dtype='int8')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:322: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  pa = np.asarray(evaluate('ys > 0'), dtype='int8')
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:449: DeprecationWarning: scipy.asarray is deprecated and will be removed in SciPy 2.0.0, use numpy.asarray instead
  x, y = list(map(np.asarray, [X, Y]))
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:408: DeprecationWarning: scipy.exp is deprecated and will be removed in SciPy 2.0.0, use numpy.exp instead
  return K_c np.exp(-n / Tau_c) + Omega
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:468: DeprecationWarning: scipy.diag is deprecated and will be removed in SciPy 2.0.0, use numpy.diag instead
  conf = [tval elem * .5 for elem in np.diag(pcov)]
/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/pan_genome.py:415: DeprecationWarning: scipy.exp is deprecated and will be removed in SciPy 2.0.0, use numpy.exp instead
  return K_s np.exp(-n / Tau_s) + TgTheta
Traceback (most recent call last):
  File "/home/eedsinger/software/swiftortho/SwiftOrtho/scripts/rbh2phy.py", line 254, in
    L = len(''.join(list(tree.values())[0]))
IndexError: list index out of range

ERROR: Alignment not loaded: "four-genomes.aa_results/four-genomes.aa.aln" Check the file's content.

Cannot read four-genomes.aa_results/four-genomes.aa.aln.trim
all to all homologous searching time: 32001.47734451294
orthomcl algorithm time: 241.9926917552948
use apc to group protein family time: 32.674267292022705
pan-genome analysis time: 2764.5215167999268
species tree construction time: 1210.0141377449036

Please let me know if any other details would be helpful.

Thank you!

000generic commented 4 years ago

I got things working. The header structure for one of the species was wrong: the species field contained gene symbol info instead of the intended internal genus-species code (evident in the header above), so the run behaved as if there were over 20,000 species.

Sorry for the trouble - thank you :)
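
For anyone hitting the same symptom, a quick check before running the pipeline is to count the distinct species codes in the FASTA headers. The sketch below is not part of SwiftOrtho. It assumes the species code is the text before the first `|` in each header, which matches the header shown above. The function name `count_taxa` and all example headers other than `>human-MSI2|70065` are hypothetical.

```python
from collections import Counter
import io


def count_taxa(handle, sep="|"):
    """Count sequences per taxon.

    Assumption: the taxon is the header text before the first `sep`.
    """
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            taxon = header.split(sep, 1)[0]
            counts[taxon] += 1
    return counts


# Hypothetical input: two headers carry a gene symbol in the species field,
# so they are counted as two separate taxa instead of one.
example = io.StringIO(
    ">human-MSI2|70065\nMSEQ\n"
    ">human-TP53|70066\nMSEQ\n"
    ">Obim|00001\nMSEQ\n"
)
taxa = count_taxa(example)
print(len(taxa), "distinct taxa")
for name, n in taxa.most_common():
    print(name, n)
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` example. If the number of distinct taxa is far larger than the number of genomes in the input (here 4 were expected and the .pan summary reported 20372), the species field in the headers is probably malformed.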

Rinoahu commented 4 years ago

You are welcome. BTW, could you please close the issue if you solved the problem?