TGAC / KAT

The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.
http://www.earlham.ac.uk/kat-tools
GNU General Public License v3.0

kat_distanalysis.py #128

AmaliT closed 4 years ago

AmaliT commented 5 years ago

Hi

We installed KAT (version 2.4.1) through conda environments and the main package worked quite well. However, we noticed that "kat_distanalysis.py" is missing from the installation. Could you please advise as to why this might be?

Cheers Amali

AntoineHo commented 4 years ago

Hello, I ran into the same problem. I found the script in this folder:

/home/user/anaconda3/envs/env_name/lib/python3.6/local/kat/distanalysis.py

However when I tried to use it I got this error:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/kat/lib/python3.6/local/kat/distanalysis.py", line 13, in <module>
    from .spectra import KmerSpectra, GCSpectra
ModuleNotFoundError: No module named '__main__.spectra'; '__main__' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/kat/lib/python3.6/local/kat/distanalysis.py", line 15, in <module>
    from kat.spectra import KmerSpectra, GCSpectra
ModuleNotFoundError: No module named 'kat'

dcopetti commented 4 years ago

Same here. I found the Python script and tried running it directly:

(py36) bash-4.2$ /home/copettid/anaconda3/pkgs/kat-2.4.1-py36h355e19c_3/lib/python3.6/local/kat/distanalysis.py --plot Rab1_flye191004_kat23-main.mx
Traceback (most recent call last):
  File "/home/copettid/anaconda3/pkgs/kat-2.4.1-py36h355e19c_3/lib/python3.6/local/kat/distanalysis.py", line 13, in <module>
    from .spectra import KmerSpectra, GCSpectra
ModuleNotFoundError: No module named '__main__.spectra'; '__main__' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/copettid/anaconda3/pkgs/kat-2.4.1-py36h355e19c_3/lib/python3.6/local/kat/distanalysis.py", line 15, in <module>
    from kat.spectra import KmerSpectra, GCSpectra
ModuleNotFoundError: No module named 'kat'

It also gives an error when I add python before the path.

How can we run this analysis?

jonwright99 commented 4 years ago

The script is called distanalysis.py and can be found in KAT/scripts/kat. The script uses the modules peak.py and spectra.py that are found in the same directory. I think the script is in the wrong directory and should reside in KAT/scripts so the modules are correctly referenced as kat/spectra.py etc. My version worked when I moved distanalysis.py up a directory so let me know if it works for you. If it does, I'll amend the documentation and file structure in the repo. Best, Jon
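For context on the tracebacks above: the script first attempts a relative import (from .spectra import ...) and then falls back to an absolute one (from kat.spectra import ...). Run directly as a file, it has no parent package, so the relative import fails, and the fallback can only succeed if a directory containing a kat/ folder is on Python's module search path. The sketch below reproduces this with a throwaway layout in a temporary directory. The stub script, the stub class and the helper names run_script and demo are illustrative assumptions, not KAT code.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Stand-in for distanalysis.py: the same two-step import seen in the tracebacks.
SCRIPT = textwrap.dedent("""\
    try:
        from .spectra import KmerSpectra
    except ImportError:
        from kat.spectra import KmerSpectra
    print(KmerSpectra.__name__)
""")

# Stand-in for spectra.py (hypothetical stub, not the real KAT class).
STUB = "class KmerSpectra:\n    pass\n"


def run_script(script_path):
    """Run script_path with the current interpreter.

    Returns (returncode, stdout, stderr). PYTHONPATH is dropped so that only
    the script's own directory decides whether 'kat' can be found.
    """
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    proc = subprocess.run(
        [sys.executable, script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        env=env,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr


def demo():
    """Run the stub script from inside kat/ and from beside kat/."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "kat")
        os.mkdir(pkg)
        with open(os.path.join(pkg, "spectra.py"), "w") as fh:
            fh.write(STUB)
        # Case 1: script inside kat/ (the layout reported in this thread).
        inside = os.path.join(pkg, "distanalysis.py")
        # Case 2: script one level up, next to kat/ (the suggested fix).
        beside = os.path.join(root, "distanalysis.py")
        for path in (inside, beside):
            with open(path, "w") as fh:
                fh.write(SCRIPT)
        return run_script(inside), run_script(beside)


if __name__ == "__main__":
    failed, worked = demo()
    print("inside kat/:", "import error" if failed[0] else failed[1])
    print("beside kat/:", "import error" if worked[0] else worked[1])
```

When a file is run directly, Python puts that file's directory at the front of the module search path. From inside kat/ there is no kat folder to find; from one level up there is, which is why moving the script up a directory can make the fallback import succeed.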

dcopetti commented 4 years ago

Hi Jon, thanks for helping with this.

I installed kat from bioconda:

$ kat --version
kat 2.4.1

and I don't have a /KAT (all caps) folder:

$ locate KAT | more

Also, there is no /KAT/scripts/kat folder, and the three scripts are in two /kat/ folders:

$ locate distanalysis.py
/home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/distanalysis.py
/home/copettid/anaconda3/pkgs/kat-2.4.1-py36h355e19c_3/lib/python3.6/local/kat/distanalysis.py
$ locate peak.py
/home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/peak.py
/home/copettid/anaconda3/envs/py36/lib/python3.6/site-packages/skimage/feature/peak.py
/home/copettid/anaconda3/envs/py36/lib/python3.6/site-packages/skimage/feature/tests/test_peak.py
/home/copettid/anaconda3/lib/python3.7/site-packages/skimage/feature/peak.py
/home/copettid/anaconda3/lib/python3.7/site-packages/skimage/feature/tests/test_peak.py
/home/copettid/anaconda3/pkgs/kat-2.4.1-py36h355e19c_3/lib/python3.6/local/kat/peak.py
/home/copettid/anaconda3/pkgs/scikit-image-0.15.0-py36he6710b0_0/lib/python3.6/site-packages/skimage/feature/peak.py
/home/copettid/anaconda3/pkgs/scikit-image-0.15.0-py36he6710b0_0/lib/python3.6/site-packages/skimage/feature/tests/test_peak.py
/home/copettid/anaconda3/pkgs/scikit-image-0.15.0-py37he6710b0_0/lib/python3.7/site-packages/skimage/feature/peak.py
/home/copettid/anaconda3/pkgs/scikit-image-0.15.0-py37he6710b0_0/lib/python3.7/site-packages/skimage/feature/tests/test_peak.py
$ locate spectra.py
/home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/spectra.py
/home/copettid/anaconda3/pkgs/kat-2.4.1-py36h355e19c_3/lib/python3.6/local/kat/spectra.py

I tried creating a /scripts folder, copying the files into it, and running:

$ mkdir  /home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/scripts
$ python /home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/scripts/distanalysis.py
Traceback (most recent call last):
  File "/home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/scripts/distanalysis.py", line 13, in <module>
    from .spectra import KmerSpectra, GCSpectra
ModuleNotFoundError: No module named '__main__.spectra'; '__main__' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/copettid/anaconda3/envs/py36/lib/python3.6/local/kat/scripts/distanalysis.py", line 15, in <module>
    from kat.spectra import KmerSpectra, GCSpectra
ModuleNotFoundError: No module named 'kat'

but it does not work. Also, I don't have any /scripts folder inside the kat installation. How should I deal with this different folder structure? Thanks!

jonwright99 commented 4 years ago

Can you try creating a directory called kat in the scripts folder that you just created, copying peak.py and spectra.py into it, and running ./distanalysis.py <MX_FILE> from the scripts folder?
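The layout described above can be staged with a few lines of Python. This is a minimal sketch under stated assumptions: the function name stage_layout and its two parameters are hypothetical, and the empty stub files in the demo stand in for the real scripts.

```python
import os
import shutil
import tempfile

MODULES = ("peak.py", "spectra.py")  # modules the script uses
SCRIPT = "distanalysis.py"


def stage_layout(src_dir, scripts_dir):
    """Copy the three files from src_dir so that the script sits in
    scripts_dir, beside a kat/ folder holding its modules.

    Returns the path of the copied script.
    """
    kat_dir = os.path.join(scripts_dir, "kat")
    os.makedirs(kat_dir, exist_ok=True)  # also creates scripts_dir if needed
    for name in MODULES:
        shutil.copy2(os.path.join(src_dir, name), kat_dir)
    return shutil.copy2(os.path.join(src_dir, SCRIPT), scripts_dir)


if __name__ == "__main__":
    # Demo with empty stub files in a temporary directory.
    with tempfile.TemporaryDirectory() as src:
        for name in MODULES + (SCRIPT,):
            open(os.path.join(src, name), "w").close()
        script = stage_layout(src, os.path.join(src, "scripts"))
        print("script staged at:", script)
```

With the conda paths from this thread, src_dir would be the lib/python3.6/local/kat folder of the environment and scripts_dir the scripts folder created inside it. The script is then run as ./distanalysis.py <MX_FILE> from scripts_dir.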

dcopetti commented 4 years ago

It worked, thanks Jon!

With the kat comp output (image: KAT) and the default stdout analysis, I tried running it as:

./distanalysis.py --plot -o Rab1_flye191004_kat23b_distanalysis --format png -z 76 -c 2 -p Rab1_flye191004_kat23b-main.mx

Now I can't figure out how to interpret the numbers in the distanalysis file distanalysis_out.txt: the total size of the genome has decreased by half (estimated in silico at 2.56 Gb per haploid; I am expecting a 5 Gb assembly and the one I used is 3.7 Gb), and there is only a 0x or 1x value per peak. Is that normal? Also, though I used --plot, I don't get any image, even though the stdout says

Creating plots
--------------

Plotting K-mer frequency distributions for general spectra ... done.
Plotting K-mer frequency distributions for 0x ... done.
Plotting K-mer frequency distributions for 1x ... done.

gnuplot runs. Thanks for the help, Dario

jonwright99 commented 4 years ago

To be honest, Dario, this is a script that was written a long time ago for use with simple kmer spectra. It basically calculates the area under the peaks, as you have done previously, but it only works if the peaks are very clean. We never use it anymore and it obviously needs more work to make it fit for production, so it's probably best to avoid using it if possible.
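As a rough illustration of what "area under the peaks" can mean for a clean spectrum, the sketch below splits a k-mer frequency histogram at its local minima and sums the distinct k-mer counts in each segment. This is a simplified stand-in, not the method distanalysis.py implements. The function names and the toy histogram are assumptions made for the example.

```python
def valley_indices(hist):
    """Indices of local minima in hist (list of counts indexed by k-mer frequency)."""
    return [
        i
        for i in range(1, len(hist) - 1)
        if hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]
    ]


def peak_areas(hist):
    """Split hist at its valleys; return one (peak_frequency, area) per segment.

    area is the sum of counts in the segment, i.e. the number of distinct
    k-mers attributed to that peak.
    """
    bounds = [0] + valley_indices(hist) + [len(hist)]
    peaks = []
    for start, end in zip(bounds, bounds[1:]):
        segment = hist[start:end]
        if not segment:
            continue
        # Position of the highest bin inside this segment.
        top = start + max(range(len(segment)), key=segment.__getitem__)
        peaks.append((top, sum(segment)))
    return peaks


if __name__ == "__main__":
    # Toy histogram: hist[f] = number of distinct k-mers seen f times.
    toy = [0, 90, 20, 5, 10, 40, 80, 40, 10, 4, 8, 30, 8, 2]
    for freq, area in peak_areas(toy):
        print("peak at frequency", freq, "area", area)
```

On a noisy spectrum every small dip becomes a segment boundary, so a naive split like this one gives sensible areas only when the peaks are well separated.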

dcopetti commented 4 years ago

OK then, I'll leave it out. Thanks Jon!