JLSteenwyk / ClipKIT

a multiple sequence alignment-trimming algorithm for accurate phylogenomic inference
https://jlsteenwyk.com/ClipKIT/
MIT License

remove gaps and codon #45

Closed nbat64 closed 7 months ago

nbat64 commented 7 months ago

Hello,

I am interested in using your software to clean some alignments. I want to trim columns with more than 90% gaps/missing data while keeping codons intact.

I tried the following command

clipkit infile -m gappy --gaps 0.9 --codon --output outfile

I get this error:

IndexError: arrays used as indices must be of integer (or boolean) type

When I use only --gaps 0.9, the gap threshold is 0.4138. I am only able to get a threshold of 0.9 by using only the argument -m gappy. (-m gappy plus --codon doesn't work.)

Is this a known bug?

I am using the latest version, 2.2.4, installed via conda.

thanks for the help,

regards

Nicolas
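For readers following along, here is a minimal sketch of what "trim columns above a gap threshold while keeping codons" means. It is not ClipKIT's implementation: the function names, the gap character set, and the strict "greater than" comparison are all assumptions made for illustration.

```python
from typing import List

# Assumption: which characters count as gaps/missing data (illustrative only).
GAP_CHARS = {"-", "?"}


def gap_fractions(seqs: List[str]) -> List[float]:
    """Fraction of gap characters in each alignment column."""
    n_seqs = len(seqs)
    n_cols = len(seqs[0])
    return [sum(s[i] in GAP_CHARS for s in seqs) / n_seqs for i in range(n_cols)]


def columns_to_trim(seqs: List[str], threshold: float = 0.9, codon: bool = False) -> List[int]:
    """Columns whose gap fraction exceeds the threshold.

    With codon=True, every flagged column pulls in the other two columns
    of its codon, so the reading frame is preserved after trimming.
    """
    fracs = gap_fractions(seqs)
    flagged = {i for i, f in enumerate(fracs) if f > threshold}
    if codon:
        expanded = set()
        for i in flagged:
            start = i - i % 3  # first column of the codon containing i
            expanded.update(range(start, min(start + 3, len(fracs))))
        flagged = expanded
    return sorted(flagged)


def trim_alignment(seqs: List[str], threshold: float = 0.9, codon: bool = False) -> List[str]:
    """Return the sequences with the flagged columns removed."""
    drop = set(columns_to_trim(seqs, threshold, codon))
    return ["".join(c for i, c in enumerate(s) if i not in drop) for s in seqs]


# Toy alignment; a threshold of 0.5 is used so four sequences are enough.
seqs = ["ATG-AACCC", "ATG--ACCC", "ATG-AACCC", "ATGAAACCC"]
print(columns_to_trim(seqs, 0.5))              # column-wise trimming
print(columns_to_trim(seqs, 0.5, codon=True))  # codon-aware trimming
print(trim_alignment(seqs, 0.5, codon=True))
```

In the toy alignment only one column is gappy enough to be flagged, but in codon mode the whole three-column codon around it is removed.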

JLSteenwyk commented 7 months ago

Hi Nicolas,

Thank you for using ClipKIT!

Would you please provide your input file so that I can further diagnose the issue?

All the best,

Jacob

nbat64 commented 7 months ago

Hi Jacob, thanks. Here is an example input file: ENSG00000000003_TSPAN6_GuidancePRANK_algn_raw.fasta.txt


Traceback (most recent call last):
  File "/soft/2019013/Conda_env/edit_env/bin/clipkit", line 10, in <module>
    sys.exit(main())
  File "/soft/2019013/Conda_env/edit_env/lib/python3.10/site-packages/clipkit/clipkit.py", line 200, in main
    execute(**process_args(args))
  File "/soft/2019013/Conda_env/edit_env/lib/python3.10/site-packages/clipkit/clipkit.py", line 147, in execute
    trim_run, stats = run(
  File "/soft/2019013/Conda_env/edit_env/lib/python3.10/site-packages/clipkit/clipkit.py", line 103, in run
    msa.trim(mode, gap_threshold=gaps, site_positions_to_trim=None, codon=codon)
  File "/soft/2019013/Conda_env/edit_env/lib/python3.10/site-packages/clipkit/msa.py", line 129, in trim
    self._site_positions_to_keep = np.delete(
  File "<__array_function__ internals>", line 200, in delete
  File "/soft/2019013/Conda_env/edit_env/lib/python3.10/site-packages/numpy/lib/function_base.py", line 5235, in delete
    keep[obj,] = False
IndexError: arrays used as indices must be of integer (or boolean) type

I tried this command:

clipkit ENSG00000000003_TSPAN6_GuidancePRANK_algn_raw.fasta -m gappy --gaps 0.9 --codon -o test.out

Thanks

regards
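As a side note on the message itself: the thread does not say what went wrong inside ClipKIT, but the traceback ends in `np.delete`, and NumPy raises this IndexError whenever the index array passed to it has a non-integer dtype. The sketch below only reproduces that NumPy behaviour; `delete_with_int_indices` is a hypothetical helper, not part of ClipKIT or NumPy.

```python
import numpy as np

positions = np.arange(9)

# An index array that ended up with a float dtype. Note that np.array([])
# also defaults to float64, so an empty array built this way is float-typed too.
to_trim = np.array([3.0, 4.0, 5.0])

try:
    np.delete(positions, to_trim)
except IndexError as err:
    print(f"IndexError: {err}")


def delete_with_int_indices(arr, idx):
    """Hypothetical helper: cast indices to an integer dtype before deleting.

    Refuses values with a fractional part instead of silently truncating them.
    """
    idx = np.asarray(idx)
    if not np.all(np.mod(idx, 1) == 0):
        raise ValueError("indices must be whole numbers")
    return np.delete(arr, idx.astype(np.intp))


print(delete_with_int_indices(positions, to_trim))
```

Casting to an integer dtype before calling `np.delete` avoids the error in this toy case; whether that matches the fix that went into ClipKIT is not stated in the thread.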

JLSteenwyk commented 7 months ago

Hi Nicolas,

I am not getting an error when using the latest version of ClipKIT, version 2.2.4.

What version are you using? Do you mind trying the latest version and letting me know if the error persists?

best,

Jacob
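When comparing environments like this, it can help to print the installed versions from inside the environment that actually runs the tool. A small sketch using only the standard library; the package list is an assumption based on the packages mentioned in this thread.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("python", platform.python_version())
for name, ver in installed_versions(["clipkit", "numpy", "biopython"]).items():
    print(name, ver if ver is not None else "not installed")
```

Running this with the same interpreter that launches `clipkit` shows which versions are really in use, which is easy to get wrong when several conda environments exist.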

nbat64 commented 7 months ago

Hmm, weird. It is version 2.2.4. I will install it in its own environment to test. Thanks, Nicolas

nbat64 commented 7 months ago

With a new conda env containing only clipkit 2.2.4 (mamba install bioconda::clipkit=2.2.4), I get the same error:

Traceback (most recent call last):
  File "/soft/2019013/Conda_env/clipkit_env/bin/clipkit", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.12/site-packages/clipkit/clipkit.py", line 200, in main
    execute(**process_args(args))
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.12/site-packages/clipkit/clipkit.py", line 147, in execute
    trim_run, stats = run(
                      ^^^^
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.12/site-packages/clipkit/clipkit.py", line 103, in run
    msa.trim(mode, gap_threshold=gaps, site_positions_to_trim=None, codon=codon)
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.12/site-packages/clipkit/msa.py", line 129, in trim
    self._site_positions_to_keep = np.delete(
                                   ^^^^^^^^^^
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.12/site-packages/numpy/lib/function_base.py", line 5354, in delete
    keep[obj,] = False
    ~~~~^^^^^^
IndexError: arrays used as indices must be of integer (or boolean) type

Is it supposed to work with python 3.12?

JLSteenwyk commented 7 months ago

Hi Nicolas,

Supporting Python versions 3.11 & 3.12 is at the top of our to-do list. However, I haven't tested ClipKIT on 3.12, so I can't say with 100% certainty.

Are you working on a cluster? It is probably possible to load python version 3.9 or 3.10, which are supported. Would that solution work for you?

best,

Jacob

nbat64 commented 7 months ago

Yes, I am on a cluster. I will try another Python version and let you know. (The Bioconda recipe installs python 3.12.2, python_abi 3.12 and biopython 1.83.) Best, Nicolas

nbat64 commented 7 months ago

Hi Jacob, I get the same error with clipkit 2.2.4 and Python 3.9.19 (conda env).


Traceback (most recent call last):
  File "/soft/2019013/Conda_env/clipkit_env/bin/clipkit", line 10, in <module>
    sys.exit(main())
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.9/site-packages/clipkit/clipkit.py", line 200, in main
    execute(**process_args(args))
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.9/site-packages/clipkit/clipkit.py", line 147, in execute
    trim_run, stats = run(
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.9/site-packages/clipkit/clipkit.py", line 103, in run
    msa.trim(mode, gap_threshold=gaps, site_positions_to_trim=None, codon=codon)
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.9/site-packages/clipkit/msa.py", line 129, in trim
    self._site_positions_to_keep = np.delete(
  File "/soft/2019013/Conda_env/clipkit_env/lib/python3.9/site-packages/numpy/lib/function_base.py", line 5354, in delete
    keep[obj,] = False
IndexError: arrays used as indices must be of integer (or boolean) type

JLSteenwyk commented 7 months ago

Hi Nicolas,

I have diagnosed the issue and am no longer getting the error as of version 2.2.5, which is available via PyPI.

Thank you again for using ClipKIT! Some of our other software may also interest you; see https://jlsteenwyk.com/software.html.

Happy coding!

All the best,

Jacob