Closed by EmaBoni 2 years ago
Hi @EmaBoni, I haven't tried using clinker inside a Jupyter notebook before so I'm not exactly sure about that - from what you've listed there it looks like the command should be correct. Would it be possible to post the full error log from when you try to run the program inside the notebook or in the command line?
Hello @gamcil, thank you for your answer! Here is the full attempt from the command line (NB: it is the Windows command line, not Linux; might this be an issue?). (Clinker) is my Python environment, created with Anaconda Navigator.
`(Clinker) C:\Users\Emanuele Boni\clinker>python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:00) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import clinker
>>> clinker
<module 'clinker' from 'C:\Users\Emanuele Boni\clinker\clinker\__init__.py'>
>>> import os
>>> os.listdir('C:\Users\Emanuele Boni\clinker\examples')
['A. alliaceus CBS 536.65.gbk', 'A. burnettii MST-FP2249.gbk', 'A. mulundensis DSM 5745.gbk', 'A. versicolor CBS 583.65.gbk', 'note.md', 'P. vexata CBS 129021.gbk']
>>> clinker 'C:\Users\Emanuele Boni\clinker\examples\*' -p
  File "
>>> clinker 'C:\Users\Emanuele Boni\clinker\examples'+'/' -p
  File "<stdin>", line 1
    clinker 'C:\Users\Emanuele Boni\clinker\examples'+'/ ' -p
            ^
SyntaxError: invalid syntax
>>> clinker examples/ -p
  File "<stdin>", line 1
    clinker examples/ -p
            ^
SyntaxError: invalid syntax`
NB: the '^' arrow points at the first character after clinker
Ah, you are trying to run clinker from within the Python interactive shell, which then rejects the command as invalid syntax (since it isn't Python code). clinker should be run directly from the command line: try exiting the Python shell and running the exact same command, e.g. `clinker 'C:\Users\Emanuele Boni\clinker\examples\*' -p`, and it should work.
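For anyone who still wants to launch clinker from a notebook cell, a minimal sketch is to start it as an external program from Python rather than typing the command as if it were Python code. The helper names `build_clinker_command` and `run_clinker` are made up for this illustration, the `*.gbk` pattern is an assumption based on the example folder listed above, and the sketch assumes the `clinker` executable is on the PATH of the active environment.

```python
import glob
import os
import shutil
import subprocess


def build_clinker_command(folder, pattern="*.gbk", plot=True):
    """Return the argument list for a clinker run on every matching file in folder."""
    # Expand the wildcard in Python, so nothing depends on how the shell treats '*'.
    files = sorted(glob.glob(os.path.join(folder, pattern)))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder!r}")
    command = ["clinker", *files]
    if plot:
        command.append("-p")  # the same -p flag used in the command above
    return command


def run_clinker(folder):
    """Run clinker as an external program; it is not called as a Python function."""
    # Assumption: the clinker executable is installed in the active environment.
    if shutil.which("clinker") is None:
        raise RuntimeError("clinker executable not found on PATH")
    # Passing a list of arguments (no shell=True) keeps paths with spaces intact.
    return subprocess.run(build_clinker_command(folder), check=True)
```

In a notebook this would be called as `run_clinker(r"C:\Users\Emanuele Boni\clinker\examples")`. Jupyter also hands any line starting with `!` to the system shell, which is another way to run a command-line program from a cell.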
This clarifies a lot, thank you! I managed to run the pipeline on the examples (resulting image is as expected) and on my files. This is what I get:
I am a bit uncertain about the result because I expected the sequences to have much higher identity (more groups matching, higher identity percentage for the group that is correctly recognized). The alignment is done on the protein sequences, is that correct? I will double check the gene sequences to make sure there are no errors there. Any idea of other things that I am not considering when aligning these two files?
Yes the alignments are done on the protein sequences - however if they are missing, clinker will try to translate the regions corresponding to gene/CDS coordinates in the input file. Not sure what is causing the issue in your case, would you be able to upload your files?
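A quick way to check whether a GenBank file already carries protein translations is to compare the number of CDS features with the number of `/translation` qualifiers. The sketch below is only a rough text scan that relies on the usual GenBank flat-file layout (feature keys indented by five spaces); it is not a real parser and not how clinker itself reads files, and the function name is hypothetical.

```python
import re

# In the GenBank flat-file layout, feature keys such as CDS start after five spaces.
CDS_LINE = re.compile(r"^ {5}CDS\s")


def count_cds_and_translations(genbank_text):
    """Return (number of CDS features, number of /translation qualifiers)."""
    cds = 0
    translations = 0
    for line in genbank_text.splitlines():
        if CDS_LINE.match(line):
            cds += 1
        elif line.strip().startswith("/translation="):
            translations += 1
    return cds, translations
```

If the two counts differ for a file, some CDS features have no stored protein sequence and would have to be translated from their coordinates.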
Ok! Yes, protein sequences are annotated in my files. Here are the two files that I am using: EBoni.zip
Thanks a lot for your time and for your help!
Just had some time to have a look at this. It seems the files are read in correctly (it is picking up the AA translations just fine), but the alignments are falling below the default identity threshold (30%) and so are getting filtered out. You can lower this threshold using the `-i/--identity` argument, e.g. `clinker EBoni/*.gb -i 0.2 -p`. That command gives me this:
Thanks a lot! We found that an identity threshold of 18-19% was ideal for us. You have been extremely helpful, thanks again for your time and for this very useful tool! Kind regards, Emanuele
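The effect of the identity cutoff discussed above can be illustrated with a small sketch. The gene names and identity values are invented purely for the illustration, the function name is hypothetical, and the `>=` comparison is an assumption, since the thread does not say whether clinker treats the threshold as inclusive.

```python
def filter_by_identity(alignments, threshold=0.3):
    """Keep (query, target, identity) tuples whose identity reaches the threshold."""
    # Assumption: an alignment exactly at the threshold is kept.
    return [hit for hit in alignments if hit[2] >= threshold]


# Made-up identities, chosen only to show the effect of the cutoff.
hits = [
    ("geneA", "geneA_2", 0.45),
    ("geneB", "geneB_2", 0.25),
    ("geneC", "geneC_2", 0.19),
]

default_hits = filter_by_identity(hits)       # cutoff 0.3, the default mentioned above
relaxed_hits = filter_by_identity(hits, 0.2)  # cutoff 0.2, as in `-i 0.2`
print(len(default_hits), len(relaxed_hits))   # prints: 1 2
```

With these numbers, only one pair survives the 30% default, two survive at 0.2, and the third pair would only appear with a cutoff in the 18-19% range.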
Sorry for the dumb question, I am trying to use clinker but I cannot analyze files within a folder.
I have cloned clinker from git and installed it via pip, as indicated in the readme, so now I have a clinker folder that includes the examples. I have created a dedicated environment and I am working in Jupyter notebook. This is the code that I am using:
import sys
sys.path.append(r'C:\Users\Emanuele Boni\clinker')
import clinker
import os
proj_dir = r'C:\Users\Emanuele Boni\clinker\examples'
os.listdir(proj_dir)
This returns the content of the folder as
['A. alliaceus CBS 536.65.gbk', 'A. burnettii MST-FP2249.gbk', 'A. mulundensis DSM 5745.gbk', 'A. versicolor CBS 583.65.gbk', 'note.md', 'P. vexata CBS 129021.gbk']
However, if I try to run
clinker proj_dir/* -p
I get the error message `SyntaxError: invalid syntax`, pointing at `proj_dir`.
I have tried several things: creating a subfolder in the folder where I am running the notebook, writing the folder name as a string and as a variable (with and without quotes), and running the lines of code directly from the command line instead of inside the notebook. None of these worked. I think I am missing something very trivial, but I cannot figure out what it is.
Thank you for your help and for developing this tool! Emanuele