gamcil / clinker

Gene cluster comparison figure generator
MIT License

Error in align #68

Closed (closed by kabilov 3 years ago)

kabilov commented 3 years ago

Hi Cameron!

I installed clinker v0.0.20 using conda. When I ran it on two genomes downloaded from GenBank, clinker gave the error below. Is there any solution to this problem?

Best wishes, Marsel


[09:02:59] INFO - Starting clinker
[09:02:59] INFO - Parsing files:
[09:02:59] INFO - PB12_4term_CP048407.gbk
/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Seq.py:2334: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
[09:03:03] INFO - T.marianensis_NC_014831.gbk
[09:03:06] INFO - Starting cluster alignments
[09:03:07] INFO - PB12_4term_CP048407 vs T.marianensis_NC_014831
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 377, in _align_clusters
    aln = aligner.align(geneA.translation, geneB.translation)
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/Bio/Align/__init__.py", line 1592, in align
    score, paths = _aligners.PairwiseAligner.align(self, seqA, seqB)
ValueError: sequence contains letters not in the alphabet
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kabilov/anaconda3/envs/clinker/bin/clinker", line 10, in <module>
    sys.exit(main())
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 283, in main
    clinker(
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/main.py", line 135, in clinker
    globaligner = align.align_clusters(*clusters, cutoff=identity, jobs=jobs)
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 57, in align_clusters
    aligner.align_stored_clusters(cutoff, jobs=jobs)
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/site-packages/clinker/align.py", line 404, in align_stored_clusters
    alignments = pool.starmap(_align_clusters, pairs_to_align)
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/kabilov/anaconda3/envs/clinker/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: sequence contains letters not in the alphabet

gamcil commented 3 years ago

Hi Marsel,

This happens when a protein translation contains invalid letters, which makes the aligner raise an error. I ran into this a while ago and fixed it back then by extending the alphabet used by the aligner (https://github.com/gamcil/clinker/commit/345935a1ea645b9725966c51b1e4b333859a5980). Some of the protein sequences in your files must contain something other than the extended IUPAC codes (ACDEFGHIKLMNPQRSTVWYBXZJUO), so you'll have to check. The quick fix would be to delete or change anything in the sequence that is not a valid code from that list.
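For anyone wanting to run that check, here is a minimal sketch of scanning translations for letters outside the list above. It is not part of clinker: the function names and the `{gene_name: protein_sequence}` input are hypothetical, and the check is case-sensitive by assumption. Getting the translations out of the GenBank files (for example from the `/translation` qualifiers) is left to whatever parser you already use.

```python
# Extended IUPAC protein codes quoted in the reply above.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO")


def find_invalid_letters(translation):
    """Return {letter: [0-based positions]} for characters outside VALID_CODES.

    Assumption: the comparison is case-sensitive, so lower-case letters
    are reported as invalid too.
    """
    invalid = {}
    for position, letter in enumerate(translation):
        if letter not in VALID_CODES:
            invalid.setdefault(letter, []).append(position)
    return invalid


def report_invalid(translations):
    """Map each gene name to its invalid letters, skipping clean sequences.

    `translations` is a hypothetical {gene_name: protein_sequence} dict.
    """
    report = {}
    for name, sequence in translations.items():
        invalid = find_invalid_letters(sequence)
        if invalid:
            report[name] = invalid
    return report


if __name__ == "__main__":
    example = {
        "gene_ok": "MKTAYIAKQR",
        "gene_with_U": "MKUAY",     # U is in the extended list, so it passes
        "gene_with_stop": "MKTA*",  # '*' is not in the list
    }
    print(report_invalid(example))  # {'gene_with_stop': {'*': [4]}}
```

Any gene that shows up in the report has at least one character that is not in the list above.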

kabilov commented 3 years ago

Hi Cameron,

Do you think the gb files from GenBank contain invalid letters? Could there be another reason?

gamcil commented 3 years ago

Oops, looks like the code for extending the matrix had a bug. I tried aligning your sequences and it was crashing on one that contained a U, which is in the extended codes. I just pushed a fix that should resolve this. Could you try updating to clinker 0.0.21?
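To illustrate the kind of failure described here, below is a toy sketch of extending a scoring table to a larger alphabet and checking that every letter is covered. It is not clinker's code and does not use Biopython. The dict-based matrix, the scores, and all function names are assumptions made for the example.

```python
BASE_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"            # 20 standard amino acids
EXTENDED_ALPHABET = "ACDEFGHIKLMNPQRSTVWYBXZJUO"  # list quoted earlier in the thread


def toy_matrix(alphabet, match=1, mismatch=-1):
    """Build a toy {(a, b): score} table; real substitution matrices differ."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def extend_matrix(matrix, alphabet, default=0):
    """Return a copy of `matrix` with a score for every letter pair in `alphabet`.

    Existing scores are kept; pairs not already present get `default`
    (an arbitrary choice for this sketch).
    """
    extended = dict(matrix)
    for a in alphabet:
        for b in alphabet:
            extended.setdefault((a, b), default)
    return extended


def missing_letters(matrix, alphabet):
    """List letters of `alphabet` that have no row in `matrix`."""
    covered = {a for a, _ in matrix}
    return sorted(set(alphabet) - covered)


if __name__ == "__main__":
    base = toy_matrix(BASE_ALPHABET)
    print(missing_letters(base, EXTENDED_ALPHABET))      # ['B', 'J', 'O', 'U', 'X', 'Z']
    extended = extend_matrix(base, EXTENDED_ALPHABET)
    print(missing_letters(extended, EXTENDED_ALPHABET))  # []
```

A check like `missing_letters` is one way to confirm that a letter such as U really ended up in the extended table before any alignment runs.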

kabilov commented 3 years ago

Thanks a lot, now everything works! When will the conda update happen?

gamcil commented 3 years ago

Glad that fixed it :)

Not sure, that should happen automatically - looks like it's ticked up to 0.0.21 now (https://bioconda.github.io/recipes/clinker-py/README.html).