Closed LeoBusse closed 3 years ago
If your image is anything to go by, it looks like your sequences have gaps in them (B starts with -), so I'll have to add gap characters to the extended set. In the meantime, you could search and replace the gap characters with X, or just delete them (it might only be a few rogue sequences, which is usually the case). I'll try to get a fix out soon.
Looks like I also forgot to remove some logging calls there, so thanks for reminding me haha.
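For anyone hitting the same error, the search and replace suggested above could be sketched roughly like this. It is only an illustration, not part of clinker: the helper name `clean_translation` and its parameters are made up for this example, and it assumes you already have the protein sequence as a plain string and that `-` is the only gap character involved.

```python
def clean_translation(seq: str, gap_chars: str = "-", replacement: str = "X") -> str:
    """Replace gap characters in a protein sequence.

    Hypothetical helper (not a clinker function). Pass replacement=""
    to delete the gap characters instead of substituting X.
    """
    # str.translate maps each code point to a replacement string,
    # or to None to drop the character entirely.
    table = {ord(ch): (replacement or None) for ch in gap_chars}
    return seq.translate(table)


# Substitute gaps with X, or delete them outright.
print(clean_translation("-MKT-LL"))                  # XMKTXLL
print(clean_translation("-MKT-LL", replacement=""))  # MKTLL
```

Substituting X keeps the sequence length unchanged, while deleting shortens it. Either should get past the alphabet check if gaps are the only problem.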
Thank you so much!
I looked for the gaps as suggested and it works perfectly now! I really appreciate the advice.
Hello, thank you for developing a great tool!
I've been trying to figure out where the "ValueError: sequence contains letters not in the alphabet" error is coming from when I run my .gbf/.gb files through Clinker. I went through issue #68 and installed Clinker 0.0.21 through Conda again, but to no avail. I have also tried the pip install, but that didn't fix the problem. I double-checked the align.py script on my local computer and it has the extend_matrix_alphabet addition, so I'm not sure what to do. You mentioned a quick fix would be to go through the sequence and delete anything not part of the extended IUPAC. Is there a particular way you recommend doing this? I have several sequences, so it seems like it would take a long time to identify anything wrong in the sequence (I would be looking for numbers, right?).
I attached an image with the traceback in case it's helpful.
Thank you so much!
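On the question of how to find the offending characters without reading every sequence by eye, one possible approach is a small script that scans the `/translation` qualifiers and reports anything unexpected. This is a rough sketch under a few assumptions: the `ALLOWED` set below is a guess at an extended protein alphabet and is not taken from clinker's code, the function name `find_unexpected` is made up, and the regular expression assumes translations are written in the usual `/translation="..."` form. In this thread the culprits turned out to be gap characters (`-`) rather than numbers.

```python
import re
from collections import Counter

# Assumed alphabet: 20 standard amino acids plus common ambiguity/extended
# codes and the stop symbol. Adjust to match whatever your tool accepts.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" "BZXJUO" "*")

# Matches the quoted body of a /translation qualifier, including line breaks.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')


def find_unexpected(genbank_text: str) -> list:
    """Return (translation index, Counter of unexpected characters) pairs."""
    flagged = []
    for index, match in enumerate(TRANSLATION_RE.finditer(genbank_text)):
        # Qualifier values wrap across lines, so strip all whitespace first.
        seq = "".join(match.group(1).split()).upper()
        bad = Counter(ch for ch in seq if ch not in ALLOWED)
        if bad:
            flagged.append((index, bad))
    return flagged


sample = '''     CDS             1..24
                     /translation="MKTLL
                     AVV"
     CDS             40..54
                     /translation="-MKT1"
'''
for index, bad in find_unexpected(sample):
    print(f"translation #{index}: {dict(bad)}")
```

To check a real file, read it with `open(path).read()` and pass the text to `find_unexpected`. The index counts translations in file order starting from 0, which should be enough to track down the few rogue sequences.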