reslp closed this issue 4 years ago
Hello Philipp,
First off, thank you very much for pointing this out. This is a bug we had missed until now, so I am glad you reported it.
Please download the latest update to cazy_extract.pl and replace your version with this one.
Basically, when CAZy lists the same accession ID on two different pages, the script's duplicate check missed that duplicate, which is why you ran into this issue.
This of course would have been detected sooner in one of our earlier versions; however, due to name length restrictions, we rename the headers to something unique right after the extract, and the original names are not restored until after Muscle.
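To illustrate the kind of check described above, here is a minimal sketch in Python (not the actual Perl code in cazy_extract.pl). It assumes each CAZy page has already been parsed into a list of (accession, sequence) pairs; the function name `dedupe_by_accession`, the second and third accessions, and the short sequences are hypothetical placeholders. The key point is that the set of seen accessions is shared across all pages rather than reset per page.

```python
def dedupe_by_accession(pages):
    """Keep the first record for each accession ID across all pages.

    `pages` is an iterable of lists of (accession, sequence) tuples.
    This layout is an assumption made for illustration only.
    """
    seen = set()      # accession IDs already kept, shared across pages
    unique = []
    for page in pages:
        for accession, sequence in page:
            if accession in seen:
                continue  # same accession appeared on an earlier page
            seen.add(accession)
            unique.append((accession, sequence))
    return unique


# Placeholder data: CAA46498 appears on two different pages.
pages = [
    [("CAA46498", "MKV"), ("EXAMPLE_A", "MTT")],
    [("CAA46498", "MKV"), ("EXAMPLE_B", "MSS")],
]
records = dedupe_by_accession(pages)
accessions = [accession for accession, _ in records]
```

With this input, the second copy of CAA46498 is dropped and the order of first appearance is preserved.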
Please test out this copy and let me know what you find. I will keep this issue open until you are satisfied things are working.
Thanks,
Dallas
Hi Dallas,
Thank you for the fast reply. Your fix seems to work fine. Thank you also for your explanation. I will do some additional tests but so far it runs smoothly. In case I come across the problem again I will reopen this thread.
Many thanks again!
all the best, Philipp
Hi,
First of all, thank you for SACCHARIS; it really makes it easier to characterize CAZymes. I have been using it quite a bit lately, and for most families SACCHARIS runs fine. Now I have come across a problem when analysing GH11. This family contains characterized sequences that have identical sequence names (CAA46498). Most steps of SACCHARIS run, but tree reconstruction fails because of the identical names. I know it would be easy to manually remove the sequence from the alignment. However, my workflow is highly automated because I analyse lots of CAZyme families with thousands of additional sequences that could also be CAZymes, and it is difficult to predict for which other families this would happen. Would it be possible for you to add a step that checks the alignments for identical sequence names and fixes them (or removes one of the sequences)? I would also greatly appreciate any additional suggestions on how to fix this.
Many thanks in advance!
Kind regards,
Philipp
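As a stopgap for an automated workflow, a check like the one requested above could be sketched roughly as follows. This is a hedged Python example, not part of SACCHARIS: it assumes the alignment is plain FASTA text, and the helper names `read_fasta` and `make_headers_unique`, the header `seqB`, and the short aligned sequences are hypothetical. Repeated names get a numeric suffix so that tree reconstruction sees unique labels; dropping the repeated record instead would be a one-line change.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def make_headers_unique(records):
    """Append _2, _3, ... to headers that were already used."""
    seen = set()
    result = []
    for header, sequence in records:
        candidate = header
        n = 1
        while candidate in seen:  # keep trying until the name is free
            n += 1
            candidate = f"{header}_{n}"
        seen.add(candidate)
        result.append((candidate, sequence))
    return result


# Placeholder alignment with a repeated name, as in the GH11 case.
alignment = ">CAA46498\nMK-V\n>seqB\nMKAV\n>CAA46498\nMK-V\n"
fixed = make_headers_unique(read_fasta(alignment))
headers = [header for header, _ in fixed]
```

Here the second CAA46498 record is renamed to CAA46498_2 while its sequence is left unchanged.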
Also, here is the relevant output from Saccharis (in this case without additional sequences to decrease runtime). The created treefile is empty.