carolynzy closed this issue 2 years ago.
Hi @carolynzy, did you use option -c?
No. I didn't use -c.
I will have a look later in the day.
Hi @carolynzy, there were two issues here:
1) The code was taking only the default number of hits reported by BLASTP; now it takes as many hits as there are sequences in the cluster. See https://github.com/eead-csic-compbio/get_homologues/commit/561894a4809730b078a5ab49e5d9656df5914bce
2) The sequences in your sample cluster have duplicated names. Counting the unique names gives 355, fewer than the 421 sequences in the cluster:
perl -lne 'if(/^>(\S+)/){ print $1 }' 1228009_ubiquitin-like_prote.txt | sort -u |wc
355 355 4263
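Both points above can also be checked from Python. This is a minimal sketch, not code from get_homologues: `header_ids` uses the same `^>(\S+)` pattern as the Perl one-liner, and `blastp_command` assumes the hit limit is set with the BLAST+ `-max_target_seqs` option. The actual commit may set a different option, and the file and database names are hypothetical.

```python
import re
from collections import Counter

# Same pattern as the Perl one-liner: first non-space token right after '>'.
HEADER = re.compile(r"^>(\S+)")


def header_ids(fasta_text):
    """Return the ID of every FASTA header, in file order."""
    ids = []
    for line in fasta_text.splitlines():
        m = HEADER.match(line)
        if m:
            ids.append(m.group(1))
    return ids


def duplicated_ids(fasta_text):
    """Map each header ID that occurs more than once to its count."""
    counts = Counter(header_ids(fasta_text))
    return {name: n for name, n in counts.items() if n > 1}


def blastp_command(query_fasta, db, cluster_size):
    """Build (but do not run) a BLAST+ blastp command whose hit limit
    equals the cluster size instead of BLAST's default (assumed option)."""
    return ["blastp", "-query", query_fasta, "-db", db,
            "-outfmt", "6", "-max_target_seqs", str(cluster_size)]


if __name__ == "__main__":
    sample = ">seqA ubiquitin\nMQIF\n>seqB\nMQIF\n>seqA copy\nMQIF\n"
    ids = header_ids(sample)
    print(len(ids), len(set(ids)))   # 3 2  (total vs unique headers)
    print(duplicated_ids(sample))    # {'seqA': 2}
    # Hypothetical file/database names; the limit is the total sequence count.
    print(" ".join(blastp_command("cluster.faa", "cluster_db", len(ids))))
```

For the cluster in this thread, the total and unique counts would differ (421 vs 355), which is the same mismatch the Perl one-liner exposes.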
I have committed the changes. You should make the sequence names unique on your side to work around this limitation. Bruno
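One way to make the names unique before running annotate_cluster.pl is to add a suffix to repeated IDs. `make_ids_unique` below is a hypothetical helper, not part of get_homologues, and the `_2`, `_3` suffix scheme is an arbitrary choice; it keeps the first occurrence of each ID and skips any suffix that already exists in the file.

```python
import re

# Header ID (first non-space token after '>') plus the rest of the line.
HEADER = re.compile(r"^>(\S+)(.*)$")


def make_ids_unique(fasta_text):
    """Rename repeated header IDs as ID_2, ID_3, ... (hypothetical helper)."""
    lines = fasta_text.splitlines()

    # Collect every original ID first so new names never clash with them.
    taken = set()
    for line in lines:
        m = HEADER.match(line)
        if m:
            taken.add(m.group(1))

    seen = set()
    out = []
    for line in lines:
        m = HEADER.match(line)
        if not m:
            out.append(line)  # sequence line, copied unchanged
            continue
        name, rest = m.group(1), m.group(2)
        new_name = name
        if name in seen:
            k = 2
            while f"{name}_{k}" in taken:
                k += 1
            new_name = f"{name}_{k}"
            taken.add(new_name)
        seen.add(name)
        out.append(f">{new_name}{rest}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(make_ids_unique(">seqA x\nMQ\n>seqA y\nMQ\n>seqA_2\nMQ\n"), end="")
```

Keeping the rest of each header line unchanged preserves any description text that follows the ID.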
@eead-csic-compbio Thank you! I did change the names, but somehow I still uploaded the original version. Thank you very much!
Hi, I'm using annotate_cluster.pl on my clusters and noticed something strange. Take cluster 1228009, for example: it contains 421 sequences, each 64 aa long and almost identical. However, annotate_cluster.pl aligns only 251 of them. The manual says that short fragments may be left out if they do not align to the longest sequence, but I don't think that applies here. Attachments: log.annotate_cluster.1228009.txt, 1228009_ubiquitin-like_prote.txt
I have attached the FASTA file and the log file. Could you please look into this issue? Thank you!
P.S. On closer inspection, it seems that no matter how many sequences a cluster contains, at most 251 are aligned, even when the sequences are highly similar.