BioSina / Panakeia

Prokaryotic Pangenome Analysis
GNU General Public License v3.0

Cluster IndexError #2

Open jhcuarta opened 3 years ago

jhcuarta commented 3 years ago

Hi, I'm running Panakeia and I encountered an error. I guess it is related to the clustering step, since I also got an error there:

    513357 finished 18243 clusters

    Approximated maximum memory consumption: 535M
    writing new database
    writing clustering information
    program completed !

    Total CPU time 365.00
    Traceback (most recent call last):
      File "Clustering.py", line 174, in <module>
        main()
      File "Clustering.py", line 139, in main
        filter_clusters("cdhit90.clstr", "proteins90.fasta")
      File "Clustering.py", line 42, in filter_clusters
        protein_dict.pop(c)
    KeyError: 'CALIBMGJ_00459'

When I try to run the main script, this is what I get:

    python3 Panakeia.py /home/jason/Documentos/Panakeia_vibrio /home/jason/Documentos/Panakeia_vibrio/gff /home/jason/Documentos/Panakeia_vibrio/cdhit90 /home/jason/Documentos/Panakeia_vibrio/vibrio.faa
    Traceback (most recent call last):
      File "Panakeia.py", line 797, in <module>
        main()
      File "Panakeia.py", line 779, in main
        read_clusters(args.clust, args.rep)
      File "Panakeia.py", line 65, in read_clusters
        left_locus = re.sub('^ ', "", preid[1])
    IndexError: list index out of range

I'm not sure about the requirements or extension for the rep file. When I pass a folder, or pass all files using *, I also get an error for every file:

    python3 Panakeia.py /home/jason/Documentos/Panakeia_vibrio /home/jason/Documentos/Panakeia_vibrio/gff /home/jason/Documentos/Panakeia_vibrio/cdhit90 /home/jason/Documentos/Panakeia_vibrio/faa/*.faa

    Panakeia.py: error: unrecognized arguments:

Regards
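A note on the last command: the shell expands `faa/*.faa` into one argument per file before Python starts, and `error: unrecognized arguments:` is the message argparse prints when it receives more positional arguments than the parser defines. The sketch below reproduces that behaviour with a stand-in parser. It is an assumption-laden illustration, not Panakeia's real parser: only `clust` and `rep` appear in the traceback above, and the names `out` and `gff` are hypothetical.

```python
import argparse

# Stand-in parser with four positionals. Only `clust` and `rep` are taken
# from the traceback; `out` and `gff` are hypothetical placeholder names.
parser = argparse.ArgumentParser(prog="Panakeia.py")
for name in ("out", "gff", "clust", "rep"):
    parser.add_argument(name)


def split_args(argv):
    """Return the parsed namespace plus any arguments argparse could not place.

    parse_known_args() collects the leftovers instead of exiting, which is
    what parse_args() would report as "unrecognized arguments".
    """
    args, extra = parser.parse_known_args(argv)
    return args, extra


# What the shell hands over after expanding faa/*.faa to two files:
args, extra = split_args(["outdir", "gff", "cdhit90", "faa/a.faa", "faa/b.faa"])
print(args.rep)   # first expanded file fills the single `rep` slot
print(extra)      # remaining files have no slot left
```

If the parser really does take a single representative-protein file, passing one combined FASTA file rather than a glob would avoid this particular error.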

BioSina commented 3 years ago

It looks like the headers in your protein FASTA file do not match the GFF files. That would mean your clustering result is empty, or is missing the proteins that did not match, which then leads to the second error.
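One way to check this is to compare the identifiers on both sides. The sketch below is only an illustration and not part of Panakeia: the helper names `fasta_ids` and `gff_cds_ids` are made up, and it assumes that the first token of each FASTA header should equal the `ID=` attribute of a CDS feature in a Prokka-style GFF. Panakeia's actual matching rule may differ.

```python
import re


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare ">" header
                ids.add(tokens[0])
    return ids


def gff_cds_ids(lines):
    """Collect the ID attribute of every CDS feature in a GFF3 file."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends the contig sequences after this marker
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        match = re.search(r"(?:^|;)ID=([^;]+)", cols[8])
        if match:
            ids.add(match.group(1))
    return ids


def unmatched(fasta_lines, gff_lines):
    """Protein IDs present in the FASTA but absent from the GFF."""
    return fasta_ids(fasta_lines) - gff_cds_ids(gff_lines)
```

With real files, something like `unmatched(open("vibrio.faa"), open("sample.gff"))` would list the protein IDs that have no CDS entry in that GFF; `sample.gff` is a placeholder name. A non-empty result for every genome would point to a header mismatch.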

jhcuarta commented 3 years ago

Hi. So, what can I do? I got both the protein and GFF files from Prokka, so I would guess there are no inconsistencies between the two file types.

javiefernandez commented 1 year ago

Hi everyone, I encountered the same issue. I fixed it simply by adding a default value in case of failure of the pop command due to KeyError (see below).

    # Clustering.py, lines 41-43
    for c in cluster:
        protein_dict.pop(c, 'nothing')
    cluster = list()

Hope it helps @jhcuarta. Best regards, Javier
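For anyone applying this workaround: giving `pop` a default stops the `KeyError`, but it also hides which IDs were missing from the protein dictionary. A minimal sketch of a variant that records them is shown below. The helper name `drop_ids` and the sentinel are hypothetical and are not part of Clustering.py.

```python
# Sentinel object so that a stored value of None is not mistaken for "missing".
_MISSING = object()


def drop_ids(protein_dict, ids):
    """Remove each ID from protein_dict and return the IDs that were absent."""
    missing = []
    for c in ids:
        # dict.pop(key, default) returns the default instead of raising KeyError
        if protein_dict.pop(c, _MISSING) is _MISSING:
            missing.append(c)
    return missing


proteins = {"A_00001": "MKV", "A_00002": "MMT"}
not_found = drop_ids(proteins, ["A_00001", "CALIBMGJ_00459"])
if not_found:
    print(f"{len(not_found)} cluster member(s) not in the protein file: {not_found}")
```

If many IDs show up as missing, that would be consistent with the header mismatch described earlier in the thread, and the later `IndexError` in Panakeia.py may still occur.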