fmalmeida / bacannot

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
https://bacannot.readthedocs.io/en/latest/
GNU General Public License v3.0

TypeError raised by falmeida-py #84

Closed: wonitawowowo closed this issue 1 year ago

wonitawowowo commented 1 year ago

Thanks for sharing your pipeline. It has made my RNA_seq analysis much more convenient. However, there may be a bug in a specific case when BACANNOT:SUMMARY is called. When the chromosome IDs are numbers, this process calls the function stringify_keys(), defined in bacannot2json.py, to convert the numerical IDs to strings. However, stringify_keys() sometimes fails to convert all of them.

To Reproduce

Steps to reproduce the behavior:

  1. Suppose I have a dict such as: `dct = {'platon': {'total': 3, 22: {'Length': 40291, 'ORFs': 48, 'Circular': 'no', 'AMRs': 0, 'Replication': 1, 'Mobilization': 0, 'Conjugation': 0}, 38: {'Length': 8869, 'ORFs': 11, 'Circular': 'no', 'AMRs': 0, 'Replication': 0, 'Mobilization': 0, 'Conjugation': 0}, 41: {'Length': 4868, 'ORFs': 5, 'Circular': 'no', 'AMRs': 0, 'Replication': 1, 'Mobilization': 1, 'Conjugation': 0}}}`. When I call stringify_keys(dct), the keys 22 and 38 are converted successfully, but the loop `for key in d.keys():` then returns the already-converted value '22' as a key, and 41 is never converted to '41'. This is probably caused by changing the dict's keys during iteration.
  2. Then the function _stringify_key in simplejson's encoder.py checks the keys, finds the numerical key 41 that was not converted, and raises an error:

     ```
     (falmeida-py) root@ac8519d54d79:/media/kangzong/work_fold/SZU_Hospital/work/1c/f5cdfc799ec2d7cbd1ce6891dcedef# falmeida-py bacannot2json -i results -o S29132_BDMS190048659-1a_fastp_1.fq.gz_summary.json
     Traceback (most recent call last):
       File "/opt/conda/envs/falmeida-py/bin/falmeida-py", line 33, in <module>
         sys.exit(load_entry_point('falmeida-py==0.9', 'console_scripts', 'falmeida-py')())
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/falmeida_py/main.py", line 212, in main
         bacannot2json(args['--input'], args['--output'], args['--print'])
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/falmeida_py/bacannot2json.py", line 142, in bacannot2json
         ignore_nan=True
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/__init__.py", line 412, in dumps
         **kw).encode(obj)
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 298, in encode
         chunks = list(chunks)
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 714, in _iterencode
         for chunk in _iterencode_dict(o, _current_indent_level):
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 666, in _iterencode_dict
         for chunk in chunks:
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 666, in _iterencode_dict
         for chunk in chunks:
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 666, in _iterencode_dict
         for chunk in chunks:
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 609, in _iterencode_dict
         k = _stringify_key(k)
       File "/opt/conda/envs/falmeida-py/lib/python3.7/site-packages/simplejson/encoder.py", line 571, in _stringify_key
         'not %s' % key.__class__.__name__)
     TypeError: keys must be str, int, float, bool or None, not int64
     ```

The workflow then stops.
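The report attributes the problem to changing a dict's keys while iterating over d.keys(), and the Python documentation for dictionary views does warn that iterating views while adding or deleting entries may raise a RuntimeError or fail to iterate over all entries. One common way to keep an in-place conversion safe is to iterate over a snapshot of the keys instead. The sketch below is a hypothetical alternative: the name stringify_keys_inplace is made up here, it is not the fix adopted in falmeida-py, and it assumes nested values are plain dicts. It uses a shortened version of the dict from step 1.

```python
def stringify_keys_inplace(d):
    """Recursively convert non-string keys of dict `d` to str, in place.

    Hypothetical helper. Iterating over list(d) takes a snapshot of the
    keys, so adding and deleting keys inside the loop cannot cause any
    key to be skipped.
    """
    for key in list(d):
        value = d[key]
        # Recurse into nested dicts first (they are modified in place).
        if isinstance(value, dict):
            stringify_keys_inplace(value)
        if not isinstance(key, str):
            # Move the value under the string form of the key.
            # Note: an existing entry named str(key) would be overwritten.
            d[str(key)] = d.pop(key)
    return d


dct = {'platon': {'total': 3,
                  22: {'Length': 40291, 'ORFs': 48},
                  38: {'Length': 8869, 'ORFs': 11},
                  41: {'Length': 4868, 'ORFs': 5}}}
stringify_keys_inplace(dct)
print(list(dct['platon']))  # ['total', '22', '38', '41']
```

Converted keys move to the end of the dict because each one is popped and re-inserted; that only matters if the order of the JSON output is important.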

Expected behavior

I think building a new dict with the converted keys, instead of changing the keys of the existing dict while iterating over it, would avoid this problem. For example:

```python
def convert_dictkey(d):
    # change all keys in dict d to strings
    return {str(k): convert_dictvalue(v) for k, v in d.items()}


def convert_dictvalue(v):
    # if v is a dict, apply convert_dictkey() to v; otherwise return v unchanged
    if isinstance(v, dict):
        return convert_dictkey(v)
    else:
        return v


def stringify_keys2(d):
    # for testing
    return convert_dictkey(d)
```

Then, after replacing stringify_keys with stringify_keys2 in the bacannot2json function, it seems to work well:

```
(falmeida-py) root@ac8519d54d79:/media/kangzong/work_fold/SZU_Hospital/work/1c/f5cdfc799ec2d7cbd1ce6891dcedef# falmeida-py bacannot2json -i results -o S29132_BDMS190048659-1a_fastp_1.fq.gz_summary.json
==> Output generated and saved at: S29132_BDMS190048659-1a_fastp_1.fq.gz_summary.json
```
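The copy-based idea behind convert_dictkey() can also be written as a single recursive function. The sketch below is a hypothetical variant (the name stringify_keys_copy and the 'hits' entry in the sample data are made up for illustration) that additionally descends into lists, in case a summary ever stores dicts inside lists, and then checks that the result serializes with the standard-library json module. Because str() also works on NumPy integer scalars, the same conversion would cover int64 keys like the one in the traceback.

```python
import json


def stringify_keys_copy(obj):
    """Return a copy of `obj` in which every dict key is a str.

    Hypothetical helper. Nested dicts and lists are converted
    recursively; any other value is returned unchanged. The input
    object itself is never modified.
    """
    if isinstance(obj, dict):
        return {str(k): stringify_keys_copy(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys_copy(item) for item in obj]
    return obj


dct = {'platon': {'total': 3,
                  22: {'Length': 40291, 'Circular': 'no'},
                  41: {'Length': 4868, 'Circular': 'no'}},
       'hits': [{7: 'a'}, {8: 'b'}]}  # 'hits' is made-up sample data
fixed = stringify_keys_copy(dct)
# sort_keys=True would fail on mixed int/str keys; it works once all keys are str.
print(json.dumps(fixed, sort_keys=True))
```

Returning a new object rather than mutating the input sidesteps the iteration problem entirely, at the cost of building a second copy of the summary in memory.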

fmalmeida commented 1 year ago

Hi @wonitawowowo,

Thanks for using the tool and for reporting this issue, and for describing it in such detail, even including possible solutions.

Tonight, I will generate a genome with numeric contig IDs to try to reproduce this issue and start working on a fix. I hope to have a solution by the end of Friday. In the meantime, to keep the pipeline from failing, I would suggest using contig names such as contig_1 instead of 1.

But this is just a workaround for now while I work on solving the issue you've reported. 😄
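The contig-renaming workaround can be applied to the input assembly before running the pipeline. A minimal sketch, assuming a plain (uncompressed) FASTA file in which the contig name is the first whitespace-separated word of each header line; the function name prefix_numeric_ids and the file names in the comment are hypothetical, and only the contig_ prefix comes from the suggestion above.

```python
import re

# Header line whose first word is all digits, e.g. ">1" or ">22 len=40291".
NUMERIC_ID = re.compile(r"^>(\d+)(?=\s|$)")


def prefix_numeric_ids(lines, prefix="contig_"):
    """Yield FASTA lines, prefixing purely numeric sequence IDs.

    Hypothetical helper. Non-numeric IDs (e.g. ">contig_5" or ">123abc")
    and sequence lines pass through unchanged.
    """
    for line in lines:
        yield NUMERIC_ID.sub(lambda m: ">" + prefix + m.group(1), line, count=1)


fasta = [">1 len=12\n", "ACGTACGTACGT\n", ">contig_2\n", "GGCC\n", ">38\n", "TTAA\n"]
print("".join(prefix_numeric_ids(fasta)), end="")

# To rewrite a file (hypothetical names):
# with open("assembly.fasta") as src, open("assembly.renamed.fasta", "w") as dst:
#     dst.writelines(prefix_numeric_ids(src))
```

Only the first word of a header is touched, so any description after the ID is kept as-is.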

fmalmeida commented 1 year ago

Hi @wonitawowowo, can you pull the Docker image again and try once more? Since the fix is not in the pipeline itself but in a Python package of mine, I have just updated the Docker image to install its latest version.

```
docker pull fmalmeida/bacannot:v3.2_pyenv
```

wonitawowowo commented 1 year ago

Hi @fmalmeida, I pulled the latest version and resumed the pipeline run. It finished without errors. Thank you.

fmalmeida commented 1 year ago

Thanks for the feedback. I will close the issue then.