TypeError: list indices must be integers or slices, not str in ckanext-fluent with zh_CN language

kumarvivek1752 commented 11 months ago

CKAN Version: 2.10.1

CKAN Extensions Installed: ckanext-fluent, ckanext-scheming

Description:

When testing ckanext-fluent with the zh_CN language, I encountered a TypeError: list indices must be integers or slices, not str error. This error does not occur with other languages.

The error occurs in the unflatten function in ckan/lib/navl/dictization_functions.py on the line current_pos = current_pos[key]. Here, current_pos is a list and key is likely a string, which leads to the TypeError.

Here is the traceback for the error:


Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/lib/python3.10/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python3.10/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/lib/python3.10/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python3.10/site-packages/flask_debugtoolbar/__init__.py", line 142, in dispatch_request
    return view_func(**req.view_args)
  File "/usr/lib/python3.10/site-packages/flask/views.py", line 84, in view
    return current_app.ensure_sync(self.dispatch_request)(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/flask/views.py", line 158, in dispatch_request
    return current_app.ensure_sync(meth)(*args, **kwargs)
  File "/srv/app/src/ckan/ckan/views/dataset.py", line 585, in post
    pkg_dict = get_action(u'package_create')(context, data_dict)
  File "/srv/app/src/ckan/ckan/logic/__init__.py", line 551, in wrapped
    result = _action(context, data_dict, **kw)
  File "/srv/app/src/ckan/ckan/logic/action/create.py", line 192, in package_create
    data, errors = lib_plugins.plugin_validate(
  File "/srv/app/src/ckan/ckan/lib/plugins.py", line 327, in plugin_validate
    result = plugin.validate(context, data_dict, schema, action)
  File "/srv/app/src/ckanext-scheming/ckanext/scheming/plugins.py", line 312, in validate
    return navl_validate(data_dict, schema, context)
  File "/srv/app/src/ckan/ckan/lib/navl/dictization_functions.py", line 313, in validate
    errors_unflattened = unflatten(errors)
  File "/srv/app/src/ckan/ckan/lib/navl/dictization_functions.py", line 464, in unflatten
    current_pos = current_pos[key]
TypeError: list indices must be integers or slices, not str

pdb data:

> /srv/app/src/ckan/ckan/views/dataset.py(585)post()
-> pkg_dict = get_action(u'package_create')(context, data_dict)
(Pdb) data_dict
{'_ckan_phase': 'dataset_new_1', 'name': 'hk', 'notes_translated-ar': 'hh', 'notes_translated-en': 'hh', 'notes_translated-es': 'dd', 'notes_translated-fr': 'hh', 'notes_translated-ru': 'ss', 'notes_translated-zh_CN': 'nvffdf', 'owner_org': 'efc0f9e2-9f3b-4f27-a65b-7c7378e7496d', 'pkg_name': '', 'private': 'False', 'save': '', 'tag_string': '', 'title_translated-ar': 'ar', 'title_translated-en': 'en', 'title_translated-es': 'es', 'title_translated-fr': 'fr', 'title_translated-ru': 'ru', 'title_translated-zh_CN': 'hhh', 'tags': [], 'state': 'draft', 'type': 'dataset'}
(Pdb) context
{'model': <module 'ckan.model' from '/srv/app/src/ckan/ckan/model/__init__.py'>, 'session': <sqlalchemy.orm.scoping.scoped_session object at 0x7fddb4d76290>, 'user': 'ckan_admin', 'auth_user_obj': <User id=3b6ff121-07c5-4cd1-aa79-545ca25637c4 name=ckan_admin password=$pbkdf2-sha512$25000$Zcx5b02JcQ6hVMq59763Ng$I7nvpFR297AmMo25pokhQQ2XLwlgFZPob6WuY7wEU5eHhI.gOrgXCb0VWhl3s4wX6fgYE71KsMHILCSKQMa2Jg fullname=None email=your_email@example.com apikey=None created=2023-10-22 03:18:57.577162 reset_key=None about=None last_active=2023-10-23 00:08:25.250856 activity_streams_email_notifications=False sysadmin=True state=active image_url=None plugin_extras=None>, 'save': True, '__auth_user_obj_checked': True, 'allow_partial_update': True, 'allow_state_change': True}    
(Pdb) get_action(u'package_create')(context, data_dict)
*** TypeError: list indices must be integers or slices, not str
(Pdb) continue

wardi commented 11 months ago

CKAN's validation code assumes that _ is a field separator when "unflattening" form field names. For ckanext-fluent to support languages like zh_CN, we'll need ckanext-fluent to convert each _ to something else when generating the form field names, and convert it back when storing the values as JSON.
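
The round trip described above could look roughly like this. Everything here is an assumption for illustration: the helper names and the '--' placeholder are hypothetical and not part of ckanext-fluent. '--' is chosen only because language codes with single-hyphen or underscore separators never contain two consecutive hyphens, so decoding stays unambiguous.

```python
# Hypothetical placeholder that stands in for "_" inside the language part
# of a form field name. Not taken from ckanext-fluent.
UNDERSCORE_TOKEN = "--"


def form_field_name(field, lang):
    """Build a form field name such as 'title_translated-zh--CN'."""
    return "{}-{}".format(field, lang.replace("_", UNDERSCORE_TOKEN))


def lang_from_field_name(name):
    """Recover the original language code from a form field name."""
    # Everything after the first '-' is the encoded language code.
    _field, _sep, encoded = name.partition("-")
    return encoded.replace(UNDERSCORE_TOKEN, "_")


encoded_name = form_field_name("title_translated", "zh_CN")
decoded_lang = lang_from_field_name(encoded_name)
```

Codes that already use hyphens, such as en-t-fr, pass through both helpers unchanged.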

kumarvivek1752 commented 11 months ago

Can you guide me on where I need to make changes to support language codes that contain _?

wardi commented 11 months ago

I think the _ characters in the language codes need to be replaced in the form template: https://github.com/ckan/ckanext-fluent/blob/e882c241c57f80ea4e4f1d72f07f2dd64588310e/ckanext/fluent/templates/scheming/form_snippets/fluent_text.html#L5C1

And converted back when the field names are parsed: https://github.com/ckan/ckanext-fluent/blob/e882c241c57f80ea4e4f1d72f07f2dd64588310e/ckanext/fluent/validators.py#L138

kumarvivek1752 commented 11 months ago

Thanks for the quick reply. Let me debug it.

kumarvivek1752 commented 11 months ago

Validation reports "invalid language code" for both zh_Hans_CN and zh_CN.

pdb (zh_CN):

> /srv/app/src/ckan/ckan/lib/navl/dictization_functions.py(305)validate()
-> flat_data, errors = _validate(flattened, schema, validators_context)
(Pdb) _validate(flattened, schema, validators_context)
({('name',): 'vghv', ('owner_org',): '1ab37f13-bf77-43a3-a708-8410d6f18496', ('private',): True, ('tag_string',): '', ('state',): 'draft', ('type',): 'dataset', ('title',): 'vghv', ('title_translated',): <ckan.lib.navl.dictization_functions.Missing object at 0x7f9fed501cc0>, ('notes_translated',): <ckan.lib.navl.dictization_functions.Missing object at 0x7f9fed501cc0>, ('extras', 0, 'key'): 'notes_translated', ('extras', 0, 'value'): <ckan.lib.navl.dictization_functions.Missing object at 0x7f9fed501cc0>, ('extras', 1, 'key'): 'title_translated', ('extras', 1, 'value'): <ckan.lib.navl.dictization_functions.Missing object at 0x7f9fed501cc0>}, {('__before',): [], ('id',): [], ('name',): [], ('title',): [], ('author',): [], ('author_email',): [], ('maintainer',): [], ('maintainer_email',): [], ('license_id',): [], ('notes',): [], ('url',): [], ('version',): [], ('state',): [], ('type',): [], ('owner_org',): [], ('private',): [], ('__extras',): [], ('__junk',): [], ('tag_string',): [], ('plugin_data',): [], ('save',): [], ('return_to',): [], ('title_translated',): [], ('notes_translated',): [], 'notes_translated-zh_CN': ['invalid language code: "zh_CN"'], 'title_translated-zh_CN': ['invalid language code: "zh_CN"']})
(Pdb)

pdb (zh_Hans_CN):


> /srv/app/src/ckan/ckan/lib/navl/dictization_functions.py(305)validate()
-> flat_data, errors = _validate(flattened, schema, validators_context)
(Pdb) _validate(flattened, schema, validators_context)
({('name',): 'hbhjs', ('owner_org',): '1ab37f13-bf77-43a3-a708-8410d6f18496', ('private',): True, ('tag_string',): '', ('state',): 'draft', ('type',): 'dataset', ('title',): 'hbhjs', ('title_translated',): <ckan.lib.navl.dictization_functions.Missing object at 0x7f0fe1609cc0>, ('notes_translated',): <ckan.lib.navl.dictization_functions.Missing object at 0x7f0fe1609cc0>, ('extras', 0, 'key'): 'notes_translated', ('extras', 0, 'value'): <ckan.lib.navl.dictization_functions.Missing object at 0x7f0fe1609cc0>, ('extras', 1, 'key'): 'title_translated', ('extras', 1, 'value'): <ckan.lib.navl.dictization_functions.Missing object at 0x7f0fe1609cc0>}, {('__before',): [], ('id',): [], ('name',): [], ('title',): [], ('author',): [], ('author_email',): [], ('maintainer',): [], ('maintainer_email',): [], ('license_id',): [], ('notes',): [], ('url',): [], ('version',): [], ('state',): [], ('type',): [], ('owner_org',): [], ('private',): [], ('__extras',): [], ('__junk',): [], ('tag_string',): [], ('plugin_data',): [], ('save',): [], ('return_to',): [], ('title_translated',): [], ('notes_translated',): [], 'notes_translated-zh_Hans_CN': ['invalid language code: "zh_Hans_CN"'], 'title_translated-zh_Hans_CN': ['invalid language code: "zh_Hans_CN"']})        
(Pdb)
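
In both errors dicts above, the two "invalid language code" entries are keyed by a plain string (for example 'title_translated-zh_CN'), while every other entry is keyed by a tuple (for example ('title',)). The following is a minimal sketch of how a string key could lead to the TypeError in the original traceback. unflatten_sketch is a simplified, hypothetical stand-in written for illustration. It is not CKAN's actual unflatten code, and it only assumes that the real function walks all but the last element of each key.

```python
def unflatten_sketch(flat):
    """Simplified stand-in for an unflatten step that expects tuple keys."""
    result = {}
    for flat_key in flat:
        pos = result
        # For a tuple like ('extras', 0, 'key') this walks 'extras', 0.
        # For a plain string it walks the string character by character.
        for part in flat_key[:-1]:
            try:
                pos = pos[part]
            except KeyError:
                # Unknown part: assume it starts a list of child items.
                pos[part] = []
                pos = pos[part]
        pos[flat_key[-1]] = flat[flat_key]
    return result


# Tuple keys behave as intended.
ok = unflatten_sketch({("title",): [], ("name",): []})

# A string key fails on its second character: the first character created
# a list, and the second character is then used as an index into that list.
try:
    unflatten_sketch({"title_translated-zh_CN": ['invalid language code: "zh_CN"']})
    failure = None
except TypeError as exc:
    failure = str(exc)
```

If this is what happens, the TypeError is a side effect of the rejected language code rather than a separate problem.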

cicerobcastro commented 6 months ago

Hello everyone,

I may have found a solution for this case.

In ckanext-fluent/ckanext/fluent/validators.py, line 16 currently reads:

BCP_47_LANGUAGE = u'^[a-z]{2,8}(-[0-9a-zA-Z]{1,8})*$'

The expression needs to be changed to accept '_' as a separator too:

BCP_47_LANGUAGE = u'^[a-z]{2,8}([-_][0-9a-zA-Z]{1,8})*$'

After this change, everything works fine for me.
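
As a quick check of the two expressions quoted in this comment, the sketch below runs both against the codes mentioned in this thread. The name RELAXED_LANGUAGE is only a label used here to keep the two patterns apart; the patterns themselves are copied from the comment.

```python
import re

# Original pattern from validators.py, as quoted above.
BCP_47_LANGUAGE = r'^[a-z]{2,8}(-[0-9a-zA-Z]{1,8})*$'
# Proposed pattern that also accepts '_' as a separator.
RELAXED_LANGUAGE = r'^[a-z]{2,8}([-_][0-9a-zA-Z]{1,8})*$'

codes = ['en', 'en-t-fr', 'zh_CN', 'zh_Hans_CN']

# Map each code to (matches original, matches relaxed).
results = {
    code: (
        bool(re.match(BCP_47_LANGUAGE, code)),
        bool(re.match(RELAXED_LANGUAGE, code)),
    )
    for code in codes
}

for code, (original_ok, relaxed_ok) in results.items():
    print(code, original_ok, relaxed_ok)
```

The hyphenated codes match both patterns, while zh_CN and zh_Hans_CN match only the relaxed one.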

wardi commented 6 months ago

I guess we can't use BCP-47 for languages because the keys we're passed are locale codes(?), which are different. For example, Transifex supports these: https://explore.transifex.com/languages/ and they include suffixes like _TW.Big5 and @latin, so we might need . and @ too.
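
To make the point about . and @ concrete, here is a sketch of a pattern that treats those characters as additional separators. LOCALE_OR_BCP_47 is a hypothetical name, and the pattern is an assumption that has only been checked against the few example codes below, not against the full Transifex list.

```python
import re

# Hypothetical pattern: same shape as the existing expression, but with
# '_', '.' and '@' allowed as separators in addition to '-'.
LOCALE_OR_BCP_47 = r'^[a-z]{2,8}([-_.@][0-9a-zA-Z]{1,8})*$'


def looks_like_language_code(code):
    """Return True if code matches the hypothetical relaxed pattern."""
    return re.match(LOCALE_OR_BCP_47, code) is not None


accepted = [c for c in ['en-t-fr', 'zh_CN', 'zh_TW.Big5', 'sr@latin']
            if looks_like_language_code(c)]
rejected = [c for c in ['', 'zh CN', 'zh__CN']
            if not looks_like_language_code(c)]
```

A pattern this permissive no longer says much about whether a code is a valid BCP-47 tag, which is the trade-off raised in this comment.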

wardi commented 6 months ago

Note that for our site we are using BCP-47 for things like en-t-fr to mark strings automatically translated, so I don't want to drop this completely. Maybe we can get away with accepting both styles of strings? Someone that knows more about localization could weigh in here.

wardi commented 6 months ago

Another thought: It looks like we could convert most of the locale codes ckan will use to BCP-47 for fluent and the API with a .replace('_', '-') before passing the code in. This way we're representing languages consistently in the API.
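
A minimal sketch of that conversion, checked against the existing expression from validators.py. locale_to_bcp47 is a hypothetical helper name; only the .replace('_', '-') call comes from the comment above.

```python
import re

# Existing pattern from validators.py, as quoted earlier in this thread.
BCP_47_LANGUAGE = r'^[a-z]{2,8}(-[0-9a-zA-Z]{1,8})*$'


def locale_to_bcp47(locale_code):
    """Convert a locale code such as 'zh_CN' to the hyphenated 'zh-CN'."""
    return locale_code.replace('_', '-')


converted = [locale_to_bcp47(code) for code in ['zh_CN', 'zh_Hans_CN', 'en']]
all_valid = all(re.match(BCP_47_LANGUAGE, code) for code in converted)
```

Codes with suffixes such as @latin or .Big5 are not changed by this replacement, so they would still need separate handling.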

kumarvivek1752 commented 6 months ago

@wardi @cicerobcastro Thanks for replying. I have already worked around this issue by using just zh.

ckan / ckanext-fluent

TypeError: list indices must be integers or slices, not str in ckanext-fluent with zh_CN language #48