cc-archive / cccatalog

[PROJECT TRANSFERRED] Mapping the commons towards an open ledger and cc search.
https://github.com/WordPress/openverse-catalog
MIT License

[Bug] JSONColumn class should not enforce ASCII charset #509

Closed (mathemancer closed this issue 3 years ago)

mathemancer commented 3 years ago

Bug Description

Currently, the method JSONColumn.prepare_string defined in src/cc_catalog_airflow/dags/provider_api_scripts/common/storage/columns.py uses the default ensure_ascii=True when dumping the given object to a string, so every non-ASCII character is written as a \uXXXX escape sequence. This escaping is unnecessary, since PostgreSQL (and all downstream machinery) can handle the full UTF-8 charset. Thus, we should set ensure_ascii=False in the json.dumps call within the JSONColumn.prepare_string method.

Expected behavior

We should be able to save strings containing non-ASCII characters.
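To illustrate the difference described above, here is a minimal sketch using only the standard library json module. The sample dictionary and its field names are made up for illustration and are not taken from the catalog code.

```python
import json

# Hypothetical sample metadata; the field names are illustrative only.
meta_data = {"creator": "José Müller", "title": "日本語"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(meta_data)

# Proposed behaviour: non-ASCII characters are kept as-is.
preserved = json.dumps(meta_data, ensure_ascii=False)

print(escaped)
print(preserved)

# Both strings decode back to the same object; only the encoding differs.
assert json.loads(escaped) == json.loads(preserved) == meta_data
```

Either string is valid JSON, so the change should only affect how the text is stored, not what it decodes to.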

mathemancer commented 3 years ago

Note that the implementer should include a test to make sure that the new functionality is correct.
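A test along these lines might look roughly like the sketch below. JSONColumnSketch is a hypothetical stand-in written for this example; the real JSONColumn class, its constructor, and the exact behaviour of prepare_string for empty input are assumptions here and should be checked against columns.py.

```python
import json


class JSONColumnSketch:
    """Hypothetical stand-in for JSONColumn; not the project's actual class."""

    def prepare_string(self, value):
        # Assumed behaviour: empty input yields None, anything else is
        # dumped to a JSON string without escaping non-ASCII characters.
        if not value:
            return None
        return json.dumps(value, ensure_ascii=False)


def test_prepare_string_keeps_non_ascii():
    column = JSONColumnSketch()
    actual = column.prepare_string({"title": "Café ñandú 日本語"})
    # The characters should appear verbatim, not as \uXXXX escapes.
    assert actual == '{"title": "Café ñandú 日本語"}'
    # The string should still round-trip to the original object.
    assert json.loads(actual) == {"title": "Café ñandú 日本語"}


test_prepare_string_keeps_non_ascii()
```

In the real test suite the stand-in would be replaced by the actual column class, and the test would be collected by the project's test runner rather than called directly.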

Runanka commented 3 years ago

Let me try this

mathemancer commented 3 years ago

> Let me try this

Go for it!

tushar912 commented 3 years ago

@Runanka are you working on this? If not, can I take it up?

mathemancer commented 3 years ago

@tushar912 I think you can go for it. It's been quite some time, and it's a pretty quick issue.