SimonbJohnson / quickX3

HXLDash. Create data visualisations quickly by leveraging the humanitarian exchange language
https://hxldash.com/
MIT License

Unicode support on titles and description #70

Closed fititnt closed 4 years ago

fititnt commented 4 years ago

TL;DR: Non-ASCII characters may be displayed incorrectly. I suspect this happens when the data is stored (I have not dug into the code yet).


This is the content of the script tag at the end of the page. The characters in the script tag are already not the original input: "title": "(Testing) UNICODE S\u00c3\u00a3o ... contains \u00c3\u00a3 (the two characters Ã£), which looks like a corrupted ã.


<script>
var config = {"filtersOn": false, "subtext": "(Testing) UNICODE S\u00c3\u00a3o Jo\u00c3\u00a3o, abra\u00c3\u00a7o\npassword: test", "filters": [], "title": "(Testing) UNICODE S\u00c3\u00a3o Jo\u00c3\u00a3o, abra\u00c3\u00a7o", "color": 0, "grid": "1", "table": {"fields": [{"column": 0, "tag": "#status"}, {"column": 1, "tag": "#country"}, {"column": 2, "tag": "#adm1"}, {"column": 3, "tag": "#adm1+code"}, {"column": 4, "tag": "#loc"}, {"column": 5, "tag": "#loc"}, {"column": 6, "tag": "#org"}, {"column": 7, "tag": "#loc+type"}, {"column": 8, "tag": "#affected+dead"}, {"column": 9, "tag": "#affected+confirmed"}, {"column": 10, "tag": "#affected+suspected"}], "data": "https://docs.google.com/spreadsheets/d/1R9zfMTk7SQB8VoEp4XK0xAWtlsQcHgEvYiswZsj9YA4/edit#gid=0"}, "headlinefigures": 0, "charts": [{"chartID": "map0001/#adm1+code/3", "mapOptions": [{"colour": null, "scale": "linear", "display": "", "size": null}], "data": "https://docs.google.com/spreadsheets/d/1R9zfMTk7SQB8VoEp4XK0xAWtlsQcHgEvYiswZsj9YA4/edit#gid=0", "title": null}, {"chartID": "chart0013/#org/6/#affected+confirmed/9", "mapOptions": [], "data": "https://docs.google.com/spreadsheets/d/1R9zfMTk7SQB8VoEp4XK0xAWtlsQcHgEvYiswZsj9YA4/edit#gid=0", "title": null}, {"chartID": "chart0012/#org/6", "mapOptions": [], "data": "https://docs.google.com/spreadsheets/d/1R9zfMTk7SQB8VoEp4XK0xAWtlsQcHgEvYiswZsj9YA4/edit#gid=0", "title": null}], "headlinefigurecharts": []};
var gridURL = "/static/grid_data/gridxxx.html";
var iframe = false;
</script>
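The escapes in the config above are consistent with UTF-8 bytes having been decoded as Latin-1 somewhere along the save path. The sketch below reproduces them under that assumption; the sample title and the Latin-1 step are guesses for illustration, not something confirmed from the project's code.

```python
import json

# Assumed original input (reconstructed from the escapes, not taken from the code base).
original = "São João, abraço"

# Hypothesis: the UTF-8 bytes were decoded as Latin-1 at some point.
mojibake = original.encode("utf-8").decode("latin-1")

# json.dumps escapes non-ASCII characters by default, like the config in the script tag.
escaped = json.dumps(mojibake)
print(escaped)

# Reversing the wrong decode recovers the original text.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)
```

If the hypothesis holds, the first line printed contains the same `\u00c3\u00a3` and `\u00c3\u00a7` sequences as the stored title.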

Screenshot

[Image: Captura de tela de 2020-04-11 10-15-52 (screenshot from 2020-04-11 10-15-52)]

SimonbJohnson commented 4 years ago

I can confirm this happens when the data is saved. The current method converts the config variable to a string, which sits in a hidden field on the HTML form: https://github.com/SimonbJohnson/quickX3/blob/master/hxldash/static/js/dashmaker.js#L487

It is then decoded at:

https://github.com/SimonbJohnson/quickX3/blob/master/hxldash/views.py#L100-L103

I'm not sure if the error is occurring client or server side yet.

I suspect it might be best to rewrite this saving function to actually send a JSON object rather than a string, but this thread suggests it might still be processed as a string:

https://stackoverflow.com/questions/1208067/wheres-my-json-data-in-my-incoming-django-request

fititnt commented 4 years ago

Great! Having the exact client-side function that encodes and the back-end code that decodes and then saves is a good start for finding the places that may need to change.

But since this deals with encoding, it may not be a quick change. Python is not one of my main programming languages, so I may take some time to understand the backend part.

From experience, one particular problem with encoding is that many things can affect it, even at the operating system level of the server. Also, the way JavaScript is used to send data to the backend abstracts a lot away, and when we can't find code that fits our needs exactly, we may end up debugging more deeply than it would take to refactor to an approach that is known to work.

fititnt commented 4 years ago

Using encodeURIComponent

// https://github.com/SimonbJohnson/quickX3/blob/master/hxldash/static/js/dashmaker.js#L487
$('#save').on('click',function(){
    config.title = $('#create-title').val();
    config.subtext = $('#create-description').val();
    tableToConfig();
    $('#formconfig').val(encodeURIComponent(JSON.stringify(config)));
    $('#savemodal').modal('show');
});

The page https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent has examples that, instead of using encodeURIComponent directly, add an extra step to encode some additional characters.

But the characters that encodeURIComponent leaves unencoded are not multi-byte UTF-8 characters like ç, so in theory it should work.

console.log(encodeURIComponent('ç')); 
// %C3%A7
decodeURIComponent('%C3%A7')
// "ç"

One way to check, without any refactoring, would be a quick test on the backend: print to the Python console how the raw POST is received when the text contains a letter like ç. Maybe the string is no longer %C3%A7 even before the backend decodes it.
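A rough sketch of that kind of check, using only Python's standard library rather than the project's actual view code: it takes the percent-encoded form of ç and shows the raw bytes and what each decoding produces. The variable `received` stands in for whatever value the backend actually gets and is hypothetical.

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical value as it might arrive at the backend; this is what
# encodeURIComponent('ç') produces on the client.
received = "%C3%A7"

# The raw bytes behind the percent-encoding.
raw_bytes = unquote_to_bytes(received)

# Decoding those bytes as UTF-8 gives the intended character.
as_utf8 = unquote(received, encoding="utf-8")

# Decoding the same bytes as Latin-1 gives two characters instead of one.
as_latin1 = unquote(received, encoding="latin-1")

print(repr(raw_bytes), as_utf8, as_latin1)
```

If the value printed on the server already looks like the Latin-1 result, the damage happens before or during the decode step rather than in the browser.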

Using jQuery + the top-voted Stack Overflow answer

var myEvent = {id: calEvent.id, start: calEvent.start, end: calEvent.end,
               allDay: calEvent.allDay };
$.ajax({
    url: '/event/save-json/',
    type: 'POST',
    contentType: 'application/json; charset=utf-8',
    data: $.toJSON(myEvent),
    dataType: 'text',
    success: function(result) {
        alert(result.Result);
    }
});

The top answer from https://stackoverflow.com/questions/1208067/wheres-my-json-data-in-my-incoming-django-request may actually be able to deal with this issue when saving. This may require some refactoring, so if it takes too long, fixing the encodeURIComponent approach may be easier.

Another advantage is that it may be easier to find examples that use both jQuery and Django than examples that use encodeURIComponent directly and deal with its corner cases.
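On the server side, a request sent with contentType 'application/json; charset=utf-8' arrives as raw bytes rather than as form fields. A minimal sketch of decoding such a body follows; `parse_json_body` is a hypothetical helper, and in a Django view the bytes would presumably come from `request.body`, which is an assumption about how the project would wire it up.

```python
import json


def parse_json_body(raw_body: bytes) -> dict:
    """Decode a UTF-8 JSON request body into a dictionary (hypothetical helper)."""
    return json.loads(raw_body.decode("utf-8"))


# Simulated request body: JSON text encoded as UTF-8, as the charset header declares.
payload = json.dumps({"title": "São João, abraço"}, ensure_ascii=False).encode("utf-8")

config = parse_json_body(payload)
print(config["title"])
```

Decoding explicitly as UTF-8 keeps the charset assumption visible in one place instead of relying on defaults.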


"I suspect it might be best to rewrite this saving function to actually send a JSON object rather than a string, but this thread suggests it might still be processed as a string"

I agree. Even if we make encodeURIComponent work for ç, there might still be bugs with more complex scripts (Modern Standard Arabic, Mandarin Chinese). The Stack Overflow suggestion, however, is likely to have been well tested on those cases already.
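Whichever approach is chosen, a small round-trip check over several scripts can help catch regressions. The sketch below mimics encodeURIComponent(JSON.stringify(config)) with Python's standard library; the sample strings and the `round_trip` helper are illustrative assumptions and not part of the project.

```python
import json
from urllib.parse import quote, unquote

# Illustrative samples: Portuguese, Arabic, Chinese.
samples = ["abraço", "مرحبا", "你好"]


def round_trip(text: str) -> str:
    """Serialise to JSON, percent-encode, then reverse both steps."""
    encoded = quote(json.dumps({"title": text}, ensure_ascii=False), safe="")
    return json.loads(unquote(encoded))["title"]


results = [round_trip(sample) for sample in samples]
print(results == samples)
```

A check like this only covers the encode and decode steps themselves; it would not catch problems introduced by the web server, the database, or the templates.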

SimonbJohnson commented 4 years ago

I think this has been fixed in this branch: https://github.com/SimonbJohnson/quickX3/tree/fixing-unicode-values

In the end I removed the encoding (client) and decoding (server) functions as they seem redundant. I'm not sure why they were ever included, and I'm slightly wary that I might have broken something else. After much testing, though, everything seems to be working.
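One plausible reason the extra functions were redundant is that a normal HTML form submission already percent-encodes field values, and the server-side framework decodes them again. The sketch below imitates that with Python's standard library; the field name `formconfig` is a guess based on the element id in dashmaker.js, and this is not the project's actual code.

```python
import json
from urllib.parse import parse_qs, urlencode

# JSON text as it would sit in the hidden form field, with no extra encoding applied.
config_text = json.dumps({"title": "São João, abraço"}, ensure_ascii=False)

# Roughly what a browser does when submitting an urlencoded form
# ("formconfig" is a hypothetical field name).
body = urlencode({"formconfig": config_text})

# Roughly what the server does when parsing the submitted form.
fields = parse_qs(body)
received = json.loads(fields["formconfig"][0])

print(body)
print(received["title"])
```

Under these assumptions the text survives the trip unchanged, so an additional encode and decode pair adds nothing and leaves more room for mismatched decoding.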

fititnt commented 4 years ago

I can confirm that it is already working, even on the production server hxldash (see https://hxldash.com/view/329).

If at some point in the future I take some time to test with other scripts (like Chinese or Arabic), I can open a new issue.

Thanks! And great work!