WheatonCS / Lexos

Python/Flask-based website for text analysis workflow. Previous (stable) release is live at:
http://lexos.wheatoncollege.edu
MIT License

Re-assess how data is sent to the back end #676

Open scottkleinman opened 6 years ago

scottkleinman commented 6 years ago

There are several related issues here:

  1. Some Ajax functions use the HTML FormData API to serialise the data sent to the back end; in Flask, this has to be accessed with request.form. Elsewhere, we send the data as a JSON string, which is accessed with request.json. We should settle on a single format so that the receiver methods can be standardised (a sketch of both receiver patterns follows this list).
  2. A simplified receiver should annotate the type of self._front_end_data as Dict[str, str]. That means everything in the dict extracted from the JSON string must be a string, so we need to guarantee that this is the case and remember to cast the values to integers, floats, Booleans, etc. before we use them.
  3. It is not clear what happens to JavaScript array data, which deserialises as a Python list. Of the standard form input types, checkboxes are (I think) the only type that submits an array. The other scenario is sending an arbitrary array to the back end via an Ajax request. How do we handle that: on the front end or on the back end? (A very cursory search of Stack Overflow suggested the latter.)
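
For reference, a minimal sketch of the two receiver patterns in point 1, with hypothetical route and field names (not actual Lexos routes):

from flask import Flask, request

app = Flask(__name__)

@app.route("/form-endpoint", methods=["POST"])
def receive_form():
    # Data serialised with the FormData API arrives in request.form
    # (a MultiDict); every value is a string and must be cast by hand.
    cull_number = int(request.form.get("cull_number", "0"))
    return str(cull_number)

@app.route("/json-endpoint", methods=["POST"])
def receive_json():
    # Data posted as a JSON string (Content-Type: application/json)
    # arrives in request.json as a plain dict.
    cull_number = int(request.json.get("cull_number", "0"))
    return str(cull_number)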

Relevant reading from the Front End Developer's Guide:

czhang03 commented 6 years ago

One way to send more robust JSON is to cast the JSON into a NamedTuple.

In this comment I want to convince people that I cannot think of a good implementation for handling robust JSON on the back end.

Example (untested) code for robust JSON:

from typing import Any, Callable, Dict, List, NamedTuple, TypeVar

# ===== general_function.py ========

# T means any type that is a subclass of NamedTuple.
# TypeVar is sometimes called one of the following in other languages:
# - generic (in most imperative languages, including C#, Java, TypeScript)
# - template (in C++)
# - type variable (in Haskell and Haskell-like languages)
T = TypeVar('T', bound=NamedTuple)

# this function unpacks the contents of a dictionary into a NamedTuple
def decode_dict_into_namedtuple(data: Dict[str, Any], namedtuple_cls: Callable[..., T]) -> T:
    return namedtuple_cls(**data)

# ========== receiver ================

# example json: {"id": 12345, "name": "test", "score_list": [12, 13, 14]}
class RawFrontEndOption(NamedTuple):
    id: int
    name: str
    score_list: List[int]

raw_front_end_option = decode_dict_into_namedtuple(request.json, RawFrontEndOption)

Here are the drawbacks of this approach:

1. Sending robust JSON requires more work on the front end than simply serialising form data.

2. This method does not handle nested NamedTuples:

# example json: {"personal_info": {"name": "test", "id": 12}, "score_info": {"first": 99, "second": 100}}
class PersonalInfo(NamedTuple):
    name: str
    id: int

class ScoreInfo(NamedTuple):
    first: int
    second: int

class All(NamedTuple):
    personal_info: PersonalInfo
    score_info: ScoreInfo

test = decode_dict_into_namedtuple(request.json, All)

The test variable will be:

personal_info:  {"name": "test", "id": 12}
score_info: {"first": 99, "second": 100}

Notice that both personal_info and score_info are decoded as dicts, not NamedTuples. This is because JSON cannot distinguish between a dict and an object.

The only way around this is a typed-JSON approach; we could do something like this: https://www.npmjs.com/package/typed-json

But this would make the front-end programmer's job even harder.

3. This approach does not help provide a better-typed connection between the back end and the front end:

This is because NamedTuple is only type-checked at compile time (by static analysis), not at runtime.

# request.json: {"test": 123}

class Test1(NamedTuple):
    test: str

test = decode_dict_into_namedtuple(request.json, Test1)

This code will proceed without any error at either compile time or runtime.

At compile time the type checker does not know the type of request.json, so it will not raise an error; at runtime, NamedTuple does not type-check, so putting an int into a field that is annotated as a string (the field test) will not raise an error.
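
To make the point concrete, the only way to catch this mismatch would be an explicit runtime check against the NamedTuple's annotations, along these lines (a rough sketch, not proposed Lexos code; it only handles flat, non-generic field types):

from typing import NamedTuple, get_type_hints

class Test1(NamedTuple):
    test: str

def validate_namedtuple(instance) -> None:
    # Compare each field's runtime value against its annotation.
    # isinstance() only works for simple types (str, int, float, bool),
    # not for generics such as List[int].
    for field, expected in get_type_hints(type(instance)).items():
        value = getattr(instance, field)
        if not isinstance(value, expected):
            raise TypeError(f"{field} should be {expected.__name__}, "
                            f"got {type(value).__name__}")

# Test1(test=123) constructs without complaint; only the explicit
# check raises an error.
validate_namedtuple(Test1(test=123))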

scottkleinman commented 6 years ago

I think this is a back end problem which should be solved on the back end. Would pydantic offer a possible solution?
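
For what it's worth, a rough sketch of what a pydantic receiver might look like (the field names are hypothetical, not actual Lexos settings):

from flask import request
from pydantic import BaseModel, ValidationError

class FrontEndOptions(BaseModel):
    # pydantic validates at runtime and coerces compatible values,
    # e.g. the string "5" becomes the int 5 and "true" becomes True.
    cull_number: int
    use_greyword: bool

try:
    options = FrontEndOptions(**request.json)
except ValidationError as error:
    # Values that cannot be coerced produce a readable error report.
    print(error)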

czhang03 commented 6 years ago

@scottkleinman I don't think this is a purely back-end problem. I believe it is more a front-end issue than a back-end one.

Because most of our developers don't have any front-end experience, it would be hard for them to actually use jQuery and JavaScript. Two years ago, I could just add a field to the form and expect it to send the result to the back end.

If you want to send general JSON, the developer would have to use jQuery to get the result from the input field and put it into the data that is sent to the back end. That is the major issue, I think.

czhang03 commented 6 years ago

By the way, pydantic is great

scottkleinman commented 6 years ago

Anyone working on the front end will now have to handle submissions with JavaScript/jQuery and probably shouldn't be doing too much development without a basic understanding of them. That said, the Front End Developer's Guide provides specific code for serialising form data, which just needs to be copied. Both FormData and our JSON serialisation coerce everything to strings, so the problem seems to me to be parsing the strings and validating the type of the data on the back end. FormData seems to be automatically converted by Flask to a MultiDict, which apparently accepts a type argument in its get() method. That might be of some help.
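
If it is of use, the MultiDict conversion would look something like this (field name hypothetical):

from flask import request

# request.form is a werkzeug MultiDict when FormData is posted.
# get() accepts a type callable that converts the string value,
# falling back to the default if the conversion raises ValueError.
cull_number = request.form.get("cull_number", default=0, type=int)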

But pydantic has recursive structures, which, from what I can tell from GitHub issues, are not handled by Python typing. So, if pydantic can disambiguate a JSON object, it offers the best solution, since it is not realistic to ask the front-end JavaScript to support the same typing, and since the existing forms of serialisation are so straightforward for the front-end developer.
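
As a rough illustration of why the nesting matters (reusing the hypothetical names from the earlier NamedTuple example), pydantic parses nested JSON objects into nested model instances rather than leaving them as dicts:

from pydantic import BaseModel

class PersonalInfo(BaseModel):
    name: str
    id: int

class ScoreInfo(BaseModel):
    first: int
    second: int

class All(BaseModel):
    personal_info: PersonalInfo
    score_info: ScoreInfo

data = {"personal_info": {"name": "test", "id": 12},
        "score_info": {"first": 99, "second": 100}}
test = All(**data)
# test.personal_info is a PersonalInfo instance, not a dict.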

That said, I haven't used typing, except to get a sense of how it works, so there may still be something I am not seeing.

czhang03 commented 6 years ago

@scottkleinman I am all for the current approach, which sends back a JSON object mapping strings to strings (Dict[str, str]). I am just saying that sending arbitrary JSON (like {"id": 2, "name": "test"}) from the front end to the back end, although tempting, could be a bad idea.

scottkleinman commented 6 years ago

OK, point taken. I need to look a little more closely at the string coercion performed by FormData and serializeArray() to make sure we know exactly what comes through on the back end. And, in the latter case, we need to know how the resulting JSON string is decoded by json.loads. I'll look into this later tonight.

scottkleinman commented 6 years ago

Here's a front-end method of guaranteeing that all values are strings: build into the serialisation a function like this:

function castValuesAsStrings(object) {
    // Convert every non-string value in the object to a string so that
    // the back end always receives Dict[str, str].
    $.each(object, function(index, value) {
        if (typeof value !== 'string') {
            object[index] = String(value);
        }
    });
    return object;
}

This version doesn't handle arrays (lists) or objects (dicts), but it could be extended to do so. The back-end coder, of course, still has to cast the values back to a usable type. Is it worth doing this?
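
For the back-end side, the corresponding cast might look something like this rough sketch (field names hypothetical):

from flask import request

# Cast the all-string dict sent by the front end back into typed values.
def str_to_bool(value: str) -> bool:
    return value.lower() in ('true', '1', 'on')

options = request.json  # e.g. {"cull_number": "5", "use_greyword": "false"}
cull_number = int(options["cull_number"])
use_greyword = str_to_bool(options["use_greyword"])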