fohrloop / dash-uploader

The alternative upload component for python Dash applications.
MIT License
144 stars 30 forks source link

Support multiple Dash or Flask apps for "du.configure_upload". #39

Open cainmagi opened 3 years ago

cainmagi commented 3 years ago

Introduction

This PR preserves all existed APIs. In other words, the modifications are compatible with previous APIs.

This PR is designed for providing the following new API:

du.configure_upload(app, UPLOAD_FOLDER_ROOT, upload_api='my-awesome-API/upload')
du.configure_upload(app, UPLOAD_FOLDER_ROOT, upload_api='my-awesome-API/upload-second', upload_component_ids='second')

where we have:

  1. du.configure_upload could be used several times.
  2. The argument app could be a dash.Dash instance or a flask.Flask instance. The argument type would be checked during the configuration.
  3. A new argument upload_component_ids is provided, it could be
    • None: This is the default value. It means that this du.configure_upload is designed for default configurations.
    • str: This is the same as (str, )
    • tuple or list: A sequence of str. Each str is a component id. The component with the provided id would be configured by the provided configurations.
  4. A simple check has been implemented for upload_component_ids. If it is not None, each item requires to be a non-empty str. And each item should be unique, i.e. repeating configurations for the same component is not allowed.

Example

For dash

import dash
import dash_html_components as html
from dash.dependencies import Output
import dash_uploader as du

app = dash.Dash(
    __name__,
    assets_folder='./assets',
    assets_url_path='/assets',
    assets_ignore=r'.*.js'
)
app.title = 'A testing app.'
du.configure_upload(app, folder='upload', use_upload_id=False)
du.configure_upload(app, folder='upload', use_upload_id=False, upload_api='/api-uploader', upload_component_ids='uploader')

app.layout = html.Div(
    children=[
        du.Upload(
            id='uploader',
            text='Drag & Drop or Select a File',
            filetypes=['csv', 'zip'],
            upload_id='.',
            max_files=1,
            default_style={
                'width': '100%',
                'paddingLeft': '.5rem',
                'paddingRight': '.5rem',
                'minHeight': 'min-content',
                'lineHeight': '50px',
                'borderWidth': '1px',
                'borderStyle': 'dashed',
                'borderRadius': '5px',
                'textAlign': 'center'
            }
        ),
        html.Div(id='trigger-upload')
    ]
)

@du.callback(
    output=Output('trigger-upload', 'children'),
    id='uploader',
)
def trigger_upload(filenames):
    return html.Ul([html.Li(filenames)])

@app.server.after_request
def add_header(response):  # pylint: disable=unused-variable
    response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
    response.headers["Pragma"] = "no-cache"
    response.headers["Expires"] = "0"
    response.headers['Cache-Control'] = 'public, max-age=0'
    response.cache_control.no_cache = True
    return response

if __name__ == '__main__':
    import colorama

    colorama.init()
    app.run_server(host='localhost', port=8060, debug=False)

Users could change the upload_component_ids in the second du.configure_upload and check what happens.

For Flask

import os
import flask

import dash_uploader as du

app = flask.Flask('test-uploader')

os.makedirs('upload', exist_ok=True)
du.configure_upload(app, folder='upload', upload_api='/dash-uploader', use_upload_id=False)
du.configure_upload(app, folder='upload', use_upload_id=False, upload_api='/uploader2', upload_component_ids='uploader2')

if __name__ == '__main__':
    import colorama

    colorama.init()
    app.run(host='localhost', port=8061, debug=False)

In this case, actually, we do not need to configure upload_component_ids, because a Flask app does not have components. It should be only used for managing APIs.

For the next step

  1. This PR is required for implementing the next feature (cross-domain) supports. I would not create the next PR before this PR is merged or closed.
  2. If this PR is acceptable, I could start working on the documentation before merging it to the master branch.
  3. We need to consider the following two topics:
    1. Currently, I add a check for the du.callback. If the component is not configured for a dash app (not configured or configured for a flask app), the du.callback would be forbidden. However, it is possible to implement a "Flask callback". We could invoke the callback by using the request handler after the final chunk uploaded. I am wondering whether we need to implement this feature (maybe this feature should be split into another PR).
    2. If we decide to implement the "Flask callback", there is an important problem. Because Flask app does not maintain the component ids, we may have two options for the implementation: (1) Add the component id in the arguments of the request. Then the upload_component_ids should be preserved because the Flask app could identify the component id. (2) Each "Flask callback" is designed for an API. In this case, we only need to add a check for du.configure_upload and forbid users to set upload_component_ids when app is a Flask app.

Update report

  1. Change the format of du.settings.
  2. Move update_upload_api from du.upload to du.configure_upload.
  3. Add an option upload_component_ids to du.configure_upload. This enables users to set different configurations for different components.
  4. Support Flask apps for du.configure_upload.
fohrloop commented 3 years ago

Thanks for the detailed PR! I'm happy to see this progressing. One thing that I'm considered that there might be some side effects that I cannot foresee, and the complexity of the things this package can do is increasing. This can lead into difficulties in maintaining the dash-uploader in the future. What I would suggest, is that there will be some automatic tests added to all the new features. This way we can see that nothing will break when something is changed.

cainmagi commented 3 years ago

I have to admit that this PR is large even though the API changes are very small. At least, I could confirm that the two examples I provide (as well as the usage.py in the root folder) could work well.

For the automatic tests stuff, should it be related to the other issue #34? I am guessing that you want to use pytest, but currently, there are no available testing codes in ./test.

I know that pytest could be integrated with Github Actions, and I have tried to do that before. I know how to write testing codes for dash (although I am not an expert on that), but I find that it is difficult to set up Chrome drivers for Github Actions. I guess that a testing container may be required before setting up Github Actions. I am still learning how to do it now. If you do not mind my poor code style, and only need to perform tests locally, I could try to write some testing codes and start another PR.

fohrloop commented 3 years ago

I would be delighted to see a PR for some basic testing! :) No need to add GitHub Actions etc. at this stage, just bare minimum to get testing started. This is exactly what issue #34 is about.

One thing what I'm now thinking that the du.configure_upload does a bit different thing on Flask and on Dash apps, right? I know that you had the flask part in a separate function in a previous PR, but I'm now thinking (just initially) that if it would make more sense for the user that it would be different function (if it does a different thing). Something like du.configure_flask. The du.configure_upload could call it internally to prevent duplicate code. But this is just initial thinking. I really should take time to look through the additions in this PR and try to understand the implications from all the possible angles.

One thing I'm thinking is that what is the motivation behind supporting multiple calls to du.configure_upload, if there is still no full support for multiple du.Upload components on same app? (referring to https://github.com/np-8/dash-uploader/issues/35) Or is there some chicken-and-egg -problem here?

cainmagi commented 3 years ago

Please do not try to merge it now. We may need to change the API according to the previous discussion.

If we discard the implementation based on multiple calls to du.configure_upload, we would need to add an argument like service_addr or upload_api to du.Upload, and make the upload_api configured in du.configure_upload become a default value. We need to do that because we need to support cross-domain uploading based on this PR in the next step.

If you think the implementation of multiple calls to du.configure_upload is preferred. I think we may do not need to split du.configure_flask from it. Because in the next step, we may need to provide a du.configure_remote_upload like this

du.configure_remote_upload(app, remote_addr, upload_component_ids)

because an uploader directed to the remote server only needs to handle the callback. The uploading folder, API, and requests are handled by the remote side.

I have realized that in my case, I do not need to use du.configure_upload or something like du.configure_flask multiple times, because it is not necessary to configure multiple different APIs for the same app, and it is not possible to serve multiple different apps. But I need to do something like:

du.configure_upload(app, ...)
du.configure_remote_upload(app, remote_addr_1, upload_component_ids=...)
du.configure_remote_upload(app, remote_addr_2, upload_component_ids=...)
...

For the remote side, I only need to configure something like this

du.configure_upload(flask_app, ...)

So serving multiple calls of du.configure_upload may not be a good choice. But we need to serve multiple calls to something like du.configure_remote_upload.

How do you think of that? If the multiple calls to du.configure_upload is not preferred, I could start another PR and close this one. Before that, we may also need to decide whether it is necessary to split du.configure_flask.

fohrloop commented 3 years ago

This clearly needs some more thinking with better good night's sleep. Here are some fast notes. These are not necessarily answering to anything, but they can help us later when designing the future API & functionality of dash-uploader.

Dash

The Dash instance will have

There values are read from the dash app instance and saved to du.settings, when du.configure_upload is called.

du.configure_upload

saves some settings

opens routes

du.Upload

HttpRequestHandler

cainmagi commented 3 years ago

Thank you for replying to me. Currently, I tend to drop the implementation of this PR, because I think we may not need the multiple calls for du.configure_upload. I may continue to work on this project in the next week. Before that, I want to know how you think of du.configure_upload. Even we drop the multiple-call-based implementation, we could still make the du.configure_upload supports the Flask app.

Authentication

First, let me explain my view about authentication. Currently, I have to acknowledge that I have not considered this problem. To my knowledge, the authentication could be maintained in a very complicated (but also very safe) way or a very simple way.

But an important problem is that the authentication is an optional feature for dash. We could not expect that each dash app is equipped with authentication, and we may not know how do the users apply the authentication. For the Flask case, we may have a similar situation. If we want to make the authentication a built-in feature of dash-uploader for multi-services, we may need to add the following features for the users:

Because I am not a professional web developer, I am not sure whether these steps are widely acceptable. I know that there are some plugins like flask-login or flask-jwt that may make it easier to develop the authentication feature if you do not mind adding them into the dependencies. But in my opinion, providing the authentication feature maybe not necessary, because users could implement the authentication by themselves and use your HttpRequestHandler to add customized codes for the authentication.

For this PR

Before the next step, I think we may need to determine the one thing:

For the remote service case, we may also need to determine the functionalities. But I think we could discuss it in the next PR about multi-services.

fohrloop commented 3 years ago

I was thinking that I would have had more time today to think this through, but unfortunately that did not happen. I wish I had few hours just to think on this, so I could give my opinion. I appreciate your time and contribution.

Authentication

For the authentication case, dash-uploader was first released without any possibility (without changing the source code) of having any kind of authentication. Now there is HttpRequestHandler that I made partly for the support for authentication. I have not much experience on the topic myself either, and as you said, most dash apps are without authentication. I would recommend anyone using dash-uploader or similar to use some sort of authentication and validation for the inputs, and perhaps some upload rate limits per user. I have a feeling that adding authentication to dash-uploader might be a bit complicated, as there might be very different authentication schemes people use. So for now, I think it can be left as the responsibility of the user to subclass the HttpRequestHandler and add their own logic. In the future, perhaps there could be some examples or helper functions, though.

Of course, the dash-uploader should be designed in such a way that authentication is possible (and relatively easy). So the topic is good to be kept in mind.

About this PR

Yeah that is a valid point, do we need multiple calls to du.configure_upload. Actually, what I was thinking what would be really helpful as the next step, is to make a list of requirements / wishes on what we really wish to accomplish, and how the user of dash-uploader would use the features. In my opinion, having multiple dash-uploader components on same web app would be "nice to have", but I think I remember only one person asking about if it could be made possible. The React component should be only ever sending AJAX calls to one address, that is for sure. But, should it be possible to send them directly cross-domain (to a flask server?). Or, would it be better to make the Javascript always talk with the "main dash app", so that authentication is easy, and then "pipe" the calls forward to somewhere else, if needed?

Here is a simple diagram which explains what I mean: image

Some random thoughts

cainmagi commented 3 years ago

Hello! I am back for this project now. First, please let me explain my understanding of your thoughts:

fohrloop commented 3 years ago

Hi @cainmagi ,

If I understood this correctly, this

As I have proposed before, the locally fired callback function needs to send a request to the remote service, and use the response of the service to provide the output value of the local callback. To implement this workflow. We need to write codes for sending requests (by urllib3 or requests) from the dashboard side, and codes for handling the requests (by Flask) from the remote side

..says that you were also about to pipe all the AJAX calls (from browser) through the Dash app to your Flask app (and not sending from browser directly to Flask)?

Thank you for the question! I would really much like to simplify the user experience by using du.Upload independently from du.configure_upload as you suggested. However, there are some restrictions that lead to the current dash-uploader API

fohrloop commented 3 years ago

What I am basically saying is that one needs to have one call to du.configure_upload (to open a route in the Flask app and define output folder) for each du.Upload component.

cainmagi commented 3 years ago

About request forwarding

Not really. Because

  1. We do not need to forward dash-uploader requests to the remote Flask service. Because with the cross-domain modifications, these requests have been already sent from the browser to the remote service directly. The dash app would not receive those requests in this case.
  2. Although the dash-uploader requests have been directed to the remote service, the /_dash-update-component request would still be sent to the local dash app, i.e. the callback would be still fired locally. However, we do not need to forward the /_dash-update-component either, because the remote service is a Flask app, it is not worth writing codes to forward the dash formatted requests and handle them.
  3. Instead, we send a different request in the fired callback function. This request directs to a totally customized API on the remote side. The request is not sent by the browser but sent by the callback function (using something like urllib3 or requests). Because this request is totally customized, it is easy to control what we want to do on the remote side. In this way, the request would serve as "a flask callback". Because it is always sent by the callback function, if the callback is not fired, the request would not be sent.

I think I should draw a figure in the next PR (about the cross-domain implementation) to make this workflow clearer.

About configure_upload

Thank you for your explanation. I find that I misunderstand your post again :sweat_smile:. In the previous post, I suggest moving the folder and API in du.Upload because I think the du.configure_upload needs to be called multiple times. But now I find that it is not necessary to do that, Because

  1. If there are no cross-domain services, there is no need to provide multiple APIs. One API is enough for each app (either dash app or flask app).
  2. If there are cross-domain services, compared to the following style:
    du.configure_remote_upload(app, remote_addr_1, upload_component_ids=['id1', ...])
    du.Upload(id='id1', ...)

    I prefer the following style

    du.Upload(id='id1', service=remote_addr_1, ...)

    Then we do not need to provide multiple calls to du.configure_upload. Because each app only requires one configuration. For example, the remote services should be configured from the remote side, while the dash app only needs one configuration.

  3. But we could still modify the du.configure_upload and let it support the Flask app. This would be a prerequisite step for the cross-domain implementation.

If you think the above statements are OK, I could close this PR and start another PR for supporting the Flask app first. Thank you!

fohrloop commented 3 years ago

Thanks for the explanations! A diagram would surely help to understand the working principle much faster!

If we would use du.configure_upload for Flask apps the upload_api parameter documentation does not make sense as the requests_pathname_prefix is specific to Dash, I think. Not sure if it should be a separate du.configure_flask function or not. But that can be also decided after some initial implementation when we see how it goes.

If there are cross-domain services, compared to the following style: du.configure_remote_upload(app, remote_addr_1, upload_component_ids=['id1', ...]) du.Upload(id='id1', ...) I prefer the following style du.Upload(id='id1', service=remote_addr_1, ...)

Do you think it would work just just du.Upload(id='id1', service=remote_addr_1, ...)? I mean, without any du.configure_remote_upload calls in the dash app? If so, that would be awesome! There probably needs to be at least du.configure_upload call in the app, even if there was only one Upload component that would send directly to a remote Flask app, because the callback functionality needs it. I mean, if you want to use du.callback. This should be something like

configure_upload(app, folder=None, use_upload_id=False, upload_api=None, http_request_handler=None)

But what if you have two du.Upload components, and you want another one to be configured to send to the main Dash app and another component to send directly to the remote Flask app?

cainmagi commented 3 years ago

About Flask

Yes, there some dash-special configurations like requests_pathname_prefix. However, these configurations are only coded in the function, and not exposed as arguments. My concern is, the dash app and the Flask app share the same API:

# If this is a dash app
du.configure_upload(app, folder=None, upload_api=None, use_upload_id=None, http_request_handler=None)
# If this is a flask app
du.configure_upload(app, folder=None, upload_api=None, use_upload_id=None, http_request_handler=None)

All of the arguments would be used for configuring the app in both cases. To reduce the repeats of the codes, we could add a check in the implementation of du.configure_upload and disable some dash features if the app is a Flask app.

About cross-domain service

Yes. I am quite sure that du.Upload(id='id1', service=remote_addr_1, ...) works, because I have already implemented it in my project before. The reason is that such a component does not need either a local upload folder or a local upload API. In this case, there may be a problem in du.callback. See the following line: https://github.com/np-8/dash-uploader/blob/f76252936ffc456a00c3930c569a458af44acd40/dash_uploader/callbacks.py#L36

To make du.callback correct, we may need to add a check. My idea is to add a hard-coded state. If the property service is cross-domain, du.callback would not append the settings.UPLOAD_FOLDER_ROOT to the file name. Currently, I remove it in the callback function body.

Let me use your example to make a simple explanation:

The only problem is that, in some cases, we do not need the first component, i.e. we may only have remote uploading components. Then we actually do not need to configure a local upload API. But I think we should still force users to call du.configure_upload. Because it would not cause any conflict, and it is necessary to register the dash app for du.callback.