NPLinker / nplinker-webapp

create basic skeleton of the dashboard #22

Closed gcroci2 closed 3 weeks ago

gcroci2 commented 1 month ago

See #17

gcroci2 commented 1 month ago

> Tried to run it, works well!

It wasn't ready to be reviewed yet 😁

I am almost done now, but there is an issue with uploading the data. After digging around online quite a bit, I think the problem occurs when I try to upload a big PKL file (> 400 MB). I created a mockup PKL file containing only part of the data from the one created by running the nplinker quickstart (which is almost half a GB). The mockup file works, the full one does not. See also this issue raised on the dash GitHub page.

CunliangGeng commented 1 month ago

Aha, you could change the PR to a draft if it's not ready for review yet, otherwise I'll keep getting notifications.

[screenshot]

gcroci2 commented 1 month ago

I think that, now that you've assigned yourself, you'd still get notifications:

[screenshot]

Usually I assign the reviewer only when the PR is ready, and I don't use the draft mode at all. This way you shouldn't get any notifications either :)

gcroci2 commented 1 month ago

Update about the state of the PR @CunliangGeng

We need to establish a robust method for uploading data into our web application. While the dcc.Upload Dash component is a straightforward choice, it stores the uploaded contents in the web browser's memory before sending them to the server. This works for small files, but fails for files larger than ~150 MB (refer to this thread). Therefore, I've considered several alternatives:

1) Direct file system access without uploading

2) Using the dash-uploader component

3) Uploading the file in chunks

I recommend we focus on making option 2b work. If it proves too complex or time-consuming, we should consider option 1. Despite concerns about its maintenance, dash-uploader remains the most well-implemented component for this task.
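
For concreteness, a rough sketch of what option 2 could look like, based on the pattern in the dash-uploader README (the component IDs, upload folder, and max size below are placeholders, and the callback signature follows dash-uploader >= 0.6, so it may differ in other versions):

```python
# Rough sketch of option 2 (dash-uploader); IDs and folder are placeholders.
import dash
import dash_uploader as du
from dash import Output, html

app = dash.Dash(__name__)

# Files are streamed to this server-side folder in chunks,
# so they never have to fit in the browser's memory.
du.configure_upload(app, "./uploads")

app.layout = html.Div(
    [
        du.Upload(id="pkl-uploader", max_file_size=2000),  # max size in MB
        html.Div(id="upload-status"),
    ]
)


@du.callback(output=Output("upload-status", "children"), id="pkl-uploader")
def on_upload_complete(status: du.UploadStatus):
    # status.uploaded_files holds the paths of the files written on the server
    return html.Ul([html.Li(str(f)) for f in status.uploaded_files])


if __name__ == "__main__":
    app.run_server(debug=True)
```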

CunliangGeng commented 1 month ago

Thanks @gcroci2 for the detailed update.

I took a careful look at the options. I would suggest we use option 1 directly. To my understanding, option 2 (a, b) is essentially a re-implementation of option 1, which would take a lot of effort to reach similar functionality: support for large files, chunked uploads, resumable uploads, progress tracking, security... It would make more sense to put that effort into option 1 if it is really needed (e.g. if some feature does not meet our demands). What do you think?

Also, I realised we did not compress the input file. I just gave it a try: the pickle file can be compressed to roughly 10x smaller, which is really good! We should upload the compressed file to the webapp. I will take a look at which compression protocol is best for our case.
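
For illustration, compressing the pickle with the standard-library gzip module would look roughly like this (the file names are placeholders; the actual compression protocol is still to be chosen):

```python
import gzip
import pickle
import shutil

# Compress an existing pickle file to .pkl.gz; gzip is lossless, so the
# reconstructed object is identical to the original.
with open("nplinker_data.pkl", "rb") as src, gzip.open("nplinker_data.pkl.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Reading it back: gzip.open decompresses transparently before unpickling.
with gzip.open("nplinker_data.pkl.gz", "rb") as fh:
    data = pickle.load(fh)
```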

justinjjvanderhooft commented 1 month ago

Thanks both for your input! If we are indeed able to compress the input files substantially, then option 1 seems to be the most logical path. One thing to consider is the validation of the files: when compressed, this may lead to some issues, or are there protocols in place for this as well?

CunliangGeng commented 1 month ago

@justinjjvanderhooft No worries about the validation: we'll use a lossless compression method, so no data will be lost during compression and reconstruction.

gcroci2 commented 1 month ago

Sorry @CunliangGeng, I wrote 2a and 2b instead of 3a and 3b. I think this generated confusion in the numbering :confounded:

> I took a careful look at the options. I would suggest we use option 1 directly.

I think you're actually referring to the dash-uploader component, right? (option 2)

> To my understanding, option 2 (a, b) is essentially a re-implementation of option 1, which would take a lot of effort to reach similar functionality: support for large files, chunked uploads, resumable uploads, progress tracking, security... It would make more sense to put that effort into option 1 if it is really needed (e.g. if some feature does not meet our demands). What do you think?

If with option 1 you're referring to dash-uploader (actually option 2), and with option 2 you're referring to uploading files in chunks (actually option 3), then yes, I totally agree. The reason for implementing our own version of it here would be that we need far fewer functionalities than those implemented in the dash-uploader package, and we wouldn't rely on an external dependency that is no longer stably maintained. But as I said, it remains the most well-implemented component for this task, so I am in favour of using it directly if you also think it's a good idea.

> Also, I realised we did not compress the input file. I just gave it a try: the pickle file can be compressed to roughly 10x smaller, which is really good! We should upload the compressed file to the webapp. I will take a look at which compression protocol is best for our case.

If we go for the dash-uploader choice, compression may not be needed. I don't think it changes much for the webapp implementation, unless we can be 100% sure that any data generated and compressed will be smaller than 150 MB, in which case we could go for the dcc.Upload component. If that's not the case, I am not sure compressing the file gives any advantage.

justinjjvanderhooft commented 1 month ago

I am lost now....

CunliangGeng commented 1 month ago

You're right, what I meant is that option 3 is re-implementing option 2. I suggest using option 2 directly, which also supports chunked uploads.

> If we go for the dash-uploader choice, compression may not be needed. I don't think it changes much for the webapp implementation, unless we can be 100% sure that any data generated and compressed will be smaller than 150 MB, in which case we could go for the dcc.Upload component. If that's not the case, I am not sure compressing the file gives any advantage.

A 10x smaller file would speed up the upload process, I guess. Furthermore, it saves storage and is faster to transmit online. We could make compression optional, and the upload should be able to detect the file type automatically (e.g. .gz). For the moment, you could focus on the raw pickle file.
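
A possible way to make the loading step detect the file type automatically is to check the gzip magic bytes before unpickling; a small sketch under that assumption (file names and paths are placeholders):

```python
import gzip
import pickle

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_pickle(path: str):
    """Load a pickle file that may or may not be gzip-compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    opener = gzip.open if magic == GZIP_MAGIC else open
    with opener(path, "rb") as fh:
        return pickle.load(fh)


# Works for both the raw and the compressed variant of the same data:
# data = load_pickle("uploads/nplinker_data.pkl")
# data = load_pickle("uploads/nplinker_data.pkl.gz")
```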

sonarcloud[bot] commented 3 weeks ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud