doyu opened 1 year ago
Just want to clarify if this task is dependent on #93 being completed? If it is not dependent, am I understanding this task correctly that this could be implemented purely with streamlit or do I need to think about storing the data to external storage (you mentioned Google Drive)?
For #93 I think we go with the current implementation in Sprint 3, so you could implement this item independently. One point to note here: right now, Streamlit doesn't allow storing data locally. You may need to use external storage for data (e.g. Google Drive or S3). For demo purposes, you could survive with running Streamlit locally using local storage, but for SaaS it would run on a cloud VM without local storage. You need to consider the design for that. ref: https://github.com/Origami-TinyML/tflm_hello_world/issues/93#issuecomment-1437252195
Are you thinking something like this https://docs.streamlit.io/knowledge-base/tutorials/databases/aws-s3 ?
> Are you thinking something like this https://docs.streamlit.io/knowledge-base/tutorials/databases/aws-s3 ?
Yes, since Streamlit doesn't have any access to local storage at all, at least for now.
Ok, I'll try to set up aws s3 storage by following those instructions
I have now set up an S3 bucket and managed to load a CSV file from the bucket into Streamlit by following those instructions. The next step is to store image data from Streamlit to S3. Just for clarification, do you want me to create a feature that stores only one image?
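For reference, the tutorial's flow boils down to something like the sketch below. The bucket and key names are placeholders, and `boto3`/`pandas` are imported lazily so the URI helper stays usable without AWS access:

```python
import io


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/file.csv' into (bucket, key)."""
    without_scheme = uri.removeprefix("s3://")
    bucket, _, key = without_scheme.partition("/")
    return bucket, key


def load_csv_from_s3(uri: str):
    """Fetch a CSV object from S3 and return it as a pandas DataFrame."""
    import boto3      # deferred: only needed when S3 is actually used
    import pandas as pd

    bucket, key = split_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_csv(io.BytesIO(body))
```

Credentials come from the usual boto3 sources (environment variables or `~/.aws/credentials`), which is also what the Streamlit tutorial assumes.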
I was thinking of implementing something like this:
@nellatuulikki
It's really good that you asked before implementing instead of assuming something here.
> do you want me to create a feature that stores only one image?
No, data should be included in a compressed archive file. The sequence would be:
You may want to take a look at how external data is stored and processed in ML: https://github.com/fastai/fastai/blob/master/nbs/04_data.external.ipynb
Let me know if there's some inconsistency above. Thanks!
Here is an example of what I have been testing this evening. I added a feature in which the user can add a label at the same time as storing the data. If this looks ok, I could maybe start adding the labeled images to S3 tomorrow.
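A hedged sketch of such a label-while-uploading widget; the bucket name and the per-label key layout are illustrative assumptions, not the final design:

```python
import os


def make_upload_key(label: str, filename: str) -> str:
    """Place the image under a per-label prefix, e.g. 'cat/349.png'."""
    label = label.strip()
    if not label:
        raise ValueError("label must not be empty")
    return f"{label}/{os.path.basename(filename)}"


def labeling_widget():
    """Streamlit widget: pick an image, type a label, push to S3."""
    import boto3
    import streamlit as st

    uploaded = st.file_uploader("Choose an image", type=["png", "jpg"])
    label = st.text_input("Label")
    if uploaded is not None and st.button("Store to S3"):
        key = make_upload_key(label, uploaded.name)
        boto3.client("s3").put_object(
            Bucket="my-image-bucket",   # placeholder bucket name
            Key=key,
            Body=uploaded.getvalue(),
        )
        st.success(f"Stored as {key}")
```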
Looks so cool! Great work!!
BTW, how did you choose an image? From a path or name?
> If this looks ok, I could maybe tomorrow start to add the labeled images to S3.
Yes, please. Please don't hesitate to ask any questions.
I guess the label could be the name of the parent dir, so that the images would be stored as s3/0/349.png, s3/1/348.png?
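That convention is a one-liner: derive the S3 key from the image's parent directory. A minimal sketch, assuming a local layout like `data/0/349.png`:

```python
from pathlib import Path


def key_from_path(image_path: str) -> str:
    """Use the parent directory name as the label prefix:
    'data/0/349.png' -> '0/349.png'."""
    p = Path(image_path)
    return f"{p.parent.name}/{p.name}"
```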
@nellatuulikki would it be possible to switch the backend storage dynamically, S3 or a local one, via some environment variable? It's better to reduce the external dependency, esp. for CI and demo.
I have now created a PR with the possibility to store only to S3. But basically you are asking that the user could select whether the images are stored locally or to S3? And do you mean that the app would create a new directory locally (if it doesn't exist already) and then store files there?
I was also wondering whether we should allow storing unlabeled images? That would mean a third directory where all unlabeled images would be stored.
For local storage, we need to find some way, since Streamlit doesn't have any access to local storage right now. What I meant here was the same API to store an image, but the API backend should be switched on-the-fly with an environment variable, S3 or local.
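One way to sketch that on-the-fly switch. The variable name `STORAGE_BACKEND` and the class names here are hypothetical, not an agreed interface; both backends expose the same `save_image` call:

```python
import os
from pathlib import Path


class LocalStorage:
    """Writes images under a local root directory."""

    def __init__(self, root: str = "stored_images"):
        self.root = Path(root)

    def save_image(self, key: str, data: bytes) -> str:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)


class S3Storage:
    """Writes images to an S3 bucket via boto3."""

    def __init__(self, bucket: str):
        self.bucket = bucket

    def save_image(self, key: str, data: bytes) -> str:
        import boto3  # only needed for the S3 backend
        boto3.client("s3").put_object(Bucket=self.bucket, Key=key, Body=data)
        return f"s3://{self.bucket}/{key}"


def get_storage():
    """Pick the backend from the STORAGE_BACKEND env var ('s3' or 'local')."""
    if os.environ.get("STORAGE_BACKEND", "local") == "s3":
        return S3Storage(os.environ["S3_BUCKET"])
    return LocalStorage()
```

The app only ever calls `get_storage().save_image(...)`, so CI and the demo can run with `STORAGE_BACKEND=local` and no AWS credentials at all.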
> I was also wondering should we allow storing of unlabeled images?
Yes
How should I set up S3 by myself? Any instructions?
Also, is there any acceptance test (Robot Framework) for this feature, if it's not too complicated?
> What I meant here was the same API to store an image, but the API backend should be switched on-the-fly with an environment variable, S3 or local.
Would localstack be suitable for this? I was able to run Nella's code with it.
> Would localstack be suitable for this? I was able to run Nella's code with it.
@FexbYk23 @nellatuulikki Sounds like what we want. Please evaluate carefully whether this is overengineering for our current usage or not.
BTW: do you have any idea how localstack intercepts AWS API calls locally? (e.g. local port mapping?)
Another trend: https://world.hey.com/dhh/we-stand-to-save-7m-over-five-years-from-our-cloud-exit-53996caa
> Please evaluate carefully whether this is overengineering for our current usage or not.
>
> BTW: do you have any idea how localstack intercepts AWS API calls locally? (e.g. local port mapping?)
It doesn't intercept anything. It's a server that you run locally, and you connect to it instead of the real S3 servers.
It replicates the whole S3 system, which makes it convenient for testing your code, but it's also quite heavy since it involves running another server. A more lightweight solution would probably be enough for our use case.
As I mentioned, I already used it to test Nella's code and made a branch with it set up. I can create a pull request if we want to use it; if not, you can still use it for testing.
@FexbYk23 Would it be possible to switch without ipynb code? Just with MACRO in yml?
> @FexbYk23 Would it be possible to switch without ipynb code? Just with MACRO in yml?
What do you mean by "MACRO in yml"?
Environmental var?
> Environmental var?
Doesn't seem to be possible, because boto3 has no environment variable for setting the endpoint URL.
I meant: the core concept of localstack is to switch dynamically between the mock and AWS with some environment variable? Can you do that with localstack?
> I meant, the core concept of localstack is to switch dynamically between mock and aws with some environmental var? With localstack, you can do?
I haven't seen anything about localstack being able to relay requests to the real AWS. Thus switching between the mock (localstack) and AWS would have to be done by our Python application.
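One way the application could do that switch itself: read an optional endpoint override from an environment variable of our own and pass it to boto3's `endpoint_url` parameter. The name `S3_ENDPOINT_URL` is our convention here, not a boto3 built-in:

```python
import os


def s3_client_kwargs() -> dict:
    """Extra kwargs for boto3.client('s3'): point at localstack when
    S3_ENDPOINT_URL is set (e.g. http://localhost:4566), else real AWS."""
    endpoint = os.environ.get("S3_ENDPOINT_URL")
    return {"endpoint_url": endpoint} if endpoint else {}


def make_s3_client():
    """Build an S3 client against localstack or AWS, depending on the env."""
    import boto3
    return boto3.client("s3", **s3_client_kwargs())
```

With this, CI would export `S3_ENDPOINT_URL=http://localhost:4566` (localstack's default edge port), while production leaves it unset.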
Hiroshi agreed to look into this
I think Hiroshi should create an S3 bucket for this project, and we need to change the code to fetch and store images from that bucket. I think otherwise this is done.
@nellatuulikki
> I think Hiroshi should create a S3 bucket for this project,
Yes, I should! Any pointer to a how-to? I'll work on it....
I have put a link to the instructions in Discord, but I'll paste it here as well.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html
Is there any estimate when we could use this new account?
I'll do it by tomorrow, sorry for the delay.
Can you survive with local storage alternatively?
I think we can manage with local storage for now.
As part of the cloudification task, we need to get S3 running. We have all the functions ready for storing images to S3 (not yet for compressed files). AWS offers a Free Tier, which is free for 12 months, but it has constraints on requests (20,000 GET requests and 2,000 PUT, COPY, POST and LIST requests) and a storage size limit (5 GB). If we store/fetch our two datasets by handling images separately, I think the request limits will be hit pretty quickly.
Compressed files would need less storage space. I think it would also significantly reduce the requests made to the bucket; however, I don't have a proper estimate of that. I think we should test it first in localstack to get some understanding of how many requests are needed for compressed files. I also understood that you prefer storing images as compressed files.
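A quick way to test that idea: pack each labeled directory tree into one in-memory archive and upload it as a single object, turning N PUT requests into one. The archive layout below is just a sketch, keeping the `label/filename` structure as member names:

```python
import io
import tarfile
from pathlib import Path


def pack_images(root: str) -> bytes:
    """Create an in-memory .tar.gz of all PNGs under root, keeping the
    label directories (e.g. '0/349.png') as archive member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(Path(root).rglob("*.png")):
            tar.add(path, arcname=path.relative_to(root).as_posix())
    return buf.getvalue()


def upload_archive(bucket: str, key: str, data: bytes) -> None:
    """One PUT request instead of one per image."""
    import boto3
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=data)
```

Against localstack this makes the request count easy to compare: one PUT per dataset archive versus one per image.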
But before I test it and spend time on that, I want to make sure the S3 constraints are understood. Here is the link to the pricing if the Free Tier limits are exceeded.
https://aws.amazon.com/s3/pricing/?p=pm&c=s3&z=4
If S3 sounds good, we need the credentials for the used bucket, so we could have this running in cloud for next week's demo.
@nellatuulikki I thought the path to the S3 location might be enough, so that you guys could have been using it already. I will provide credentials early next week. Anyway, TL would ease the situation.
The parent issue #45 User stories https://miro.com/app/board/uXjVPwQdIjc=/
Acceptance test