doyu opened 1 year ago
Just want to clarify if this task is dependent on #93 being completed? If it is not dependent, am I understanding this task correctly that this could be implemented purely with streamlit or do I need to think about storing the data to external storage (you mentioned Google Drive)?
For #93 I think we go with the current implementation in Sprint 3, so you could implement this item independently. One point to note here: right now, Streamlit doesn't allow storing data locally. You may need to use external storage for data (e.g. Google Drive or S3). For demo purposes, you could survive with running Streamlit locally using local storage, but for SaaS it would run on a cloud VM without local storage. You need to consider the design for that. ref: https://github.com/Origami-TinyML/tflm_hello_world/issues/93#issuecomment-1437252195
Are you thinking something like this https://docs.streamlit.io/knowledge-base/tutorials/databases/aws-s3 ?
> Are you thinking something like this https://docs.streamlit.io/knowledge-base/tutorials/databases/aws-s3 ?
Yes, since Streamlit doesn't have any access to local storage at all, at least for now.
Ok, I'll try to set up aws s3 storage by following those instructions
I have now set up an S3 bucket and managed to load a CSV file from the bucket into Streamlit by following those instructions. The next step is to store image data from Streamlit to S3. Just for clarification, do you want me to create a feature that stores only one image?
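For reference, the tutorial's flow boils down to something like the sketch below. The bucket and key names are placeholders, and `boto3`/`pandas` are imported lazily so the URI helper stays usable without AWS access:

```python
import io


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/file.csv' into (bucket, key)."""
    without_scheme = uri.removeprefix("s3://")
    bucket, _, key = without_scheme.partition("/")
    return bucket, key


def load_csv_from_s3(uri: str):
    """Fetch a CSV object from S3 and return it as a pandas DataFrame."""
    import boto3      # deferred: only needed when S3 is actually used
    import pandas as pd

    bucket, key = split_s3_uri(uri)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_csv(io.BytesIO(body))
```

Credentials come from the usual boto3 sources (environment variables or `~/.aws/credentials`), which is also what the Streamlit tutorial assumes.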
I was thinking of implementing something like this:
@nellatuulikki
It's really good that you asked before implementing instead of assuming something here.
> do you want me to create a feature that stores only one image?
No, data should be included in a compressed archive file. The sequence would be:
You may want to take a look at how external data is stored and processed in ML: https://github.com/fastai/fastai/blob/master/nbs/04_data.external.ipynb
Let me know if there's some inconsistency above. Thanks!
Here is an example of what I have been testing this evening. I added a feature in which the user can add a label at the same time as storing the data. If this looks ok, I could maybe start adding the labeled images to S3 tomorrow.
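A hedged sketch of such a label-while-uploading widget; the bucket name and the per-label key layout are illustrative assumptions, not the final design:

```python
import os


def make_upload_key(label: str, filename: str) -> str:
    """Place the image under a per-label prefix, e.g. 'cat/349.png'."""
    label = label.strip()
    if not label:
        raise ValueError("label must not be empty")
    return f"{label}/{os.path.basename(filename)}"


def labeling_widget():
    """Streamlit widget: pick an image, type a label, push to S3."""
    import boto3
    import streamlit as st

    uploaded = st.file_uploader("Choose an image", type=["png", "jpg"])
    label = st.text_input("Label")
    if uploaded is not None and st.button("Store to S3"):
        key = make_upload_key(label, uploaded.name)
        boto3.client("s3").put_object(
            Bucket="my-image-bucket",   # placeholder bucket name
            Key=key,
            Body=uploaded.getvalue(),
        )
        st.success(f"Stored as {key}")
```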
Looks so cool! Great work!!
BTW, how did you choose an image? From a path or name?
> If this looks ok, I could maybe tomorrow start to add the labeled images to S3.
Yes, please. Please don't hesitate to ask any questions.
I guess the label could be the name of the parent dir, so that the images would be stored as s3/0/349.png, s3/1/348.png?
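That convention is a one-liner: derive the S3 key from the image's parent directory. A minimal sketch, assuming a local layout like `data/0/349.png`:

```python
from pathlib import Path


def key_from_path(image_path: str) -> str:
    """Use the parent directory name as the label prefix:
    'data/0/349.png' -> '0/349.png'."""
    p = Path(image_path)
    return f"{p.parent.name}/{p.name}"
```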
@nellatuulikki would it be possible to switch the backend storage dynamically, S3 or a local one, via some environment variable? It's better to reduce the external dependency, esp. for CI and demo.
I have now created a PR with the possibility to store only to S3. But basically you are asking that the user could select whether the images are stored locally or to S3? And do you mean that the app would create a new directory locally (if it doesn't exist already) and then store files there?
I was also wondering whether we should allow storing unlabeled images? That would mean a third directory where all unlabeled images would be stored.
For local storage, we need to find some way, since Streamlit doesn't have any access to local storage right now. What I meant here was the same API to store an image, but the API backend should be switched on-the-fly with an environment variable, S3 or local.
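One way to sketch that on-the-fly switch. The variable name `STORAGE_BACKEND` and the class names here are hypothetical, not an agreed interface; both backends expose the same `save_image` call:

```python
import os
from pathlib import Path


class LocalStorage:
    """Writes images under a local root directory."""

    def __init__(self, root: str = "stored_images"):
        self.root = Path(root)

    def save_image(self, key: str, data: bytes) -> str:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return str(path)


class S3Storage:
    """Writes images to an S3 bucket via boto3."""

    def __init__(self, bucket: str):
        self.bucket = bucket

    def save_image(self, key: str, data: bytes) -> str:
        import boto3  # only needed for the S3 backend
        boto3.client("s3").put_object(Bucket=self.bucket, Key=key, Body=data)
        return f"s3://{self.bucket}/{key}"


def get_storage():
    """Pick the backend from the STORAGE_BACKEND env var ('s3' or 'local')."""
    if os.environ.get("STORAGE_BACKEND", "local") == "s3":
        return S3Storage(os.environ["S3_BUCKET"])
    return LocalStorage()
```

The app only ever calls `get_storage().save_image(...)`, so CI and the demo can run with `STORAGE_BACKEND=local` and no AWS credentials at all.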
> I was also wondering should we allow storing of unlabeled images?
Yes
How should I set up S3 by myself? Any instructions?
Also, is there any acceptance test (Robot Framework) for this feature, if it's not too complicated?
> What I meant here was the same API to store an image, but the API backend should be switched on-the-fly with an environment variable, S3 or local.
Would localstack be suitable for this? I was able to run Nella's code with it.
> Would localstack be suitable for this? I was able to run Nella's code with it.
@FexbYk23 @nellatuulikki Sounds like what we want. Please evaluate carefully whether this is overengineering for our current usage or not.
BTW: do you have any idea how localstack intercepts AWS API calls locally? (e.g. local port mapping?)
Another trend: https://world.hey.com/dhh/we-stand-to-save-7m-over-five-years-from-our-cloud-exit-53996caa
> Please evaluate carefully whether this is overengineering for our current usage or not.
>
> BTW: do you have any idea how localstack intercepts AWS API calls locally? (e.g. local port mapping?)
It doesn't intercept anything. It's a server that you run locally, and you connect to it instead of the real S3 servers.
It replicates the whole S3 system, which makes it convenient for testing your code, but it's also quite heavy since it involves running another server. A more lightweight solution would probably be enough for our use case.
As I mentioned, I already used it to test Nella's code and made a branch with it set up. I can create a pull request if we want to use it; if not, you can still use it for testing.
@FexbYk23 Would it be possible to switch without ipynb code? Just with MACRO in yml?
> @FexbYk23 Would it be possible to switch without ipynb code? Just with MACRO in yml?
What do you mean by "MACRO in yml"?
Environmental var?
> Environmental var?
Doesn't seem to be possible, because boto3 has no environment variable for setting the endpoint URL.
I meant: the core concept of localstack is to switch dynamically between the mock and AWS with some environment variable? Can you do that with localstack?
> I meant, the core concept of localstack is to switch dynamically between mock and aws with some environmental var? With localstack, you can do?
I haven't seen anything about localstack being able to relay requests to the real AWS. Thus switching between the mock (localstack) and AWS would have to be done by our Python application.
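One way the application could do that switch itself: read an optional endpoint override from an environment variable of our own and pass it to boto3's `endpoint_url` parameter. The name `S3_ENDPOINT_URL` is our convention here, not a boto3 built-in:

```python
import os


def s3_client_kwargs() -> dict:
    """Extra kwargs for boto3.client('s3'): point at localstack when
    S3_ENDPOINT_URL is set (e.g. http://localhost:4566), else real AWS."""
    endpoint = os.environ.get("S3_ENDPOINT_URL")
    return {"endpoint_url": endpoint} if endpoint else {}


def make_s3_client():
    """Build an S3 client against localstack or AWS, depending on the env."""
    import boto3
    return boto3.client("s3", **s3_client_kwargs())
```

With this, CI would export `S3_ENDPOINT_URL=http://localhost:4566` (localstack's default edge port), while production leaves it unset.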
Hiroshi agreed to look into this
I think Hiroshi should create an S3 bucket for this project, and we need to change the code to fetch and store images from that bucket. I think otherwise this is done.
@nellatuulikki
> I think Hiroshi should create a S3 bucket for this project,
Yes, I should! Any pointer to a how-to? I'll work on it....
I have put a link to the instructions in Discord, but I'll paste it here as well.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/GetStartedWithS3.html
Is there any estimate when we could use this new account?
I'll do it by tomorrow, sorry for the delay.
Can you survive with local storage alternatively?
I think we can manage with local storage for now.
As part of the cloudification task, we need to get S3 running. We have all the functions ready for storing images to S3 (not yet for compressed files). AWS offers a Free Tier, which is free for 12 months, but it has constraints on requests (20,000 GET requests and 2,000 PUT, COPY, POST and LIST requests) and a storage size limit (5 GB). If we store/fetch our two datasets by handling images separately, I think the request limits will be hit pretty quickly.
Compressed files would need less storage space. I think it would also significantly reduce the requests made to the bucket; however, I don't have a proper estimate of that. I think we should test it first in localstack to get some understanding of how many requests are needed for compressed files. I also understood that you prefer storing images as compressed files.
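A quick way to test that idea: pack each labeled directory tree into one in-memory archive and upload it as a single object, turning N PUT requests into one. The archive layout below is just a sketch, keeping the `label/filename` structure as member names:

```python
import io
import tarfile
from pathlib import Path


def pack_images(root: str) -> bytes:
    """Create an in-memory .tar.gz of all PNGs under root, keeping the
    label directories (e.g. '0/349.png') as archive member names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(Path(root).rglob("*.png")):
            tar.add(path, arcname=path.relative_to(root).as_posix())
    return buf.getvalue()


def upload_archive(bucket: str, key: str, data: bytes) -> None:
    """One PUT request instead of one per image."""
    import boto3
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=data)
```

Against localstack this makes the request count easy to compare: one PUT per dataset archive versus one per image.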
But before I test it and spend time on that, I want to make sure the S3 constraints are understood. Here is the link to the pricing if the Free Tier limits are exceeded.
https://aws.amazon.com/s3/pricing/?p=pm&c=s3&z=4
If S3 sounds good, we need the credentials for the used bucket, so we could have this running in cloud for next week's demo.
@nellatuulikki I thought the path to the S3 location might be enough, so that you guys could have been using it already. I will provide credentials early next week. Anyway, TL would ease the situation.
The parent issue #45 User stories https://miro.com/app/board/uXjVPwQdIjc=/
Acceptance test