SysCV / nutsh

A Platform for Visual Learning from Human Feedback
https://nutsh.ai
Apache License 2.0

How can I create a multi-worker environment? #11

Open prnr opened 10 months ago

prnr commented 10 months ago

I understand that having several people work on a single video (image set) in a single project is a problem. So I uploaded copies of the same image set to different S3 buckets and created a separate work item for each address, but an error still occurs when two or more people work at the same time. Is there a traffic limit on free-tier S3? Could the problem be caused by too many requests?

KakaoTalk_20231116_162029396

hxhxhx88 commented 10 months ago

Hi @prnr. At present, multi-worker annotation is not supported: each video can be worked on by only one annotator at a time. However, different annotators can work on different videos simultaneously. So a temporary workaround is to split your data into different image sets and let different people work on them, with each person occupying a single video at a time.
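
As a rough illustration of this workaround, the sketch below splits an ordered list of frames into contiguous, non-overlapping image sets, one per annotator. The function name, the frame file names, and the annotator names are hypothetical and are not part of nutsh; each resulting chunk would then be uploaded as its own video.

```python
from typing import Dict, List


def split_into_image_sets(image_keys: List[str], annotators: List[str]) -> Dict[str, List[str]]:
    """Split frames into contiguous, non-overlapping chunks, one per annotator.

    Assumes annotator names are unique. Earlier annotators receive one extra
    frame when the frames do not divide evenly.
    """
    if not annotators:
        raise ValueError("need at least one annotator")
    base, extra = divmod(len(image_keys), len(annotators))
    result: Dict[str, List[str]] = {}
    start = 0
    for i, name in enumerate(annotators):
        size = base + (1 if i < extra else 0)
        result[name] = image_keys[start:start + size]
        start += size
    return result


if __name__ == "__main__":
    # Hypothetical frame names, for illustration only.
    frames = [f"frame_{i:04d}.jpg" for i in range(10)]
    for who, chunk in split_into_image_sets(frames, ["X", "Y", "Z"]).items():
        print(who, len(chunk), chunk[0], chunk[-1])
```

Contiguous chunks keep neighbouring frames with the same annotator, which tends to matter for video annotation; a round-robin split would also avoid overlap but would break up frame continuity.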

prnr commented 10 months ago

Is this the setup you described? Each tt-spare video comes from a different S3 bucket.

image

https://prnr-task.p-e.kr/app/_/project/4 This is the project I'm working on, and the images are stored in the a/b/c/d buckets.

Even after splitting it like this and working in different spaces, we are getting the same error message (I split my bucket about 36 hours ago, and it has been happening constantly ever since). Is there an error log that I can provide?

Could it be a problem caused by the performance limitations of AWS free-tier?

hxhxhx88 commented 10 months ago

@prnr I don't think it is due to the limitations of the free tier. Presumably your network speed is fine.

Do you have more than one annotator? If so, let's call two annotators X and Y. Which of the following cases is true?

  • X works on tt-spare2 and Y works on tt-spare3. They never work on the same image set at the same time.
  • X and Y both work on tt-spare2 at the same time.

The latter case will cause the error, since having more than one annotator work on the same video (image set) is not currently supported. The former case should be fine; if that is your case, something must be wrong and I'll look into it.
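
To tell the two cases apart on your side, a small check like the following may help. It is only a sketch: the mapping from annotator to video is something you would have to maintain yourself, and the function name is hypothetical rather than part of nutsh.

```python
from collections import defaultdict
from typing import Dict, List


def find_shared_videos(current_work: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every video that more than one annotator is working on.

    `current_work` maps an annotator name to the video they are editing now.
    An empty result corresponds to the first case (no sharing).
    """
    by_video = defaultdict(list)
    for annotator, video in current_work.items():
        by_video[video].append(annotator)
    return {video: sorted(names) for video, names in by_video.items() if len(names) > 1}


if __name__ == "__main__":
    print(find_shared_videos({"X": "tt-spare2", "Y": "tt-spare3"}))
    print(find_shared_videos({"X": "tt-spare2", "Y": "tt-spare2"}))
```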

And by the way, where did you deploy your nutsh server? On an EC2 machine?

prnr commented 10 months ago

@prnr I don't think it is due to the limitations of the free tier. Presumably your network speed is fine.

Do you have more than one annotator? If so, let's call two annotators X and Y. Which of the following cases is true?

  • X works on tt-spare2 and Y works on tt-spare3. They never work on the same image set at the same time.
  • X and Y both work on tt-spare2 at the same time.

The latter case will cause the error, since having more than one annotator work on the same video (image set) is not currently supported. The former case should be fine; if that is your case, something must be wrong and I'll look into it.

And by the way, where did you deploy your nutsh server? On an EC2 machine?

image

  1. Both cases produce errors (assuming the videos are recognized as different when set up as shown in the picture). There are currently 3 workers, and they are working on tt2/tt-spare2/tt-spare3.
  2. I've deployed it on EC2 in the Seoul region.

hxhxhx88 commented 10 months ago

In this case, first, please make sure each worker works on exactly one video. Do not let two people work on the same video.

Secondly, there is another potential cause of this error: you may be editing faster than your server can save data. The underlying reason is:

  • nutsh automatically saves your modifications to the backend database while you are editing.
  • every new update can only be saved AFTER the previous save has finished.

Therefore, if you make an edit and then make another edit BEFORE the first modification has been successfully persisted to the database, the "sync failed" error will show.
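
The rule can be pictured with a toy model. This is only a sketch of the behaviour described here, not nutsh's actual implementation; the class and method names are made up for illustration.

```python
class ToySyncClient:
    """Toy model of the save rule: one save at a time, no overlapping saves."""

    def __init__(self) -> None:
        self.save_in_flight = False  # True while a save has not been persisted yet

    def edit(self, change: str) -> str:
        # A new save can only start after the previous one has finished.
        if self.save_in_flight:
            return "sync failed"
        self.save_in_flight = True
        return "saving"

    def server_finished_saving(self) -> None:
        # Called once the backend has persisted the pending change.
        self.save_in_flight = False


if __name__ == "__main__":
    client = ToySyncClient()
    print(client.edit("move box"))    # first edit starts a save
    print(client.edit("resize box"))  # made before the save finished
    client.server_finished_saving()
    print(client.edit("resize box"))  # previous save is done, so this one starts
```

In this model the second edit fails only because it arrives while the first save is still pending; the slower the server or the network, the wider that window becomes.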

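You can do the following to check which is the case:

  • Open Chrome's Developer Tools, navigate to the "Network" tab, and observe the requests.
  • Every synchronization triggers a PATCH /api/video/:id/annotation request.
  • When you see the error,

    • if you find such a request with a 409 response status code, it means someone else has modified the same video as well.
    • otherwise, you SHOULD NOT find such a request being triggered, and SHOULD find a previous request that has not finished yet. This may happen when your network is slow or your server machine is underpowered (although I cannot imagine that for AWS Seoul).

So, could you follow the above instructions and let me know which case it is? It would be even better if you could record a video with nutsh and DevTools side by side while you edit something to trigger the error. Also, can you tell me your EC2 instance type?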
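
If it helps, the decision rule for this check can be written down as a small classifier over the requests you observe in the "Network" tab. This is a sketch under assumptions: the `ObservedRequest` record and the returned strings are hypothetical, you would fill in the records by hand from DevTools, and a status of `None` stands for a request that is still pending.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches the sync endpoint PATCH /api/video/:id/annotation for any id.
SYNC_PATH = re.compile(r"^/api/video/[^/]+/annotation$")


@dataclass
class ObservedRequest:
    method: str
    path: str
    status: Optional[int]  # None means the request has not finished yet


def diagnose(requests: List[ObservedRequest]) -> str:
    """Classify the cause of a "sync failed" error from observed requests."""
    sync = [r for r in requests if r.method == "PATCH" and SYNC_PATH.match(r.path)]
    if any(r.status == 409 for r in sync):
        return "conflict: someone else modified the same video"
    if any(r.status is None for r in sync):
        return "slow save: a previous sync request has not finished yet"
    return "unknown: neither pattern was observed"


if __name__ == "__main__":
    # The video id 42 is made up for the example.
    print(diagnose([ObservedRequest("PATCH", "/api/video/42/annotation", 409)]))
    print(diagnose([ObservedRequest("PATCH", "/api/video/42/annotation", None)]))
```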

Besides, I noticed https://prnr-task.p-e.kr/app/_/project/4 is publicly accessible, which means anyone can edit your dataset. You may add a layer of HTTP Basic Authentication (e.g. using Nginx) to protect your deployment.
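
For reference, HTTP Basic Authentication only adds an `Authorization` header containing the base64-encoded `username:password`, which the proxy compares against its stored credentials. The sketch below shows that mechanism in isolation; the function names are hypothetical, and in a real deployment the check would be performed by Nginx (or another proxy) over HTTPS rather than by code like this.

```python
import base64
import hmac
from typing import Optional


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic Authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def is_authorized(header_value: Optional[str], username: str, password: str) -> bool:
    """Check a received Authorization header against the expected credentials."""
    if header_value is None:
        return False  # no credentials sent; a server would answer 401
    expected = basic_auth_header(username, password)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(header_value.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)
    print(is_authorized(header, "Aladdin", "open sesame"))
    print(is_authorized(None, "Aladdin", "open sesame"))
```

Because the credentials are only encoded, not encrypted, Basic Authentication should be combined with HTTPS.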

prnr commented 10 months ago

In this case, first, please make sure each worker works on exactly one video. Do not let two people work on the same video.

Secondly, there is another potential cause of this error: you may be editing faster than your server can save data. The underlying reason is:

  • nutsh automatically saves your modifications to the backend database while you are editing.
  • every new update can only be saved AFTER the previous save has finished.

Therefore, if you make an edit and then make another edit BEFORE the first modification has been successfully persisted to the database, the "sync failed" error will show.

You can do the following to check which is the case:

  • Open Chrome's Developer Tools, navigate to the "Network" tab, and observe the requests.
  • Every synchronization triggers a PATCH /api/video/:id/annotation request.
  • When you see the error,

    • if you find such a request with a 409 response status code, it means someone else has modified the same video as well.
    • otherwise, you SHOULD NOT find such a request being triggered, and SHOULD find a previous request that has not finished yet. This may happen when your network is slow or your server machine is underpowered (although I cannot imagine that for AWS Seoul).

So, could you follow the above instructions and let me know which case it is? It would be even better if you could record a video with nutsh and DevTools side by side while you edit something to trigger the error. Also, can you tell me your EC2 instance type?

Besides, I noticed https://prnr-task.p-e.kr/app/_/project/4 is publicly accessible, which means anyone can edit your dataset. You may add a layer of HTTP Basic Authentication (e.g. using Nginx) to protect your deployment.

If I see the error this evening, I will capture the information and send it to you. The EC2 instance is a free-tier t2.micro (30 GB) running Amazon Linux 2023. We allowed public access because we could not find another solution to the image loading problem, so we made it public. I'm not a professional developer, so I'm studying hard.

Later, as I keep using this tool, I will organize what I think needs improvement and what I would like to see added, and post it as an issue. It's a really lightweight and easy-to-use tool, so I hope I can help as far as I am able.

hxhxhx88 commented 10 months ago

@prnr Thanks so much! Indeed, this tool is still in its early stage and under active development. Do let me know about any trouble you run into or any thoughts you have. I am glad to assist you in solving your problems and to help you with your work!

Concerning the current issue, it would be great if you could make more observations and send me the results. I have recorded a video for your reference.

https://github.com/SysCV/nutsh/assets/1113875/a96ac403-6924-4a7d-a709-89be803ab390