greyeagle closed this issue 7 months ago
@greyeagle Thank you for your kind words!
Could you check the ML backend from this branch - https://github.com/heartexlabs/label-studio-ml-backend/pull/72 ?
I would also recommend using label-studio==1.4.1rc3 (or the LS master branch).
@makseq Thank you very much for your support! Using that branch, things look much better. After going through the necessary installation steps, I first got a timeout when trying to connect, but that was only because the ML backend was still downloading a number of things in the background. On the second try, after the downloads had finished, I got a working connection. Now I have one more issue, which may be related to my lack of experience with Label Studio. I labeled a few images and then tried to start training. However, I get an error stating:
[ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::166] The latest job result file ././pytorch_backend/1.1643893404/1644577981/job_result.json doesn't exist
That seems connected to another error:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/exceptions.py", line 39, in exception_f
return f(*args, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/api.py", line 66, in _train
job = _manager.train(annotations, project, label_config, **params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 372, in train
project, label_config, train_kwargs=kwargs, tasks=tasks)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 327, in train_script_wrapper
train_output = m.model.fit(data_stream, workdir, **train_kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 190, in fit
dataloader = DataLoader(dataset, shuffle=True, batch_size=batch_size)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
sampler = RandomSampler(dataset)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
What am I doing wrong?
@greyeagle Maybe you have to label more data with different labels (choices)?
@makseq Thanks again for your time! I did find an error that looks suspicious:
File "<path_to_label_studio_ml>/pytorch_transfer_learning.py", line 36, in get_transformed_image
with open(filepath, mode='rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/lars/.local/share/label-studio/media/upload/e2fb99c3-405065_134d6bSK62facc.jpg'
Looking into my file system, I see that the missing file is actually located in '/home/lars/.local/share/label-studio/media/upload/1/' (note the 1, which seems to be the task number).
In the task source shown in Label Studio I see the following (only the first task is shown here, but all of them look like this):
Source for task 1
{
"id": 1,
"data": {
"image": "/data/upload/1/e2fb99c3-405065_134d6bSK62facc.jpg"
},
"annotations": [
{
"id": 2,
"created_username": " xxx@yyy.zz, 1",
"created_ago": "1 week, 3 days",
"result": [
{
"value": {
"choices": [
"no dormers"
]
},
"id": "7W-Yf2dRlj",
"from_name": "choice",
"to_name": "image",
"type": "choices",
"origin": "manual"
}
],
"was_cancelled": false,
"ground_truth": false,
"created_at": "2022-02-03T13:13:42.140069Z",
"updated_at": "2022-02-11T11:25:19.632907Z",
"lead_time": 1170.11,
"task": 1,
"completed_by": 1,
"parent_prediction": null,
"parent_annotation": null
}
],
"predictions": []
}
Looks like something is going wrong with the folders? How and where can I adjust that?
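The mismatch described above is between the task URL (/data/upload/1/...) and the file on disk (.../media/upload/1/...). A minimal sketch of a mapping that keeps the numbered subfolder is shown below. The helper name `resolve_upload_path` and the `/data/` prefix handling are assumptions made for illustration; this is not the fix from the pull request.

```python
from pathlib import PurePosixPath


def resolve_upload_path(image_url, media_root, prefix="/data/"):
    """Map a task URL such as /data/upload/1/x.jpg to a path under media_root.

    Hypothetical helper: the real backend may resolve paths differently.
    """
    if not image_url.startswith(prefix):
        raise ValueError(f"not a local upload URL: {image_url}")
    # Keep everything after the prefix, including the numbered subfolder
    # (upload/1/...), which was missing from the path in the traceback.
    relative = image_url[len(prefix):]
    return PurePosixPath(media_root) / relative


path = resolve_upload_path(
    "/data/upload/1/e2fb99c3-405065_134d6bSK62facc.jpg",
    "/home/lars/.local/share/label-studio/media",
)
print(path)
```

With the values from this thread, the result points into the upload/1/ folder where the file actually lives.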
Hi @greyeagle, I have pushed the latest fixes to https://github.com/heartexlabs/label-studio-ml-backend/pull/72 Could you please update your ML backend and try one more time?
The issue with the project number is fixed.
Hi @KonstantinKorotaev
Thank you very much for fixing this! I was out of the office for a few days so I tried first thing this morning. I did:
1) git pull
2) Deleted the backend
3) Re-generated the backend with
label-studio-ml init --force pytorch_backend --script label_studio_ml/examples/pytorch_transfer_learning/pytorch_transfer_learning.py
The issue with the image path is indeed gone, so I assume the fix was applied. Unfortunately, I now get new errors. When I press the "Start Training" button in the "Machine Learning" settings, I see the following in the label_studio_ml terminal:
[2022-02-21 08:32:56,757] [ERROR] [label_studio_ml.model::get_result::56] Run directory ././pytorch_backend/INITIAL specified by model_version doesn't exist
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 54, in get_result
job_result = self.get_result_from_job_id(model_version)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 107, in get_result_from_job_id
result = self._get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 183, in _get_result_from_job_id
raise IOError(f'Run directory {job_dir} specified by model_version doesn\'t exist')
OSError: Run directory ././pytorch_backend/INITIAL specified by model_version doesn't exist
Transfer learning with a full ConvNet finetuning
In the Label-Studio terminal I see:
[2022-02-21 07:30:47,737] [ml.models::predict_tasks::169] [WARNING] Prediction not created for project roof form (id=3, url=http://localhost:9090): 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:9090/predict
[21/Feb/2022 07:30:47] "GET /api/dm/tasks/157?project=2 HTTP/1.1" 200 1106
[2022-02-21 07:30:55,188] [ml.models::predict_tasks::169] [WARNING] Prediction not created for project roof form (id=3, url=http://localhost:9090): 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:9090/predict
If I go to the annotation tasks and open an image, I see the following error in the label-studio-ml terminal:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/exceptions.py", line 39, in exception_f
return f(*args, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/api.py", line 32, in _predict
predictions, model = _manager.predict(tasks, project, label_config, force_reload, try_fetch, **params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 560, in predict
predictions = cls._current_model.model.predict(tasks, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 160, in predict
logits = self.model.predict(image_urls)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 99, in predict
return self.model(images).data.numpy()
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torchvision/models/resnet.py", line 220, in forward
return self._forward_impl(x)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torchvision/models/resnet.py", line 203, in _forward_impl
x = self.conv1(x)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
While in the label-studio terminal I get:
[21/Feb/2022 07:42:18] "GET /api/dm/tasks/120?project=2 HTTP/1.1" 200 1113
[2022-02-21 07:43:47,951] [ml.models::predict_tasks::169] [WARNING] Prediction not created for project roof form (id=3, url=http://localhost:9090): 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:9090/predict
[21/Feb/2022 07:43:47] "GET /api/dm/tasks/121?project=2 HTTP/1.1" 200 1113
Do I need to change any settings? Thanks for all your work!
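The RuntimeError above says the input batch was a CPU tensor while the model weights were on the GPU. A minimal sketch of a device-safe forward pass follows. It uses only standard PyTorch tensor and module methods (`parameters()`, `.to()`, `.detach()`, `.cpu()`, `.numpy()`) and therefore does not import torch itself. The function name `predict_logits` is hypothetical and is not taken from pytorch_transfer_learning.py.

```python
def predict_logits(model, images):
    """Run a forward pass and return the outputs as a NumPy array.

    `model` is expected to be a torch.nn.Module with at least one parameter
    and `images` a torch.Tensor.
    """
    # Find out where the model weights live (for example cpu or cuda:0).
    device = next(model.parameters()).device
    # Move the input batch to the same device to avoid the type mismatch.
    outputs = model(images.to(device))
    # detach() drops autograd state; cpu() copies the result to host memory
    # so that numpy() also works when the outputs are on a GPU.
    return outputs.detach().cpu().numpy()
```

Moving the outputs to the CPU before calling numpy() is needed because NumPy arrays can only wrap host memory.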
Hi @greyeagle, I have pushed fixes to https://github.com/heartexlabs/label-studio-ml-backend/pull/72 Could you please update the example and check?
Hi @KonstantinKorotaev
Thank you for the further update. I have updated my installation with the following results:
1) In the file pytorch_transfer_learning.py, line 99 read
return self.model(images).data.numpy()
which raised an error. I changed it to
return self.model(images).cpu().data.numpy()
and that seemed to solve it. Maybe you can include that change if you deem it valid.
2) I still see issues with path and model names.
a) I needed to manually create the folder INITIAL below my ML backend folder. I'm not certain this should be the case.
b) I get
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 186, in _get_result_from_job_id
raise IOError(f'Result file {result_file} specified by model_version doesn\'t exist')
OSError: Result file ././pytorch_backend/INITIAL/job_result.json specified by model_version doesn't exist
when I try to press "start training".
c) On opening individual images I get
Collecting annotations...
Creating dataset...
Process Process-4:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 81, in job
result = model.process_event(event, data, job_id, additional_params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 302, in process_event
train_output = self.fit((), event=event, data=data, job_id=job_id, **additional_params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 190, in fit
dataloader = DataLoader(dataset, shuffle=True, batch_size=batch_size)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
sampler = RandomSampler(dataset)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
most likely related to
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 126, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 107, in get_result_from_job_id
result = self._get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 186, in _get_result_from_job_id
raise IOError(f'Result file {result_file} specified by model_version doesn\'t exist')
OSError: Result file ././pytorch_backend/1645614421/job_result.json specified by model_version doesn't exist
So do I need to set any folders or paths for job_result.json? This file is nowhere to be found, so apparently it is never generated. Or do I need to configure anything else (such as an export) in Label Studio for this to work?
Seems we're getting forward but not quite there. Thank you for all your effort.
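The OSError messages above come from looking up `<backend folder>/<job id>/job_result.json` for a job that never wrote a result. A minimal sketch of a tolerant lookup that returns None instead of raising is given below. The helper name `load_job_result` is hypothetical, and only the directory layout is taken from the log messages in this thread; it is not the library's own implementation.

```python
import json
from pathlib import Path


def load_job_result(model_dir, job_id):
    """Return the parsed job_result.json of a job, or None if it is unusable."""
    result_file = Path(model_dir) / str(job_id) / "job_result.json"
    if not result_file.is_file():
        # The training job never finished or never wrote a result.
        return None
    try:
        with result_file.open(encoding="utf-8") as f:
            result = json.load(f)
    except json.JSONDecodeError:
        # A partially written or corrupt file is treated like a missing one.
        return None
    # Only a JSON object counts as a valid result.
    return result if isinstance(result, dict) else None
```

A caller could fall back to an untrained model when this returns None, for example for the INITIAL model version before any training run has completed.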
Hi @greyeagle, I have pushed fixes for your latest comment to https://github.com/heartexlabs/label-studio-ml-backend/pull/72 Could you please update the example and check?
Hi @KonstantinKorotaev Thank you very much. I will test it as soon as I can, most likely on Tuesday next week.
Hi @KonstantinKorotaev, I ran a new test just now, and the results are as follows. On pressing "Start Training" in the project settings, I get (in the label-studio-ml terminal):
[2022-02-28 15:15:07,837] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:15:07] "POST /webhook HTTP/1.1" 201 -
[2022-02-28 15:15:07,850] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:15:07] "GET /health HTTP/1.1" 200 -
[2022-02-28 15:15:07,854] [ERROR] [label_studio_ml.model::get_result_from_last_job::128] 1646057677 job returns exception:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 126, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 108, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
Transfer learning with a full ConvNet finetuning
[2022-02-28 15:15:08,471] [ERROR] [label_studio_ml.model::get_result::56]
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 54, in get_result
job_result = self.get_result_from_job_id(model_version)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 108, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
Transfer learning with a full ConvNet finetuning
[2022-02-28 15:15:09,595] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:15:09] "POST /setup HTTP/1.1" 200 -
Collecting annotations...
Creating dataset with 0 images...
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 81, in job
result = model.process_event(event, data, job_id, additional_params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 312, in process_event
train_output = self.fit((), event=event, data=data, job_id=job_id, **additional_params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 190, in fit
dataloader = DataLoader(dataset, shuffle=True, batch_size=batch_size)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
sampler = RandomSampler(dataset)
File "/home/lars/Progs/label-studio/label_studio_ml_git/env/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
"value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
On switching from image to image in the labeling view I get:
[2022-02-28 15:17:44,045] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:17:44] "GET /health HTTP/1.1" 200 -
[2022-02-28 15:17:44,049] [ERROR] [label_studio_ml.model::get_result_from_last_job::128] 1646057707 job returns exception:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 126, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 108, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2022-02-28 15:17:44,050] [ERROR] [label_studio_ml.model::get_result_from_last_job::128] 1646057677 job returns exception:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 126, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 108, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
Transfer learning with a full ConvNet finetuning
[2022-02-28 15:17:45,622] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:17:45] "POST /setup HTTP/1.1" 200 -
[2022-02-28 15:17:45,647] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/exceptions.py", line 39, in exception_f
return f(*args, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/api.py", line 32, in _predict
predictions, model = _manager.predict(tasks, project, label_config, force_reload, try_fetch, **params)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 570, in predict
predictions = cls._current_model.model.predict(tasks, **kwargs)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 160, in predict
logits = self.model.predict(image_urls)
File "/home/lars/Progs/label-studio/label_studio_ml_git/pytorch_backend/pytorch_transfer_learning.py", line 99, in predict
return self.model(images).to(device).data.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[2022-02-28 15:17:45,648] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:17:45] "POST /predict HTTP/1.1" 500 -
If I change that line to return self.model(images).to(device).data.cpu().numpy()
I get a new assertion error:
[2022-02-28 15:20:07,247] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:20:07] "GET /health HTTP/1.1" 200 -
[2022-02-28 15:20:07,251] [ERROR] [label_studio_ml.model::get_result_from_last_job::128] 1646057707 job returns exception:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 126, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 108, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
[2022-02-28 15:20:07,251] [ERROR] [label_studio_ml.model::get_result_from_last_job::128] 1646057677 job returns exception:
Traceback (most recent call last):
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 126, in get_result_from_last_job
result = self.get_result_from_job_id(job_id)
File "/home/lars/Progs/label-studio/label_studio_ml_git/label_studio_ml/model.py", line 108, in get_result_from_job_id
assert isinstance(result, dict)
AssertionError
Transfer learning with a full ConvNet finetuning
[2022-02-28 15:20:08,798] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:20:08] "POST /setup HTTP/1.1" 200 -
[2022-02-28 15:20:08,824] [INFO] [werkzeug::_log::225] 192.168.178.68 - - [28/Feb/2022 15:20:08] "POST /predict HTTP/1.1" 200 -
I'm wondering if we may have an issue with library versions somehow? Thank you for tracking this down!
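The "Creating dataset with 0 images..." line followed by `num_samples=0` means that an empty dataset reached the DataLoader. A minimal sketch of collecting (image, label) pairs and failing early with a clearer message is shown below. It assumes tasks follow the JSON structure posted earlier in this thread; the function names `collect_samples` and `require_samples` are hypothetical and are not part of the example backend.

```python
def collect_samples(tasks):
    """Extract (image, label) pairs from tasks with a usable choice annotation."""
    samples = []
    for task in tasks:
        image = task.get("data", {}).get("image")
        for annotation in task.get("annotations", []):
            if annotation.get("was_cancelled"):
                continue  # skipped tasks carry no usable label
            for item in annotation.get("result", []):
                choices = item.get("value", {}).get("choices", [])
                if image and item.get("type") == "choices" and choices:
                    samples.append((image, choices[0]))
    return samples


def require_samples(samples):
    """Raise a readable error instead of handing an empty dataset to training."""
    if not samples:
        raise ValueError("no labeled images found; nothing to train on")
    return samples
```

Logging the number of collected samples before building the dataset makes it easier to tell whether the annotations never arrived at the backend or were dropped while the images were being loaded.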
Hi @KonstantinKorotaev, I just noticed the branch was merged. As I still cannot get it to work, I did the following:
redis was needed, so pip install redis
rq was needed, so pip install rq
The backend is created.
Trying to start the backend, I get cannot import name 'json' from 'itsdangerous'
I tried the solutions given here, but without success: https://blogs.thebitx.com/index.php/2022/02/21/solved-importerror-cannot-import-name-json-from-itsdangerous/
Which Python version are you using? By the way, I am on Ubuntu 20.04, but I suppose that should not make any difference in a venv.
@greyeagle Hey, I'm facing the same issue on Ubuntu 18.04, using Python 3.6 and 3.9. I will let you know if I ever get it to work.
@ithinggoon In my venv, I was able to fix the "itsdangerous" problem by following the blog directions that @greyeagle referenced above. In particular, I downgraded Flask to 1.1.2 and markupsafe to 2.0.1. I hope that helps!
@itswhts4dinner We've updated flask version up to 1.1.4: https://github.com/heartexlabs/label-studio-ml-backend/pull/96/files
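As far as I know, the import error appears because itsdangerous 2.1.0 removed the `json` name that Flask 1.1.x still imports. A minimal sketch of checking a version string against that cutoff is shown below. The helper names are hypothetical, and the 2.1 threshold is an assumption based on the itsdangerous changelog rather than something stated in this thread.

```python
import re


def version_tuple(version):
    """Turn '2.0.1' or '2.1.0rc1' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def itsdangerous_has_json(version):
    """True if this itsdangerous version should still expose `json` (assumed: < 2.1)."""
    return version_tuple(version) < (2, 1)


for candidate in ("2.0.1", "2.1.0"):
    print(candidate, itsdangerous_has_json(candidate))
```

On Python 3.8 or newer, the installed version can be read with `importlib.metadata.version("itsdangerous")` and passed to `itsdangerous_has_json`.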
I will try and test the new version. Thanks for keeping this going!
First of all, thank you for all your great work! I have a current installation of Label-Studio on Ubuntu 20.04. Wanting to classify images, I have followed the tutorial at https://labelstud.io/tutorials/pytorch-image-transfer-learning.html and got to the point where I could run the service and connect to Label-Studio, well, sort of. However, it seems I am overlooking something in the last section, ImageClassifierAPI. On trying to connect, the connection itself is established, but I get an error in Label-Studio. From the service terminal I get an error as well.
I have read in the tutorial that I need to overwrite something, but then again: how? It looked like this was a working example.
resources is indeed not defined. How do I need to define this? Do I need to fetch attributes with something like super(ImageClassifierAPI, self).__init__(**kwargs) and use the attributes described here: https://pypi.org/project/label-studio-ml/ ?
Any help is highly appreciated. Thank you very much.