HumanSignal / label-studio-ml-backend

Configs and boilerplates for Label Studio's Machine Learning backend
Apache License 2.0

ner ml backend while training [job_result.json doesn't exist] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] #342

Status: Open. Opened by Utkarsha666 1 year ago.

Utkarsha666 commented 1 year ago

Whenever I click "Start Training", nothing happens. Here are the logs:

```
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,166] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
2023-08-30 14:26:02 server  |     return f(*args, **kwargs)
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/api.py", line 70, in _train
2023-08-30 14:26:02 server  |     job = _manager.train(annotations, project, label_config, **params)
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 683, in train
2023-08-30 14:26:02 server  |     project, label_config, train_kwargs=kwargs, tasks=tasks)
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 638, in train_script_wrapper
2023-08-30 14:26:02 server  |     train_output = m.model.fit(data_stream, workdir, **train_kwargs)
2023-08-30 14:26:02 server  |   File "./ner.py", line 474, in fit
2023-08-30 14:26:02 server  |     raise ValueError('Specify "WORK_DIR" environmental variable to store model checkpoints.')
2023-08-30 14:26:02 server  | ValueError: Specify "WORK_DIR" environmental variable to store model checkpoints.
2023-08-30 14:26:02 server  | 
2023-08-30 14:26:02 server  | Traceback (most recent call last):
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
2023-08-30 14:26:02 server  |     return f(*args, **kwargs)
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/api.py", line 70, in _train
2023-08-30 14:26:02 server  |     job = _manager.train(annotations, project, label_config, **params)
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 683, in train
2023-08-30 14:26:02 server  |     project, label_config, train_kwargs=kwargs, tasks=tasks)
2023-08-30 14:26:02 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 638, in train_script_wrapper
2023-08-30 14:26:02 server  |     train_output = m.model.fit(data_stream, workdir, **train_kwargs)
2023-08-30 14:26:02 server  |   File "./ner.py", line 474, in fit
2023-08-30 14:26:02 server  |     raise ValueError('Specify "WORK_DIR" environmental variable to store model checkpoints.')
2023-08-30 14:26:02 server  | ValueError: Specify "WORK_DIR" environmental variable to store model checkpoints.
2023-08-30 14:26:02 server  | 
2023-08-30 14:26:02 server  | [pid: 21|app: 0|req: 11/11] 172.18.0.1 () {36 vars in 739 bytes} [Wed Aug 30 08:41:02 2023] POST /train => generated 1048 bytes in 48 msecs (HTTP/1.1 500) 2 headers in 92 bytes (1 switches on core 0)
2023-08-30 14:26:02 server  | [pid: 21|app: 0|req: 12/12] 172.18.0.1 () {32 vars in 672 bytes} [Wed Aug 30 08:41:02 2023] GET /health => generated 54 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 71 bytes (1 switches on core 0)
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,378] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/43.1693383892/1693384862/job_result.json doesn't exist
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,382] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/43.1693383892/1693384618/job_result.json doesn't exist
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,386] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/43.1693383892/1693384588/job_result.json doesn't exist
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,389] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/43.1693383892/1693384048/job_result.json doesn't exist
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,392] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/43.1693383892/1693383970/job_result.json doesn't exist
2023-08-30 14:26:02 server  | [2023-08-30 08:41:02,395] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/43.1693383892/1693383965/job_result.json doesn't exist
2023-08-30 14:26:02 server  | [pid: 21|app: 0|req: 13/13] 172.18.0.1 () {36 vars in 724 bytes} [Wed Aug 30 08:41:02 2023] POST /setup => generated 31 bytes in 29 msecs (HTTP/1.1 200) 2 headers in 71 bytes (1 switches on core 0)
```

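The first traceback comes from a guard in the example's `ner.py` (`fit`, line 474) that refuses to train when `WORK_DIR` is not set. A minimal sketch of that kind of check, using a hypothetical `resolve_work_dir` helper that only mirrors the error message in the log (the actual `ner.py` code may be written differently):

```python
import os
from typing import Mapping, Optional


def resolve_work_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: return the checkpoint directory from WORK_DIR.

    Mirrors the ValueError seen in the log; it is not the real ner.py code.
    """
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    work_dir = env.get("WORK_DIR")
    if not work_dir:
        # Unset or empty WORK_DIR: fail the same way the backend did.
        raise ValueError(
            'Specify "WORK_DIR" environmental variable to store model checkpoints.'
        )
    return work_dir
```

With this kind of guard, training only proceeds once `WORK_DIR` is present in the backend process's environment, for example under `environment:` in a `docker-compose.yml`, assuming the backend runs under Docker Compose as the `server  |` log prefix suggests.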
Utkarsha666 commented 1 year ago

I fixed the above error by setting the `WORK_DIR` environment variable, but now I get this error:

```
2023-08-30 15:00:12 server  | [2023-08-30 09:15:12,802] [ERROR] [label_studio_ml.exceptions::exception_f::53] Traceback (most recent call last):
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
2023-08-30 15:00:12 server  |     return f(*args, **kwargs)
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/api.py", line 70, in _train
2023-08-30 15:00:12 server  |     job = _manager.train(annotations, project, label_config, **params)
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 683, in train
2023-08-30 15:00:12 server  |     project, label_config, train_kwargs=kwargs, tasks=tasks)
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 638, in train_script_wrapper
2023-08-30 15:00:12 server  |     train_output = m.model.fit(data_stream, workdir, **train_kwargs)
2023-08-30 15:00:12 server  |   File "./ner.py", line 490, in fit
2023-08-30 15:00:12 server  |     completions = self._get_annotated_dataset(data['project_id'])
2023-08-30 15:00:12 server  | TypeError: string indices must be integers
2023-08-30 15:00:12 server  | 
2023-08-30 15:00:12 server  | Traceback (most recent call last):
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/exceptions.py", line 39, in exception_f
2023-08-30 15:00:12 server  |     return f(*args, **kwargs)
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/api.py", line 70, in _train
2023-08-30 15:00:12 server  |     job = _manager.train(annotations, project, label_config, **params)
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 683, in train
2023-08-30 15:00:12 server  |     project, label_config, train_kwargs=kwargs, tasks=tasks)
2023-08-30 15:00:12 server  |   File "/usr/local/lib/python3.7/site-packages/label_studio_ml/model.py", line 638, in train_script_wrapper
2023-08-30 15:00:12 server  |     train_output = m.model.fit(data_stream, workdir, **train_kwargs)
2023-08-30 15:00:12 server  |   File "./ner.py", line 490, in fit
2023-08-30 15:00:12 server  |     completions = self._get_annotated_dataset(data['project_id'])
2023-08-30 15:00:12 server  | TypeError: string indices must be integers
2023-08-30 15:00:12 server  | 
2023-08-30 15:00:12 server  | [pid: 21|app: 0|req: 7/7] 172.18.0.1 () {36 vars in 740 bytes} [Wed Aug 30 09:15:10 2023] POST /train => generated 936 bytes in 2484 msecs (HTTP/1.1 500) 2 headers in 91 bytes (1 switches on core 0)
2023-08-30 15:00:12 server  | [pid: 21|app: 0|req: 8/8] 172.18.0.1 () {32 vars in 672 bytes} [Wed Aug 30 09:15:12 2023] GET /health => generated 54 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 71 bytes (1 switches on core 0)
2023-08-30 15:00:12 server  | [2023-08-30 09:15:12,903] [ERROR] [label_studio_ml.model::_get_latest_job_result_from_workdir::438] The latest job result file /data/models/42.1693375653/1693386910/job_result.json doesn't exist
2023-08-30 15:00:12 server  | [pid: 21|app: 0|req: 9/9] 172.18.0.1 () {36 vars in 724 bytes} [Wed Aug 30 09:15:12 2023] POST /setup => generated 31 bytes in 22 msecs (HTTP/1.1 200) 2 headers in 71 bytes (1 switches on core 0)
```

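`TypeError: string indices must be integers` means that `data` in `ner.py`'s `fit` (line 490) is a `str` at that point, so `data['project_id']` indexes a string with a string key. A defensive sketch, assuming (unverified) that the string is JSON encoding an object with a `project_id` key; `get_project_id` is a hypothetical helper, not part of `label_studio_ml`:

```python
import json
from collections.abc import Mapping
from typing import Any, Union


def get_project_id(data: Union[str, Mapping]) -> Any:
    """Hypothetical helper: read 'project_id' from a dict or a JSON string.

    Assumes a string payload is JSON; json.loads raises json.JSONDecodeError
    (a ValueError subclass) if it is not.
    """
    if isinstance(data, str):
        # Indexing a str with a str key is what raised
        # "TypeError: string indices must be integers" in the log above.
        data = json.loads(data)
    if not isinstance(data, Mapping):
        raise TypeError(f"expected a mapping, got {type(data).__name__}")
    return data["project_id"]
```

This only guards the lookup; it does not explain why `fit` receives a string in the first place.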
d3vopz-net commented 1 year ago

I have a similar issue: https://github.com/HumanSignal/label-studio-ml-backend/issues/323