cardboardcode closed this issue 2 years ago.
Running the P3 training workflow on the ros-industrial EPD v0.2.2 release yields the same error message:
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2022-08-12 10:59:03,245 maskrcnn_benchmark.utils.miscellaneous INFO: Saving labels mapping into ./weights/custom/labels.json
Traceback (most recent call last):
  File "tools/train_net.py", line 201, in <module>
    main()
  File "tools/train_net.py", line 194, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 72, in train
    start_iter=arguments["iteration"],
  File "/home/cardboardvoice/anaconda3/envs/p3_trainer/lib/python3.6/site-packages/maskrcnn_benchmark-0.1-py3.6-linux-x86_64.egg/maskrcnn_benchmark/data/build.py", line 164, in make_data_loader
    sampler = make_data_sampler(dataset, shuffle, is_distributed)
  File "/home/cardboardvoice/anaconda3/envs/p3_trainer/lib/python3.6/site-packages/maskrcnn_benchmark-0.1-py3.6-linux-x86_64.egg/maskrcnn_benchmark/data/build.py", line 64, in make_data_sampler
    sampler = torch.utils.data.sampler.RandomSampler(dataset)
  File "/home/cardboardvoice/anaconda3/envs/p3_trainer/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
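The last frame of the traceback is PyTorch's `RandomSampler`, which refuses to build a sampler over a dataset of length zero. In other words, the training dataset ended up empty by the time the DataLoader was built. Whatever the underlying dependency conflict turns out to be, a cheap pre-flight check is to confirm that the COCO-format annotation file actually contains images with usable boxes. The sketch below is a hypothetical helper (not part of EPD or maskrcnn_benchmark). It assumes, approximately, that images whose boxes are all 1 px or smaller in width or height get discarded, as maskrcnn_benchmark does when it removes images without valid annotations.

```python
import json
from collections import Counter


def count_trainable_images(coco):
    """Return (total images, images with at least one usable box).

    Assumption: an annotation is "usable" only if its bbox [x, y, w, h]
    has both w > 1 and h > 1, approximating maskrcnn_benchmark's filter.
    """
    usable_per_image = Counter()
    for ann in coco.get("annotations", []):
        bbox = ann.get("bbox", [0, 0, 0, 0])
        if len(bbox) == 4 and bbox[2] > 1 and bbox[3] > 1:
            usable_per_image[ann.get("image_id")] += 1
    image_ids = {img["id"] for img in coco.get("images", [])}
    # Counter returns 0 for image ids that never received a usable box.
    usable = sum(1 for image_id in image_ids if usable_per_image[image_id] > 0)
    return len(image_ids), usable


def check_annotation_file(path):
    """Fail early, with a clearer message than num_samples=0, on an empty dataset."""
    with open(path, "r", encoding="utf-8") as f:
        coco = json.load(f)
    total, usable = count_trainable_images(coco)
    if usable == 0:
        raise ValueError(
            f"{path}: {total} images listed but none has a usable annotation; "
            "the training DataLoader would see an empty dataset (num_samples=0)"
        )
    return total, usable
```

Call `check_annotation_file()` on whichever annotation JSON your P3 training config points to before launching training. If it passes in the failing environment, the dataset is probably being emptied elsewhere, for example by a mismatched library version.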
The integration of the .yaml parser is NOT the root cause. The cause should instead be narrowed down to an unknown dependency conflict. For a long-term solution to this problem, we will follow the progress made under #4 and #3.
The following tests further reinforce the conclusion from the previous Debug Update:
- The unreleased dockerized training proceeds without the aforementioned error, as expected.
- The unreleased dockerized exporter also proceeds without the aforementioned error.
The cause is therefore yet another hidden dependency conflict, one that does not occur once the workflow is dockerized.
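One way to pin down such a hidden conflict is to diff the installed package versions of the native conda environment against those inside the Docker image. The sketch below is a hypothetical stdlib-only helper (not part of EPD) that compares two `pip freeze` outputs. It keeps only `name==version` pins, so conda-only packages and `name @ url` entries are not compared, and its name normalization is a rough approximation of pip's.

```python
def parse_freeze(text):
    """Parse `pip freeze` output into {normalized name: version}.

    Only `name==version` lines are kept; comments, editable installs
    (`-e ...`) and `name @ url` lines are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        # Rough normalization: case-insensitive, underscores treated as dashes.
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def diff_environments(host_freeze, docker_freeze):
    """Return {package: (host_version, docker_version)} for every mismatch.

    A version of None means the package is absent from that environment.
    """
    host, docker = parse_freeze(host_freeze), parse_freeze(docker_freeze)
    report = {}
    for name in sorted(set(host) | set(docker)):
        host_version, docker_version = host.get(name), docker.get(name)
        if host_version != docker_version:
            report[name] = (host_version, docker_version)
    return report
```

For example, save the output of `pip freeze` from the `p3_trainer` conda environment and from inside the container into two text files, read both, and print `diff_environments(host_text, docker_text)`. The packages that differ are the first candidates for the conflict.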
Aiming to close this issue in v0.3.0.
- Minor pull request. Will link it here once the pull request is opened.
This issue is resolved with https://github.com/ros-industrial/easy_perception_deployment/pull/56. Closing.
Issue Description
Encountered the following error when attempting to train a Precision Level 3 (P3) MaskRCNN model using EPD. The error appears after integrating the .yaml parser within P3Trainer.py.
Expected Behaviour
The training is supposed to proceed without any errors.
Actual Behaviour
The training fails with the aforementioned error in the terminal.
Error Source
Currently, the integration of the .yaml parser in P3Trainer.py seems to be the root cause.
[ Update as of 20220812 ]: The integration of the parser is not the root cause, since the EPD v0.2.2 P3 training workflow fails as well. The cause should therefore be narrowed down to an unknown dependency conflict.