Closed: L1aoXingyu closed this issue 2 years ago.
@L1aoXingyu I don't clearly understand your documentation for custom datasets. I tried your approach but got the error below:
Traceback (most recent call last):
File "tools/train_net.py", line 67, in <module>
args=(args,),
File "./fastreid/engine/launch.py", line 71, in launch
main_func(*args)
File "tools/train_net.py", line 53, in main
trainer = Trainer(cfg)
File "./fastreid/engine/defaults.py", line 204, in __init__
data_loader = self.build_train_loader(cfg)
File "./fastreid/engine/defaults.py", line 408, in build_train_loader
return build_reid_train_loader(cfg)
File "./fastreid/data/build.py", line 27, in build_reid_train_loader
dataset = DATASET_REGISTRY.get(d)(root=_root, combineall=cfg.DATASETS.COMBINEALL)
File "./fastreid/product_dataset.py", line 8, in __init__
super().__init__(train, query, gallery)
NameError: name 'train' is not defined
And this is my product_dataset.py file in the fastreid folder:
from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset

@DATASET_REGISTRY.register()
class ProductDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        ...
        super().__init__(train, query, gallery)
Even after I removed the ... from the file above, I got the same error.
The ProductDataset folder is inside the datasets folder and has the following structure:
.
├── gallery
│ ├── data_38
│ ├── data_43
│ ├── data_68
│ ├── data_gro
│ └── data_grocery
├── query
│ ├── data_38
│ ├── data_43
│ ├── data_68
│ ├── data_gro
│ └── data_grocery
└── train
├── data_38
├── data_43
├── data_68
├── data_gro
└── data_grocery
And each child folder has the structure below (e.g., train/data_38/):
.
├── 1
├── 10
├── 11
├── 12
├── 13
├── 14
├── 15
├── 16
├── 17
├── 18
├── 19
├── 2
├── 20
├── 21
├── 22
├── 23
├── 24
├── 25
├── 26
├── 27
├── 28
├── 29
├── 3
├── 30
├── 31
├── 32
├── 33
├── 34
├── 35
├── 36
├── 37
├── 38
├── 4
├── 5
├── 6
├── 7
├── 8
└── 9
Each of the numbered folders above contains some images.
I've solved the problem with my dataset. The key was that the train, query, and gallery lists passed via super().__init__(train, query, gallery) must each be a list of tuples, where each tuple has the structure (path/to/image, pid, camid).
@AnhPC03 Yes, you are right! It doesn't matter how your data is structured. The key idea is to prepare the train, query, and gallery lists as required and then pass them via super().__init__(train, query, gallery).
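To make that concrete, here is a minimal sketch of such a class for the folder layout in the original post (datasets/ProductDataset/{train,query,gallery}/data_xx/<numbered id folder>/<images>). The pid and camid conventions below are my own assumptions for illustration, not code from fastreid:

import glob
import os

from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset


@DATASET_REGISTRY.register()
class ProductDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        base = os.path.join(root, 'ProductDataset')
        self._pid_map = {}  # shared integer ids so query/gallery identities line up
        train = self._scan(os.path.join(base, 'train'), camid=0, string_pids=True)
        query = self._scan(os.path.join(base, 'query'), camid=0)
        gallery = self._scan(os.path.join(base, 'gallery'), camid=1)
        super().__init__(train, query, gallery, **kwargs)

    def _scan(self, split_dir, camid, string_pids=False):
        data = []
        for img_path in sorted(glob.glob(os.path.join(split_dir, '*', '*', '*'))):
            # .../<split>/<data_xx>/<numbered id folder>/<image file>
            sub_dir, pid_dir = img_path.split(os.sep)[-3:-1]
            key = f'{sub_dir}_{pid_dir}'  # identity = (data_xx folder, numbered folder)
            if string_pids:
                pid = key  # string pids for the train set, as discussed below
            else:
                pid = self._pid_map.setdefault(key, len(self._pid_map))
            data.append((img_path, pid, camid))
        return data

Query camids are fixed to 0 and gallery camids to 1 here because this dataset has no camera annotations.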
@L1aoXingyu What is the purpose of formatting the train pid and camid values as strings instead of integers? It seems like the latter would make more sense, to be consistent with the formatting of the query and gallery sets.
@addisonklinke When combining two or more datasets for training, plain integers would be confusing because pid 0 would refer to different identities in different datasets.
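To illustrate the point (a sketch of the idea, not fastreid's actual combine logic), prefixing the dataset name onto the train pid keeps identities distinct after merging:

# Two hypothetical train lists; with plain integer pids both entries would
# carry pid 0 and be collapsed into one identity when the datasets are combined.
market_train = [('market/0001_c1.jpg', 'market1501_0', 0)]
duke_train = [('duke/0001_c2.jpg', 'dukemtmc_0', 1)]

combined = market_train + duke_train
print({pid for _, pid, _ in combined})  # {'market1501_0', 'dukemtmc_0'} stay separate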
@L1aoXingyu I see, that makes sense. Thank you for the clarification.
Another question I had is whether there are guidelines for splitting a dataset into train, query, and gallery subsets. Obviously, we want the identity IDs in train to be mutually exclusive with those in query and gallery in order to have an unbiased evaluation. However, when constructing query and gallery I am wondering...
Hello, I am training the model on my own dataset, but training gets stuck during data loading, specifically here. Why is this?
def _try_put_index(self):
    assert self._tasks_outstanding < 2 * self._num_workers

    try:
        index = self._next_index()
    except StopIteration:
        return
    for _ in range(self._num_workers):  # find the next active worker, if any
        worker_queue_idx = next(self._worker_queue_idx_cycle)
        if self._workers_status[worker_queue_idx]:
            break
    else:
        # not found (i.e., didn't break)
        return
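Not a definitive fix, but a common way to localize this kind of stall is to run data loading in the main process so any exception inside the dataset surfaces directly. A sketch, assuming a placeholder config path:

from fastreid.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file('configs/my_config.yml')  # hypothetical config file
cfg.DATALOADER.NUM_WORKERS = 0                # single-process loading for debugging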
Hi, can I rename my own dataset's images to the market1501 format, put them into the Market dataset folder, and train directly with the Market config?
@vicwer That works too, but I still recommend the custom dataset setup described above.
@L1aoXingyu I want to train your fast-reid repo for classification. My dataset has the following structure:
├── train
│ ├── beverage_bottle
│ ├── box
│ ├── candy_bag
│ ├── candy_jar
│ ├── cylinder
│ ├── instant_food_cup
│ ├── juice_box
│ └── tiny_candy
└── val
├── beverage_bottle
├── box
├── candy_bag
├── candy_jar
├── cylinder
├── instant_food_cup
├── juice_box
└── tiny_candy
Each child folder contains some images.
I wrote the dataloader as below:
import os

from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset


@DATASET_REGISTRY.register()
class SuperClassDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        train_path = root + '/super_class_dataset/train'
        val_path = root + '/super_class_dataset/val'
        gallery_path = root + '/super_class_dataset/train'
        self.convert_labels = {
            'beverage_bottle': 1,
            'box': 2,
            'candy_bag': 3,
            'candy_jar': 4,
            'cylinder': 5,
            'instant_food_cup': 6,
            'juice_box': 7,
            'tiny_candy': 8,
        }
        train_data = self.get_data(train_path, 1)
        val_data = self.get_data(val_path, 2)
        gallery_data = self.get_data(gallery_path, 3)
        super().__init__(train_data, val_data, gallery_data)

    def get_data(self, path, cam_id):
        data = []
        absolute_path = os.path.join(path)
        sub_1_dirs = os.listdir(absolute_path)
        for sub_1_dir in sub_1_dirs:
            sub_1_path = os.path.join(absolute_path, sub_1_dir)
            if sub_1_dir == '.DS_Store':
                continue
            filenames = os.listdir(sub_1_path)
            for filename in filenames:
                if filename == '.DS_Store':
                    continue
                filepath = os.path.join(sub_1_path, filename)
                data.append((filepath, self.convert_labels[sub_1_dir], cam_id))
        return data
I used the train dataset in the role of the query dataset and val in the role of the test dataset. But while training, I got the error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0
But when I printed them, tensor a equaled tensor b in shape every time. Could you give me a suggestion for a classification dataloader?
After completing the configuration following your process above, do the name in the config.yml file and the "datasetname" parameter of the class that inherits MyOwnDataset need to be consistent?
The dataset name in the config file needs to match the name of the dataset you defined. For the example above, it should be written as "SuperClassDataset" in the config.
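A quick way to check that the name matches is to look it up in the registry directly. A sketch; the module name below is hypothetical:

from fastreid.data.datasets import DATASET_REGISTRY
import super_class_dataset  # hypothetical file that defines and registers SuperClassDataset

# Prints the registered class; raises KeyError if the name in the config has no match.
print(DATASET_REGISTRY.get('SuperClassDataset'))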
This guide explains how to train your own custom dataset with fastreid's data loaders.
Before You Start
Follow Getting Started to set up the environment and install the requirements.txt dependencies.
Train on Custom Dataset
- Register your dataset (i.e., tell fastreid how to obtain your dataset). To let fastreid know how to obtain a dataset named "my_dataset", users need to implement a class that inherits fastreid.data.datasets.bases.ImageDataset:

from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset

@DATASET_REGISTRY.register()
class MyOwnDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        ...
        super().__init__(train, query, gallery)

Here, the snippet associates a dataset named "MyOwnDataset" with a class that builds the train, query, and gallery sets and then passes them to the base class; the decorator registers the class. The class can do arbitrary things and should generate a train list: list(str, str, str), a query list: list(str, int, int), and a gallery list: list(str, int, int), as below.

train_list = [
    (train_path1, pid1, camid1),
    (train_path2, pid2, camid2),
    ...]
query_list = [
    (query_path1, pid1, camid1),
    (query_path2, pid2, camid2),
    ...]
gallery_list = [
    (gallery_path1, pid1, camid1),
    (gallery_path2, pid2, camid2),
    ...]

You can also pass an empty train_list to generate a "Testset" only, with super().__init__([], query, gallery).
Notice: query and gallery sets may share camera views, but for each individual query identity, gallery samples from the same camera are excluded. So if your dataset has no camera annotations, you can set every query identity's camera number to 0 and every gallery identity's camera number to 1, and you will still get testing results.
- Import your dataset. After registering your own dataset, you need to import it in train_net.py to make it effective:

from dataset_file import MyOwnDataset
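For concreteness, a short sketch of the "Testset only" case described above, using the camid 0/1 convention for data without camera annotations; folder names and layout are placeholders:

import glob
import os

from fastreid.data.datasets import DATASET_REGISTRY
from fastreid.data.datasets.bases import ImageDataset


@DATASET_REGISTRY.register()
class MyTestOnlyDataset(ImageDataset):
    def __init__(self, root='datasets', **kwargs):
        base = os.path.join(root, 'my_test_only_dataset')  # hypothetical folder
        query = self._listdir(os.path.join(base, 'query'), camid=0)
        gallery = self._listdir(os.path.join(base, 'gallery'), camid=1)
        super().__init__([], query, gallery, **kwargs)

    @staticmethod
    def _listdir(split_dir, camid):
        data = []
        # assumes one sub-folder per identity, named with an integer id
        for pid_dir in sorted(os.listdir(split_dir)):
            for img_path in glob.glob(os.path.join(split_dir, pid_dir, '*.jpg')):
                data.append((img_path, int(pid_dir), camid))
        return data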
Hello @L1aoXingyu. First of all, thank you for your amazing work! If I understand correctly, I can train FastReID to re-identify any custom object I want, right? In my case, I need to be able to re-identify a certain fruit. So I just need a dataset containing images of that fruit, right?
Thank you for your contribution!
Yes, if you want to train a model for identifying different fruits, you can collect a dataset with different kinds of fruits and train on it.
I ran into the same bug ("tensor a equaled tensor b in shape every time").
@AnhPC03 Did you solve this issue? It would be helpful for me if you could guide me through the error.
@AnhPC03 Can you please tell us how you solved this issue? Also, is there a typical percentage of the dataset used for these splits, e.g. 75% train / 25% query + gallery? And do we need to add the same images to train, query, and gallery?
@shreejalt @akashAD98 Did you get this error: RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0?
If yes, please check whether any images in your dataset have an alpha channel. If so, remove the alpha channel and keep only the B, G, R channels.
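A sketch of that fix: walk the dataset folder and re-save any image that carries an alpha channel as plain RGB (the path is a placeholder):

import os
from PIL import Image

def drop_alpha_channel(folder):
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                img = Image.open(path)
            except OSError:
                continue  # skip files that are not images
            if img.mode in ('RGBA', 'LA', 'P'):
                img.convert('RGB').save(path)  # overwrite with a 3-channel image

drop_alpha_channel('datasets/super_class_dataset')  # hypothetical dataset root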
Hey @AnhPC03, I know it's been a few years, but I am trying to build a classifier like you did and I can't get it working. Did you ever run into an issue where the trainer stalls forever without erroring out?
Hello, did you solve this problem? I also see training that doesn't report an error but gets stuck and stops running.
Hi, how did you name the images?