leoromanovich opened 5 months ago
@AlekseySh what do you think about an implementation like that?
@leoromanovich by the way, we also need to update the postprocessing pipeline. Seems like get_loaders_with_embeddings is already in a format very close to what we expect in the builders registry, right?
Looks like it :) I've added some changes, and the tests passed.
@AlekseySh Please check the changes for the reranking builder. What I don't like about the current solution: because we use a feature extractor inside, we have to pass args for the extractor (like precision, num_workers, bs_inference). Not critical, but if we decide the reranking dataset builder approach is good enough, I believe we should change the options that are used in different places higher up in the config.
based on offline discussion:
------------------------
PREDICT.YAML
precision: 32
accelerator: gpu
devices: 1

dataset:
  name: BaseImgDataset
  im_paths: ...  # or im_dir

transforms_predict:
  name: norm_resize_albu
  args:
    im_size: 224

save_dir: "."
bs: 64
num_workers: 10

extractor:
  name: vit
  args:
    arch: vits16
    normalise_features: False
    use_multi_scale: False
    weights: vits16_cars

hydra:
  run:
    dir: ${save_dir}
  searchpath:
    - pkg://oml.configs
  job:
    chdir: True
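For illustration, the `dataset` and `extractor` sections of a config like this could be dispatched through small registries. A minimal sketch, where all names and the decorator helper are illustrative rather than the actual OML API:

```python
# Toy registry dispatch for config sections like "dataset" in PREDICT.YAML.
# Names here are hypothetical, not the real OML registry.

DATASET_REGISTRY = {}


def register_dataset(name):
    # Decorator that maps a config "name" to a dataset class.
    def deco(cls):
        DATASET_REGISTRY[name] = cls
        return cls
    return deco


@register_dataset("BaseImgDataset")
class BaseImgDataset:
    def __init__(self, im_paths, transforms=None):
        self.im_paths = im_paths
        self.transforms = transforms


def build_dataset(cfg):
    # cfg mirrors the "dataset" block of the YAML above.
    cls = DATASET_REGISTRY[cfg["name"]]
    return cls(im_paths=cfg["im_paths"])


cfg = {"name": "BaseImgDataset", "im_paths": ["a.jpg", "b.jpg"]}
ds = build_dataset(cfg)
```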
-------------------
VALIDATE.YAML
accelerator: gpu
devices: 1
precision: 32
bs_val: 256
num_workers: 8

val_dataset:
  name: image_dataset
  dataframe_name: df_with_bboxes.csv  # df/path_to_df
  args:
    dataset_root: data/CARS196/

transforms_val:
  name: norm_resize_albu
  args:
    im_size: 224

extractor:
  name: vit
  args:
    arch: vits16
    normalise_features: False
    use_multi_scale: False
    weights: vits16_cars

metric_args:
  metrics_to_exclude_from_visualization: [cmc,]
  cmc_top_k: [1, 5]
  map_top_k: [5]
  precision_top_k: [5]
  fmr_vals: [0.01]
  pcf_variance: [0.5, 0.9, 0.99]
  return_only_overall_category: False
  visualize_only_overall_category: True

hydra:
  searchpath:
    - pkg://oml.configs
  job:
    chdir: True
-----------------------------
REGISTRY.PY
import pandas as pd

REGISTRY_DATASETS = {
    "image_dataset": ImageQGLDataset,
}

def get_dataset_by_cfg(cfg, split_val=None):
    if split_val and "dataframe_name" in cfg:
        df = pd.read_csv(cfg["dataframe_name"])
        df = df[df.split == split_val]
    return REGISTRY_DATASETS["image_dataset"](df)  # df or df_path
-------------------------------
PIPELINES.PY
train_dataset = get_dataset_by_cfg(cfg["train_dataset"], split_val="train")
val_dataset = get_dataset_by_cfg(cfg["valid_dataset"], split_val="validate")
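The split filtering in `get_dataset_by_cfg` can be sketched end to end; in this sketch an in-memory dataframe stands in for `pd.read_csv`, and the dataset class is a stub rather than the real `ImageQGLDataset`:

```python
import pandas as pd

# Stand-in for the real dataset class.
class ImageQGLDataset:
    def __init__(self, df):
        self.df = df

REGISTRY_DATASETS = {"image_dataset": ImageQGLDataset}


def get_dataset_by_cfg(cfg, split_val=None):
    # In the real code this would be pd.read_csv(cfg["dataframe_name"]).
    df = cfg["df"]
    if split_val is not None:
        df = df[df["split"] == split_val]
    return REGISTRY_DATASETS[cfg["name"]](df)


df = pd.DataFrame({
    "path": ["1.jpg", "2.jpg", "3.jpg"],
    "split": ["train", "validate", "train"],
})
train_ds = get_dataset_by_cfg({"name": "image_dataset", "df": df}, split_val="train")
val_ds = get_dataset_by_cfg({"name": "image_dataset", "df": df}, split_val="validate")
```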
Looks like we can't avoid a small builder because of transforms initialisation and mapping. From this:
REGISTRY_DATASETS = {
    "image_dataset": ImageQGLDataset,
}

def get_dataset_by_cfg(cfg, split_val=None):
    if split_val and "dataframe_name" in cfg:
        df = pd.read_csv(cfg["dataframe_name"])
        df = df[df.split == split_val]
    return REGISTRY_DATASETS["image_dataset"](df)  # df or df_path
to something like that:
REGISTRY_DATASETS = {
    "image_qg_dataset": qg_builder,
}

def qg_builder(cfg):
    transforms = ...
    dataset = QGDataset(...)
    return dataset

def get_dataset_by_cfg(cfg, split_val=None):
    if split_val and "dataframe_name" in cfg:
        df = pd.read_csv(cfg["dataframe_name"])
        mapper = {...}
        df = df[df.split == split_val]
        df[col] = df[col].map(mapper)  # col: the label column to remap
        cfg["df"] = df  # because not all datasets will have a dataframe, we pack it inside cfg
    return REGISTRY_DATASETS["image_qg_dataset"](cfg)
upd: we decided offline to get transforms before dataset initialisation.
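A minimal sketch of that decision, with transforms resolved from the config before the dataset is constructed; the registry contents and class names here are hypothetical:

```python
# Transforms are built first, then handed to the dataset constructor.
# A string stands in for a real transforms object (e.g. an albumentations pipeline).
TRANSFORMS_REGISTRY = {
    "norm_resize_albu": lambda im_size: f"norm_resize_albu(im_size={im_size})",
}


class QGDataset:
    def __init__(self, records, transforms):
        self.records = records
        self.transforms = transforms


def qg_builder(cfg):
    # Resolve transforms from the config before the dataset is created.
    t_cfg = cfg["transforms"]
    transforms = TRANSFORMS_REGISTRY[t_cfg["name"]](**t_cfg["args"])
    return QGDataset(records=cfg["records"], transforms=transforms)


ds = qg_builder({
    "records": ["a.jpg"],
    "transforms": {"name": "norm_resize_albu", "args": {"im_size": 224}},
})
```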