RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

[Question] Speed up evaluation using large eval batch size #1556

Open trannhuthuat96 opened 1 year ago

trannhuthuat96 commented 1 year ago

Hi Team,

Thanks for developing such a great library.

When using RecBole, I found that evaluation in full-sort mode runs slowly. After reading the code, I traced the cause to this line: https://github.com/RUCAIBox/RecBole/blob/d3d421df394a231cd5baebd7a6bc9a05583c6642/recbole/data/dataloader/general_dataloader.py#L242 What is the purpose of re-calculating the batch size here? (My model has an auto-encoder structure, so during both training and evaluation each batch contains only user_id.)

To speed up evaluation, I set eval_batch_size to a large value, i.e., eval_batch_size = 512 * num_item, so that the effective eval batch size (for the user-oriented dataloader) is 512 users. However, different large eval_batch_size values produce different Recall and NDCG scores (same random seed, same GPU machine). Could you help explain this?
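
For concreteness, the arithmetic I have in mind looks like this (illustrative numbers only; num_item comes from the dataset):

```python
# Hypothetical values for illustration.
num_item = 3706                      # e.g., roughly the MovieLens-1M item count
eval_batch_size = 512 * num_item     # value set in the config
users_per_batch = eval_batch_size // num_item
print(users_per_batch)               # 512 users scored against the full item set
```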

Thanks,

Wicknight commented 1 year ago

@trannhuthuat96 Hello, I'm sorry for the late reply. For question 1: we re-calculate the eval batch size here because we need to adjust it to a multiple of item_num, so that each batch can load the whole item set. For question 2: could you tell me which model produced different results in evaluation, to help us track down the problem? This is not the case when I use the BPR model in experiments. If the problem occurs in a model you developed yourself, please check whether there are related bugs in your code.
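
Roughly, the adjustment at the linked line works like this (a simplified sketch, not the exact RecBole source):

```python
# Simplified sketch of the full-sort batch-size adjustment: the configured
# eval_batch_size is rounded down to a whole multiple of item_num, so every
# batch holds a whole number of users, each paired with the full item set.
def adjust_eval_batch_size(eval_batch_size: int, item_num: int) -> tuple[int, int]:
    users_per_batch = max(eval_batch_size // item_num, 1)  # step: users per batch
    real_batch_size = users_per_batch * item_num           # actual rows per batch
    return real_batch_size, users_per_batch
```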

trannhuthuat96 commented 1 year ago

Hi @Wicknight,

Thanks for your response.

For question 1, just to clarify: loading the whole item set means the entire item set must fit in memory (which may be a concern when memory is limited)?

For question 2, I tried with MacridVAE. I ran multiple times with the same large batch size on the MovieLens-1M benchmark (the data split is provided by another paper for fair comparison) and got identical results (Recall and NDCG). But different large eval batch size values produced different results on the same dataset.

Thanks,

Wicknight commented 1 year ago

Hello @trannhuthuat96,

For question 1: we do this because, for each user, we must score all items to get the full-sort results. Therefore, num_item is taken as the basic unit of the eval batch size.

For question 2: we have verified experimentally that this problem does exist, and we are currently investigating the cause. If you have any suggestions, you are welcome to share them here.
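
One candidate cause worth ruling out (a hypothesis, not a confirmed diagnosis) is floating-point non-determinism across batch shapes: GPU kernels may accumulate in a different order for different batch sizes, and the resulting last-bit score differences can flip ties near the top-k cutoff. A minimal self-contained check:

```python
import torch

torch.manual_seed(0)
user_emb = torch.randn(2048, 64)   # fake user representations
item_emb = torch.randn(64, 3706)   # fake item representations

# Score the same users in one pass vs. two half-batches.
full = user_emb @ item_emb
halves = torch.cat([user_emb[:1024] @ item_emb, user_emb[1024:] @ item_emb])

# Especially on GPU, these may not be bitwise equal: different shapes can
# select different kernels and reduction orders.
print(torch.equal(full, halves))
print((full - halves).abs().max())  # tiny differences, but enough to flip ranking ties
```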

LorenzoSun-V commented 1 year ago

Hi @Wicknight, I recently trained a DIN model on Amazon_Electronics and encountered this issue as well.

Here is my config file:

```yaml
# Atomic File Format
field_separator: "\t"           # (str) Separator of different columns in atomic files.
seq_separator: " "              # (str) Separator inside the sequence features.

# Dataset Information
data_path: '/local/lorenzo/Recommendation/RecBole/dataset/'
checkpoint_dir: '/local/lorenzo/Recommendation/RecBole/saved/Amazon_Electronics'

# Training Settings
epochs: 500                     # (int) Maximum number of training epochs.
train_batch_size: 16384         # (int) Batch size for training.
learner: adam                   # (str) Built-in PyTorch optimizer to use.
learning_rate: 0.005            # (float) Learning rate.
training_neg_sample_num: 0      # (int) Number of negative samples.
eval_step: 1                    # (int) Number of evaluations performed after each training epoch.
stopping_step: 10               # (int) Early-stopping patience: stop if the chosen metric shows no improvement within this many steps.
gpu_id: '0,1,2,3'

# Evaluation Settings
eval_setting: RO_RS             # (str) Randomly shuffle the data, then split by ratio.
group_by_user: False            # (bool) Whether to group one user's records together.
split_ratio: [0.95,0.01,0.04]   # (list) Train/validation/test split ratio.
metrics: ['AUC', 'LogLoss']     # (list) Evaluation metrics.
valid_metric: AUC               # (str) Metric used as the early-stopping criterion.
eval_batch_size: 16384          # (int) Batch size for evaluation.
# topk: 500

# Benchmark .inter file
benchmark_filename: ~           # (list) List of pre-split user-item interaction suffix.
embedding_size: 10              # (int) The embedding size of features.
mlp_hidden_size: [256,256,256]  # (list of int) The hidden size of MLP layers. 
dropout_prob: 0.0               # (float) The dropout rate.                 
pooling_mode: 'mean'            # (str) Pooling mode of sequence data.

# Basic Information
USER_ID_FIELD: user_id          # (str) Field name of user ID feature.
ITEM_ID_FIELD: item_id          # (str) Field name of item ID feature.
RATING_FIELD: rating            # (str) Field name of rating feature.
TIME_FIELD: timestamp           # (str) Field name of timestamp feature.
seq_len: ~                      # (dict) Field name of sequence feature: maximum length of each sequence
LABEL_FIELD: label              # (str) Expected field name of the generated labels for point-wise dataLoaders. 
threshold: {rating: 3}                    # (dict) 0/1 labels will be generated according to the pairs.
NEG_PREFIX: neg_                # (str) Negative sampling prefix for pair-wise dataLoaders.
numerical_features: [sales_rank]          # (list) Float feature fields to be embedded

# Selectively Loading
load_col:                       # (dict) The suffix of atomic files: (list) field names to be loaded.
    inter: [user_id, item_id, rating, timestamp]
    item: [item_id, title, categories, brand, sales_type, sales_rank]
    # the others
unload_col: ~                   # (dict) The suffix of atomic files: (list) field names NOT to be loaded.
unused_col: ~                   # (dict) The suffix of atomic files: (list) field names which are loaded but not used.
additional_feat_suffix: ~       # (list) Control loading additional atomic files.

# Filtering
rm_dup_inter: ~                 # (str) Whether to remove duplicated user-item interactions.
val_interval: ~                 # (dict) Filter inter by values in {value field (str): interval (str)}.
filter_inter_by_user_or_item: True    # (bool) Whether or not to filter inter by user or item.
user_inter_num_interval: "[0,inf)"    # (str) User interval for filtering inter, such as [A,B] / [A,B) / (A,B) / (A,B].
item_inter_num_interval: "[0,inf)"    # (str) Item interval for filtering inter, such as [A,B] / [A,B) / (A,B) / (A,B].

# Preprocessing
alias_of_user_id: ~             # (list) Fields' names remapped into the same index system with USER_ID_FIELD.
alias_of_item_id: ~             # (list) Fields' names remapped into the same index system with ITEM_ID_FIELD.
alias_of_entity_id: ~           # (list) Fields' names remapped into the same index system with ENTITY_ID_FIELD.
alias_of_relation_id: ~         # (list) Fields' names remapped into the same index system with RELATION_ID_FIELD.
preload_weight: ~               # (dict) Preloaded weight in {IDs (token): pretrained vectors (float-like)}.
normalize_field: ~              # (list) List of filed names to be normalized.
normalize_all: ~                # (bool) Whether or not to normalize all the float like fields.
discretization: ~               # (dict) The discretization settings.

# Sequential Model Needed
ITEM_LIST_LENGTH_FIELD: item_length   # (str) Field name of the feature representing item sequences' length. 
LIST_SUFFIX: _list              # (str) Suffix of field names which are generated as sequences.
MAX_ITEM_LIST_LENGTH: 50        # (int) Maximum length of each generated sequence.
POSITION_FIELD: position_id     # (str) Field name of the generated position sequence.

# Knowledge-based Model Needed
HEAD_ENTITY_ID_FIELD: head_id   # (str) Field name of the head entity ID feature.
TAIL_ENTITY_ID_FIELD: tail_id   # (str) Field name of the tail entity ID feature.
RELATION_ID_FIELD: relation_id  # (str) Field name of the relation ID feature.
ENTITY_ID_FIELD: entity_id      # (str) Field name of the entity ID.
kg_reverse_r: False             # (bool) Whether to reverse relations of triples for bidirectional edges.
entity_kg_num_interval: "[0,inf)"       # (str) Entity interval for filtering kg.
relation_kg_num_interval: "[0,inf)"     # (str) Relation interval for filtering kg.
```

The tqdm bar showed that evaluation was going to take more than an hour.

Do you have any ideas?

LorenzoSun-V commented 1 year ago

When I use 2 GPUs to train and evaluate instead of 4, evaluation time drops to 15 minutes. It's so weird...