Open beathahahaha opened 3 years ago
An additional note on a random-sampling problem in main.py. I suggest rewriting the source code as follows:
train_data = CriteoDataset('./data', train=True)
# int() so that the split point is a valid slice index
split_num = int(len(train_data) * 0.8)
index_list = list(range(len(train_data)))
train_idx, valid_idx = index_list[:split_num], index_list[split_num:]
tr_sampler = sampler.SubsetRandomSampler(train_idx)
val_sampler = sampler.SubsetRandomSampler(valid_idx)
loader_train = DataLoader(train_data, batch_size=100, sampler=tr_sampler)
loader_val = DataLoader(val_data, batch_size=100, sampler=val_sampler)
An additional note on a random-sampling problem in main.py. I suggest rewriting the source code as follows:
train_data = CriteoDataset('./data', train=True)
# int() so that the split point is a valid slice index
split_num = int(len(train_data) * 0.8)
index_list = list(range(len(train_data)))
train_idx, valid_idx = index_list[:split_num], index_list[split_num:]
tr_sampler = sampler.SubsetRandomSampler(train_idx)
val_sampler = sampler.SubsetRandomSampler(valid_idx)
loader_train = DataLoader(train_data, batch_size=100, sampler=tr_sampler)
val_data = CriteoDataset('./data', train=True)
loader_val = DataLoader(val_data, batch_size=100, sampler=val_sampler)
I think this should be loader_val = DataLoader(train_data, batch_size.....), not val_data, because val_data is not defined.
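The index split proposed above can be checked in isolation, without the dataset or PyTorch. This is only a minimal sketch: split_indices is a hypothetical helper name, not something from the repository, and it assumes the first 80% of indices go to training and the rest to validation, as in the suggestion.

```python
def split_indices(num_samples, train_fraction=0.8):
    """Hypothetical helper: return (train_idx, valid_idx) as disjoint index lists."""
    # int() matters here: a float cannot be used as a slice bound.
    split_num = int(num_samples * train_fraction)
    index_list = list(range(num_samples))
    return index_list[:split_num], index_list[split_num:]


train_idx, valid_idx = split_indices(10)
print(train_idx)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(valid_idx)  # [8, 9]
```

The two lists could then be handed to sampler.SubsetRandomSampler as in the snippet above, with both loaders built on the same train_data object, since the indices refer to positions in that one dataset.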
1. In dataPreprocess.py, line 86 sets num_train_sample = 10000. Shouldn't this be 1000000? Otherwise, running main.py fails with IndexError: index 9999 is out of bounds for axis 0 with size 9999
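2. The clipping of continuous values in dataPreprocess.py does not take effect (because of a bug in the code). You can test this by changing continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50], for example setting the thresholds to 0: the continuous values in the generated train.txt will still not be 0, which shows that the clipping is not applied.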
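For reference, here is a rough sketch of what working clipping might look like, together with the zero-threshold check described in point 2. The function clip_continuous and the sample row are made up for illustration and are not taken from dataPreprocess.py; the sketch also assumes the thresholds are per-column upper bounds and that the values have already been parsed to integers.

```python
continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]


def clip_continuous(values, clip=continous_clip):
    """Hypothetical helper: cap each continuous feature at its column's upper bound."""
    return [min(val, clip[i]) for i, val in enumerate(values)]


# Made-up row of 13 continuous features, for illustration only.
row = [25, 3, 150, 7, 70000, 12, 5, 60, 20, 1, 11, 0, 99]

print(clip_continuous(row))
# [20, 3, 100, 7, 64000, 12, 5, 50, 20, 1, 10, 0, 50]

# The check from point 2: with every threshold set to 0, every
# non-negative value should come out as 0 if clipping is applied.
print(clip_continuous(row, clip=[0] * 13))
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

If the preprocessing script applied clipping like this, the test in point 2 would produce zeros in train.txt; seeing non-zero values there suggests the clipped result is being computed but never written back.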