GitHub-HongweiZhang / prediction-flow

Deep-Learning based CTR models implemented by PyTorch
MIT License
250 stars 52 forks

RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors' #23

Closed Qian-Hao closed 4 years ago

Qian-Hao commented 4 years ago

Traceback (most recent call last):
  File "D:/项目/CTR/video-click-contest/src/model/Flow_DeepFM.py", line 272, in <module>
    fit(10, model, loss_func, optimizer, train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\functions.py", line 49, in fit
    pred = model(batch)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\deepfm.py", line 132, in forward
    linear_concat = torch.cat(number_inputs, dim=1)
RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'

I get this error when running on my own dataset. As far as I can tell, the way I build the model inputs is correct.

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

Hello, try converting the dtype of the number feature x with np.asarray(x, np.float).
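A minimal sketch of that conversion, assuming the number feature is available as a plain list or ndarray (the variable names and values here are illustrative, not taken from the dataset). Note that the `np.float` alias was removed in NumPy 1.24, so the sketch passes an explicit dtype such as `np.float32` instead.

```python
import numpy as np

# Hypothetical number feature that happens to be stored as integers.
x = [3, 1, 4, 1, 5]

# Convert to a floating-point array with an explicit dtype.
x_float = np.asarray(x, dtype=np.float32)

print(x_float.dtype)
```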

Qian-Hao commented 4 years ago

Tried it, and it doesn't help. These are the dtypes of my numeric features:
followscore float16
personalscore float16
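Since the error message complains about a Long (int64) tensor, it may help to check every column passed as a number feature, not only these two. A rough sketch, assuming the features live in a pandas DataFrame; the sample values and the extra `clicks` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample: two float16 scores plus one hypothetical integer column.
df = pd.DataFrame({
    "followscore": np.array([0.5, 1.0, 0.0], dtype=np.float16),
    "personalscore": np.array([2.0, 0.0, 1.5], dtype=np.float16),
    "clicks": np.array([3, 0, 7], dtype=np.int64),
})

number_columns = ["followscore", "personalscore", "clicks"]

# Columns whose dtype is not floating point are the likely source
# of a Long tensor further down the pipeline.
non_float = [c for c in number_columns
             if not np.issubdtype(df[c].dtype, np.floating)]
print(non_float)

# Cast every number feature to float32 before building the data loaders.
df = df.astype({c: np.float32 for c in number_columns})
print(df[number_columns].dtypes)
```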

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

Is this a public dataset? Could you share some sample data with me?

Qian-Hao commented 4 years ago

It is competition data. This field comes from the user table.

Qian-Hao commented 4 years ago

deviceid guid outertag tag level personidentification followscore personalscore gender
dd4f4cbcc9733f8de667a99b7f375b99 NaN NaN NaN NaN NaN NaN NaN NaN
e9b1196a3fc0603c55614caf35c26ce5 NaN NaN 天文_cs:7.456377740584219|地球_cs:6.9858308668480795|八路军_cs:5.250453651623817|印度_cs:4.1883886945352735|单机游戏_cs:3.2773980830343583|特工_cs:3.003985130386354|特种部队_cs:1.9417268603784004|漫画_cs:1.2540893214535427|宗教_cs:1.0566913143638677|网游_cs:1.0353152195689521|港台娱乐_cs:0.5161115318809109 NaN NaN NaN NaN NaN

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao Were all the NaN values handled beforehand?

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao How large is the batch?

Qian-Hao commented 4 years ago

Yes, they have been filled.

Qian-Hao commented 4 years ago

@Qian-Hao How large is the batch?

256

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao Could you run an experiment: change the batch size to 1024 and see whether the error still occurs?

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao Did you use StandardScaler for normalization?

Qian-Hao commented 4 years ago

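@Qian-Hao Could you run an experiment: change the batch size to 1024 and see whether the error still occurs?

Tried it, and the error still occurs.

@Qian-Hao Did you use StandardScaler for normalization?

Yes, I did. It is probably a tensor dtype problem, right?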

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

Yes, I did. It is probably a tensor dtype problem, right?

Yes. The numpy.ndarray is being converted to a tensor with the wrong dtype.
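To illustrate the point in plain PyTorch (independent of prediction-flow's own loading code, which is not shown in this thread): `torch.from_numpy` keeps the array's dtype, so an int64 array becomes a Long tensor. Casting each tensor to float before `torch.cat` avoids mixing dtypes. The arrays below are made up, and newer PyTorch releases may promote mixed dtypes in `torch.cat` on their own, so the original error may only appear on older versions.

```python
import numpy as np
import torch

# Made-up inputs: one float32 feature and one int64 feature.
float_feature = torch.from_numpy(np.array([[0.5], [1.0]], dtype=np.float32))
long_feature = torch.from_numpy(np.array([[3], [7]], dtype=np.int64))

# from_numpy preserves the dtype, so this tensor is Long (int64).
print(long_feature.dtype)

# Cast every input to float32 before concatenating along the feature axis.
number_inputs = [float_feature, long_feature]
number_inputs = [t.float() for t in number_inputs]
linear_concat = torch.cat(number_inputs, dim=1)
print(linear_concat.dtype)
```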

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

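Could you copy about 1000 rows of data to a GitHub gist and tell me how your features are defined? I will test it myself. Since I cannot reproduce the error, I cannot quickly tell which step goes wrong.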

Qian-Hao commented 4 years ago

test.zip

"""Continuous-valued features"""
dense_features = ['followscore', 'personalscore']

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

I tested it, and there was no problem.

import pandas as pd
import numpy as np

import prediction_flow
from prediction_flow.features import Number, Category, Sequence, Features
from prediction_flow.transformers.column import StandardScaler

from prediction_flow.pytorch.data import Dataset
from prediction_flow.pytorch import DeepFM

from prediction_flow.pytorch.functions import fit, predict, create_dataloader_fn

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

print(prediction_flow.__version__)

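# Load the sample data (test.csv).
df_data = pd.read_csv('./test.csv')

# Keep only rows where the label and both number features are present.
df_data = df_data.loc[
    ~(df_data.target.isnull() |
      df_data.followscore.isnull() |
      df_data.personalscore.isnull()), ['target', 'followscore', 'personalscore']]

# Hold out 30% of the rows for validation.
train, valid = train_test_split(df_data, test_size=0.3)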

number_features = [
    Number('followscore', StandardScaler()),
    Number('personalscore', StandardScaler()),
]

features, train_loader, valid_loader = create_dataloader_fn(
    number_features, [], [], 64, train, 'target', valid)

model = DeepFM(features, 2, 16, (32, 16), final_activation='sigmoid', dropout=0.3)

loss_func = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

fit(10, model, loss_func, optimizer,
    train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
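Because the script above does not reproduce the error, one way to narrow it down on the failing dataset might be to print the dtype of every entry in a single batch. The helper below is hypothetical and assumes each batch is a dict-like mapping from feature name to a tensor or array with a `.dtype` attribute; that structure is not confirmed in this thread. The stand-in batch uses NumPy arrays so the sketch runs on its own.

```python
import numpy as np

def report_dtypes(batch):
    """Return {name: dtype string} for every entry in a dict-like batch."""
    return {name: str(value.dtype) for name, value in batch.items()}

# Stand-in batch with made-up values; the second entry has an integer dtype.
fake_batch = {
    "followscore": np.array([0.1, 0.2], dtype=np.float32),
    "personalscore": np.array([1, 2], dtype=np.int64),
}

report = report_dtypes(fake_batch)
print(report)
```

With a real loader this could be called as `report_dtypes(next(iter(train_loader)))`, again assuming each batch is dict-like. Any entry that is not a floating-point dtype would be a candidate for the Long tensor in the traceback.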