RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'

Qian-Hao commented 4 years ago

Traceback (most recent call last):
  File "D:/项目/CTR/video-click-contest/src/model/Flow_DeepFM.py", line 272, in <module>
    fit(10, model, loss_func, optimizer, train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\functions.py", line 49, in fit
    pred = model(batch)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\deepfm.py", line 132, in forward
    linear_concat = torch.cat(number_inputs, dim=1)
RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'
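The failing call is `torch.cat(number_inputs, dim=1)` in `deepfm.py`, and the error shows that the PyTorch build used here refuses to concatenate tensors of different dtypes: one of the number inputs arrived as Long (`torch.int64`) instead of Float (`torch.float32`). `torch.from_numpy` keeps the NumPy dtype rather than casting, so an integer-typed feature array is a likely source of a Long tensor. The sketch below needs only NumPy and reports which tensor dtype each feature array would map to; the helper names and the example feature dict are hypothetical, not part of prediction_flow.

```python
import numpy as np

# Tensor dtype that torch.from_numpy produces for each NumPy dtype
# (from_numpy keeps the dtype; it does not cast to float32 by itself).
NUMPY_TO_TORCH = {
    np.dtype(np.float64): "torch.float64 (Double)",
    np.dtype(np.float32): "torch.float32 (Float)",
    np.dtype(np.float16): "torch.float16 (Half)",
    np.dtype(np.int64): "torch.int64 (Long)",
    np.dtype(np.int32): "torch.int32 (Int)",
    np.dtype(np.int16): "torch.int16 (Short)",
    np.dtype(np.int8): "torch.int8 (Char)",
    np.dtype(np.uint8): "torch.uint8 (Byte)",
    np.dtype(np.bool_): "torch.bool (Bool)",
}


def report_tensor_dtypes(features):
    """Map each feature name to the tensor dtype its array would become.

    `features` is a hypothetical dict of {feature name: 1-D array-like}.
    """
    report = {}
    for name, values in features.items():
        dtype = np.asarray(values).dtype
        report[name] = NUMPY_TO_TORCH.get(dtype, f"not in this table ({dtype})")
    return report


def non_float32_features(features):
    """Names whose arrays would NOT become torch.float32 tensors."""
    return [name for name, values in features.items()
            if np.asarray(values).dtype != np.float32]


example = {
    "followscore": np.array([0.5, 1.2], dtype=np.float32),
    "personalscore": np.array([3, 4], dtype=np.int64),  # would become Long
}
print(report_tensor_dtypes(example))
print(non_float32_features(example))  # ['personalscore']
```

Any name returned by `non_float32_features` is a candidate for the Long (or Half/Double) element that `torch.cat` rejects.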

I get this error when running on my own dataset. I believe the model inputs are constructed correctly.

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

Hi, try converting the number feature x with np.asarray(x, np.float).
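For newer environments: `np.float` was only an alias for Python's built-in `float` (i.e. float64); it was deprecated in NumPy 1.20 and removed in NumPy 1.24, so that exact call now raises `AttributeError`. A float64 array also becomes a Double tensor, not the Float (`torch.float32`) that PyTorch parameters use by default. A minimal sketch of the same conversion with an explicit float32 target; the helper name `to_float32` is hypothetical:

```python
import numpy as np


def to_float32(x):
    """Convert a numeric feature (list, array, or pandas Series) to float32.

    float32 corresponds to torch.float32 ("Float"), the default dtype of
    PyTorch parameters; the old np.float alias meant float64 ("Double").
    """
    arr = np.asarray(x, dtype=np.float32)
    # NaN/inf would survive the cast silently, so reject them here.
    if not np.isfinite(arr).all():
        raise ValueError("feature contains NaN or inf; fill it before casting")
    return arr


print(to_float32([1, 2, 3]).dtype)  # float32
```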

Qian-Hao commented 4 years ago

I tried that; it didn't help. These are the dtypes of my numeric features:

followscore      float16
personalscore    float16

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

Is this a public dataset? Could you share some sample data?

Qian-Hao commented 4 years ago

It's competition data; these fields come from the user table.

Qian-Hao commented 4 years ago

| deviceid | guid | outertag | tag | level | personidentification | followscore | personalscore | gender |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| dd4f4cbcc9733f8de667a99b7f375b99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| e9b1196a3fc0603c55614caf35c26ce5 | NaN | NaN | 天文_cs:7.456377740584219\|地球_cs:6.9858308668480795\|八路军_cs:5.250453651623817\|印度_cs:4.1883886945352735\|单机游戏_cs:3.2773980830343583\|特工_cs:3.003985130386354\|特种部队_cs:1.9417268603784004\|漫画_cs:1.2540893214535427\|宗教_cs:1.0566913143638677\|网游_cs:1.0353152195689521\|港台娱乐_cs:0.5161115318809109 | NaN | NaN | NaN | NaN | NaN |

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao Did you handle the NaN values beforehand?

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao What batch size are you using?

Qian-Hao commented 4 years ago

Yes, I filled them in.
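For reference, one way to fill a numeric column before building features, sketched with NumPy only. The median strategy, the float32 output, and the function name `fill_nan_median` are assumptions for illustration, not what the poster actually did. Columns that are entirely NaN (as `followscore` and `personalscore` are in both sample rows) have no median, so the sketch raises instead of guessing.

```python
import numpy as np


def fill_nan_median(values, dtype=np.float32):
    """Replace NaN with the column median and return a float array of `dtype`."""
    arr = np.asarray(values, dtype=np.float64)  # do the arithmetic in float64
    mask = np.isnan(arr)
    if mask.all():
        raise ValueError("column is entirely NaN; no median to fill with")
    filled = np.where(mask, np.nanmedian(arr), arr)
    return filled.astype(dtype)


print(fill_nan_median([1.0, np.nan, 3.0]))  # [1. 2. 3.]
```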

Qian-Hao commented 4 years ago

> @Qian-Hao What batch size are you using?

256

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao Could you try an experiment: change the batch size to 1024 and see whether the error still occurs?

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao Did you normalize the features with StandardScaler?

Qian-Hao commented 4 years ago

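> @Qian-Hao Could you try an experiment: change the batch size to 1024 and see whether the error still occurs?

Tried it; it still fails.

> @Qian-Hao Did you normalize the features with StandardScaler?

Yes, I did. It's probably a tensor dtype problem.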
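Standardization itself is z = (x - mean) / std, which always produces floating-point output; what matters for the error is which float width finally reaches the tensor. The sketch below is a generic z-score with an explicit float32 result. It is an illustration under that assumption, not prediction_flow's `StandardScaler`, whose internals are not shown in this thread; the name `standardize` is hypothetical.

```python
import numpy as np


def standardize(values, dtype=np.float32):
    """Z-score a 1-D feature, (x - mean) / std, then cast to `dtype`."""
    x = np.asarray(values, dtype=np.float64)  # compute in float64
    std = x.std()
    if std == 0:
        std = 1.0  # constant column: avoid division by zero
    return ((x - x.mean()) / std).astype(dtype)


z = standardize([1, 2, 3])  # integer input, float32 output
print(z.dtype)  # float32
```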

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

> Yes, I did. It's probably a tensor dtype problem.

Yes. The numpy.ndarray is being converted into a tensor with the wrong dtype.
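Because `torch.cat(number_inputs, dim=1)` joins one column per number feature, the same step can be mirrored in NumPy to make sure every column shares a single dtype before any tensor is created. This is a hypothetical pre-check (`stack_number_features` is not part of prediction_flow); the resulting float32 matrix would map to one Float tensor through `torch.from_numpy`.

```python
import numpy as np


def stack_number_features(columns, dtype=np.float32):
    """Cast each 1-D feature to `dtype` and stack into an (n_rows, n_features) matrix.

    Mirrors torch.cat(..., dim=1) over per-feature column vectors, but forces
    every column to the same dtype first.
    """
    if not columns:
        raise ValueError("no number features given")
    cols = [np.asarray(c, dtype=dtype).reshape(-1, 1) for c in columns]
    lengths = {c.shape[0] for c in cols}
    if len(lengths) != 1:
        raise ValueError(f"features have different lengths: {sorted(lengths)}")
    return np.concatenate(cols, axis=1)


matrix = stack_number_features([[0.1, 0.2], [3, 4]])  # float + int input
print(matrix.dtype, matrix.shape)  # float32 (2, 2)
```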

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

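Could you copy about 1,000 rows of data to a GitHub gist and tell me how your features are defined? I'll test it myself. I can't reproduce the error, so I can't quickly tell which step is going wrong.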
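A minimal way to cut such a sample from a larger CSV using only the standard library. The `copy_head` helper and the file paths are hypothetical, and taking the first rows rather than a random sample is a simplifying choice. The `csv` module keeps quoted fields intact, and `encoding` is a parameter because this data contains Chinese text, so it has to match the source file.

```python
import csv
from itertools import islice


def copy_head(src_path, dst_path, n_rows=1000, encoding="utf-8"):
    """Copy the header plus the first `n_rows` data rows of a CSV file.

    Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty file: nothing to copy
            return 0
        writer.writerow(header)
        count = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
        return count
```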

Qian-Hao commented 4 years ago

test.zip

"""Continuous (dense) features"""
dense_features = ['followscore', 'personalscore']

GitHub-HongweiZhang commented 4 years ago

@Qian-Hao

I tested it and got no error:

import pandas as pd
import numpy as np

import prediction_flow
from prediction_flow.features import Number, Category, Sequence, Features
from prediction_flow.transformers.column import StandardScaler

from prediction_flow.pytorch.data import Dataset
from prediction_flow.pytorch import DeepFM

from prediction_flow.pytorch.functions import fit, predict, create_dataloader_fn

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

print(prediction_flow.__version__)

df_data = pd.read_csv('./test.csv')

# keep only rows where the label and both numeric features are present
df_data = df_data.loc[
    ~(df_data.target.isnull() |
      df_data.followscore.isnull() |
      df_data.personalscore.isnull()), ['target', 'followscore', 'personalscore']]

train, valid = train_test_split(df_data, test_size=0.3)

number_features = [
    Number('followscore', StandardScaler()),
    Number('personalscore', StandardScaler()),
]

features, train_loader, valid_loader = create_dataloader_fn(
    number_features, [], [], 64, train, 'target', valid)

model = DeepFM(features, 2, 16, (32, 16), final_activation='sigmoid', dropout=0.3)

loss_func = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

fit(10, model, loss_func, optimizer,
    train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)

GitHub-HongweiZhang / prediction-flow, issue #23