Closed Qian-Hao closed 4 years ago
@Qian-Hao
Hello, try converting the number feature x with np.asarray(x, np.float).
I tried that and it does not work. These are the dtypes of my numeric features: followscore float16, personalscore float16
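The cast suggested above can be sketched as follows. This is a minimal example on made-up values, not the reporter's data. It uses `np.float32` explicitly, because the `np.float` alias was removed in NumPy 1.24 and float32 is PyTorch's default floating-point dtype.

```python
import numpy as np
import pandas as pd

# Made-up values stored as float16, mirroring the dtypes reported above.
df = pd.DataFrame({
    "followscore": np.array([0.5, 1.25, 3.0], dtype=np.float16),
    "personalscore": np.array([2.0, 0.0, 7.5], dtype=np.float16),
})

dense_features = ["followscore", "personalscore"]
for col in dense_features:
    # Replace each numeric column with a float32 copy before building tensors.
    df[col] = np.asarray(df[col], dtype=np.float32)

print(df.dtypes)
```

Checking `df.dtypes` after the cast is a quick way to confirm that no numeric column is left as an integer or half-precision type.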
@Qian-Hao
Is this a public dataset? Could you give me some sample data?
It is data from a competition. These fields come from the user table.
deviceid | guid | outertag | tag | level | personidentification | followscore | personalscore | gender |
---|---|---|---|---|---|---|---|---|
dd4f4cbcc9733f8de667a99b7f375b99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
e9b1196a3fc0603c55614caf35c26ce5 | NaN | NaN | 天文_cs:7.456377740584219\|地球_cs:6.9858308668480795\|八路军_cs:5.250453651623817\|印度_cs:4.1883886945352735\|单机游戏_cs:3.2773980830343583\|特工_cs:3.003985130386354\|特种部队_cs:1.9417268603784004\|漫画_cs:1.2540893214535427\|宗教_cs:1.0566913143638677\|网游_cs:1.0353152195689521\|港台娱乐_cs:0.5161115318809109 | NaN | NaN | NaN | NaN | NaN |
@Qian-Hao Have all the NaN values been handled beforehand?
@Qian-Hao What is your batch size?
Yes, the NaN values have been filled.
The batch size is 256.
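The NaN check asked about above can be sketched like this. The values are made up, and filling with `0.0` is only an assumption for illustration; the thread does not say which fill value was used.

```python
import numpy as np
import pandas as pd

# Made-up frame with missing values in both numeric columns.
df = pd.DataFrame({
    "followscore": [0.5, np.nan, 3.0],
    "personalscore": [np.nan, 1.0, 2.0],
})
dense_features = ["followscore", "personalscore"]

# Count missing values per column before filling.
missing_before = df[dense_features].isnull().sum().to_dict()

for col in dense_features:
    # Fill with 0.0 (an arbitrary choice here) and keep a float32 dtype.
    df[col] = df[col].fillna(0.0).astype(np.float32)

missing_after = df[dense_features].isnull().sum().to_dict()
print(missing_before, missing_after)
print(df.dtypes)
```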
@Qian-Hao Could you run an experiment: change the batch size to 1024 and see whether the error still occurs?
@Qian-Hao Did you normalize with StandardScaler?
I tried a batch size of 1024; it still fails.
Yes, I applied StandardScaler. It is probably a tensor dtype problem.
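For reference, standardization as it is commonly defined (subtract the mean, divide by the standard deviation) can be sketched in plain NumPy. This is not prediction_flow's `StandardScaler` implementation; `standardize` is a hypothetical helper that only illustrates that the result can be kept in float32 even when the input is integer-valued.

```python
import numpy as np

def standardize(values):
    """Scale values to zero mean and unit variance, returned as float32."""
    values = np.asarray(values, dtype=np.float32)
    mean = values.mean()
    std = values.std()
    # Guard against constant columns, where the standard deviation is zero.
    if std == 0:
        return np.zeros_like(values)
    return (values - mean) / std

scaled = standardize([1, 2, 3, 4])  # integer input, float32 output
print(scaled.dtype, scaled)
```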
@Qian-Hao
Right. The numpy.ndarray is being converted to a tensor with the wrong data type.
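How a NumPy dtype carries over into a tensor can be illustrated with a small sketch. The arrays are made up, and the variable names are illustrative only; this shows general PyTorch behavior, not the library's internal conversion code.

```python
import numpy as np
import torch

# Made-up arrays: one half-precision float, one integer.
float_feature = np.array([0.5, 1.25, 3.0], dtype=np.float16)
int_feature = np.array([1, 0, 2], dtype=np.int64)

# torch.from_numpy keeps the NumPy dtype: float16 -> torch.float16,
# int64 -> torch.int64 (called "Long" in PyTorch error messages).
t_float = torch.from_numpy(float_feature)
t_int = torch.from_numpy(int_feature)
print(t_float.dtype, t_int.dtype)

# Casting explicitly gives every tensor the same float32 dtype,
# so they can be concatenated along the feature dimension.
number_inputs = [t.float().unsqueeze(1) for t in (t_float, t_int)]
linear_concat = torch.cat(number_inputs, dim=1)
print(linear_concat.dtype, linear_concat.shape)
```

Casting at the pandas or NumPy level before the data reaches the data loader has the same effect and avoids touching model code.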
@Qian-Hao
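Could you copy about 1000 rows of data to a GitHub gist and tell me your feature definitions? I will test it. Since I cannot reproduce the error, I cannot quickly tell which step is going wrong.
test.zip
"""Continuous-valued features"""
dense_features = ['followscore', 'personalscore']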
@Qian-Hao
I tested it and did not hit any problem.
import pandas as pd
import numpy as np
import prediction_flow
from prediction_flow.features import Number, Category, Sequence, Features
from prediction_flow.transformers.column import StandardScaler
from prediction_flow.pytorch.data import Dataset
from prediction_flow.pytorch import DeepFM
from prediction_flow.pytorch.functions import fit, predict, create_dataloader_fn
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

print(prediction_flow.__version__)

df_data = pd.read_csv('./test.csv')

# Keep only rows where the target and both numeric features are present.
df_data = df_data.loc[
    ~(df_data.target.isnull() |
      df_data.followscore.isnull() |
      df_data.personalscore.isnull()),
    ['target', 'followscore', 'personalscore']]

train, valid = train_test_split(df_data, test_size=0.3)

# Both numeric features are standardized.
number_features = [
    Number('followscore', StandardScaler()),
    Number('personalscore', StandardScaler()),
]

# No category or sequence features; batch size 64.
features, train_loader, valid_loader = create_dataloader_fn(
    number_features, [], [], 64, train, 'target', valid)

model = DeepFM(features, 2, 16, (32, 16), final_activation='sigmoid', dropout=0.3)

loss_func = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)

fit(10, model, loss_func, optimizer,
    train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
Traceback (most recent call last):
  File "D:/项目/CTR/video-click-contest/src/model/Flow_DeepFM.py", line 272, in
    fit(10, model, loss_func, optimizer, train_loader, valid_loader, notebook=True, auxiliary_loss_rate=0.1)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\functions.py", line 49, in fit
    pred = model(batch)
  File "D:\Programs\Anaconda\envs\python3.6\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\项目\CTR\video-click-contest\prediction_flow\pytorch\deepfm.py", line 132, in forward
    linear_concat = torch.cat(number_inputs, dim=1)
RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'
This is the error I get when running on my own dataset; the way the model inputs are built should be fine.
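The RuntimeError above says that one tensor passed to `torch.cat` is Long (int64) while the others are Float (float32). One way to locate the offending feature is to inspect the dtypes in a batch before it reaches the model. The sketch below assumes a batch is a plain dict mapping feature names to tensors; the actual batch structure in prediction_flow may differ, and `find_non_float_tensors` is a hypothetical helper, not part of the library.

```python
import torch

def find_non_float_tensors(batch):
    """Return {name: dtype} for tensors in `batch` that are not float32."""
    return {
        name: tensor.dtype
        for name, tensor in batch.items()
        if torch.is_tensor(tensor) and tensor.dtype != torch.float32
    }

# Hypothetical batch: a dict mapping feature names to tensors.
batch = {
    "followscore": torch.tensor([0.1, 0.2]),  # float data -> torch.float32
    "personalscore": torch.tensor([1, 0]),    # integer data -> torch.int64
}
print(find_non_float_tensors(batch))
```

If a numeric feature shows up here with an integer dtype, casting its source column to float32 before creating the data loaders is one possible fix.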