The input of 'fc' layer must be matrix

youan1 commented 7 years ago

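The error is as in the title; the relevant script and data have been provided. I found two issues of the same kind:

https://github.com/PaddlePaddle/Paddle/issues/708 : the fix there was to change integer_value_sequence to sparse_non_value_slot. However, after checking the documentation, I found that sparse_non_value_slot and sparse_binary_vector have exactly the same description, and in the previous issue the Paddle team already advised against using sparse_binary_vector.

https://github.com/PaddlePaddle/Paddle/issues/331 : the answer there was "It means that input.value is nullptr. So check your data and dataprovider". Does this mean an input slot must not be null? If a data slot is empty (which does happen), how should it be handled?

Also, could this error be related to a misconfigured type on another slot? The input currently has three kinds of data, and following the documentation I use these data types: 1) fixed-length dense data, using dense_vector; 2) sparse unordered data, such as a bag of keywords, using integer_value_sequence, which feeds an embedding; 3) sparse unordered weighted data, such as 37:0.768 43:0.656, using sparse_vector, connected to an fc layer. Is this usage reasonable?

Thanks!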

qingqing01 commented 7 years ago

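Does this mean an input slot must not be null? If a data slot is empty (which does happen), how should it be handled?

dense_vector: needs to be padded with all zeros. integer_value_sequence: our program should handle this automatically; this still needs to be checked. sparse_vector: in principle our program should also handle this automatically; this needs to be checked as well. If your application really does have empty data and it currently fails at run time, I suggest adding one entry (0, 0.0), which marks a weight of 0 and means the entry carries no information. I think that is acceptable.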
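
As a rough illustration of the workaround suggested above, the sketch below pads an empty weighted sparse slot with a single (0, 0.0) entry before a sample is handed to the reader. It assumes a sparse_vector sample is represented as a list of (index, weight) pairs; the helper name pad_empty_sparse_slot is hypothetical and not part of Paddle.

```python
def pad_empty_sparse_slot(pairs):
    """Return a non-empty list of (index, weight) pairs for a sparse_vector slot.

    Hypothetical helper, not a Paddle API. An empty slot is replaced by one
    entry at index 0 with weight 0.0, so the slot contributes nothing.
    """
    pairs = list(pairs)
    if not pairs:
        return [(0, 0.0)]
    return pairs


print(pad_empty_sparse_slot([]))                          # [(0, 0.0)]
print(pad_empty_sparse_slot([(37, 0.768), (43, 0.656)]))  # returned unchanged
```

Padding in the data provider keeps the model configuration untouched, which seems the least invasive option until empty slots are handled by the framework itself.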

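1) fixed-length dense data, using dense_vector; 2) sparse unordered data, such as a bag of keywords, using integer_value_sequence, which feeds an embedding; 3) sparse unordered weighted data, such as 37:0.768 43:0.656, using sparse_vector, connected to an fc layer. Is this usage reasonable?

Your understanding is correct.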

The input of 'fc' layer must be matrix

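For this problem, first identify which fc layer is failing, then check that layer's input. Possible causes are: 1) the input is empty; 2) the input format is wrong, for example feeding integer_value_sequence data directly into an fc layer.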
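
Both causes listed above can be caught before the data reaches the network. The sketch below is one possible pre-flight check on a single slot of a sample. It assumes dense_vector samples are lists of floats, sparse_vector samples are lists of (index, weight) tuples, and integer_value_sequence samples are lists of integer ids; the function check_slot is hypothetical and not part of Paddle.

```python
def check_slot(kind, value, dim):
    """Return a list of problems found in one slot of a sample.

    Hypothetical helper. `kind` names the declared slot type and `dim`
    is the declared dimension of that slot.
    """
    if len(value) == 0:
        return ["empty input"]
    problems = []
    if kind == "dense_vector":
        if len(value) != dim:
            problems.append("expected %d floats, got %d" % (dim, len(value)))
    elif kind == "sparse_vector":
        for item in value:
            if not (isinstance(item, tuple) and len(item) == 2):
                problems.append("expected (index, weight) pairs, got %r" % (item,))
                break
            if not 0 <= item[0] < dim:
                problems.append("index %d out of range" % item[0])
                break
    elif kind == "integer_value_sequence":
        if not all(isinstance(i, int) and 0 <= i < dim for i in value):
            problems.append("expected integer ids in [0, %d)" % dim)
    else:
        problems.append("unknown slot type %r" % kind)
    return problems


print(check_slot("sparse_vector", [], 100))              # ['empty input']
print(check_slot("sparse_vector", [3, 5], 100))          # sequence ids fed to a sparse_vector slot
print(check_slot("sparse_vector", [(37, 0.768)], 100))   # []
```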

newaccount3 commented 7 years ago

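@qingqing01, thanks for the answer.

dense_vector: this one is guaranteed to be padded with zeros. integer_value_sequence: our program should handle this automatically; this still needs to be checked. sparse_vector: in principle our program should also handle this automatically; this needs to be checked as well.

What do you mean by "needs to be checked" here? Should I check it, and if so, how? Also, there really are empty sparse slots. Is this what you mean by checking at data-loading time: if a sparse slot is empty, pad it with 0, and if a sparse weighted slot is empty, pad it with 0.0?

The input of 'fc' layer must be matrix: for this problem, first identify which fc layer is failing, then check that layer's input. Possible causes are: 1) the input is empty; 2) the input format is wrong, for example feeding integer_value_sequence data directly into an fc layer.

On this point: 1) is there a good way to determine which fc it is? 2) If an integer_value_sequence has a very small dimension (under 200) and does not need an embedding, can it not be connected directly to an fc layer?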

qingqing01 commented 7 years ago

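integer_value_sequence: our program should handle this automatically; this still needs to be checked. sparse_vector: in principle our program should also handle this automatically; this needs to be checked as well.

This needs to be checked by our developers; we will look into it shortly. For now, I suggest padding sparse weighted data with 0.0.

However, a sparse_binary_vector cannot be padded with 0: a 0 there means the feature has a word at index 0 with weight 1. If you have no sparse_binary_vector features, you can ignore this for now.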
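
The difference between the two paddings is easiest to see by expanding both representations to dense vectors. The sketch below assumes a sparse_binary_vector sample is a list of active indices and a sparse_vector sample is a list of (index, weight) pairs; both helper functions are hypothetical and only illustrate the semantics.

```python
def binary_to_dense(indices, dim):
    """Expand a sparse_binary_vector sample (list of indices) to a dense list."""
    dense = [0.0] * dim
    for i in indices:
        dense[i] = 1.0  # every listed index has an implicit weight of 1
    return dense


def weighted_to_dense(pairs, dim):
    """Expand a sparse_vector sample (list of (index, weight)) to a dense list."""
    dense = [0.0] * dim
    for i, w in pairs:
        dense[i] = w
    return dense


print(binary_to_dense([0], 4))           # [1.0, 0.0, 0.0, 0.0]
print(weighted_to_dense([(0, 0.0)], 4))  # [0.0, 0.0, 0.0, 0.0]
```

Padding a binary slot with 0 therefore switches feature 0 on, while padding a weighted slot with (0, 0.0) leaves the vector all zeros.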

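1) Is there a good way to determine which fc it is?

The stack trace in the error log should print the name of the fc layer; check the log to see whether it is there.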

newaccount3 commented 7 years ago

@qingqing01, OK, I will try the approach you suggested.

zhangweijiqn commented 6 years ago

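I ran into the same problem with a sparse weighted feature. Offline, I use sparse_vector as the input to an fc layer:

uipf = paddle.layer.data("uipf_slot", paddle.data_type.sparse_vector(conf['input']['uipf_sz']))

Online, I use the CAPI, following the sparse_binary demo: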

paddle_matrix_sparse_copy_from(mat,
                                   rowBuf,
                                   row_sz,
                                   colBuf,
                                   col_sz,
                                   valueArray,
                                   col_sz);

The feature has values, yet the error above still occurs:

F1218 14:01:29.162410 20824 FullyConnectedLayer.cpp:85] Check failed: input.value The input of 'fc' layer must be matrix
    *** Check failure stack trace: ***
    st ret 0
    Thread [139989125940992] Forwarding emb_uipf,
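
The last line of the trace above names the layer that was being forwarded when the check failed (emb_uipf). When logs are long, a small script can pull those names out. The sketch below assumes the "Forwarding <layer name>," format seen in this trace; whether every Paddle build prints that line is not confirmed here, and the function name is hypothetical.

```python
import re

# Matches the layer name that follows "Forwarding" in the trace format above.
FORWARDING = re.compile(r"Forwarding\s+([^,\s]+)")


def layers_being_forwarded(log_text):
    """Return every layer name that appears after 'Forwarding' in the log."""
    return FORWARDING.findall(log_text)


log = """F1218 14:01:29.162410 20824 FullyConnectedLayer.cpp:85] Check failed: input.value The input of 'fc' layer must be matrix
    *** Check failure stack trace: ***
    st ret 0
    Thread [139989125940992] Forwarding emb_uipf,
"""
print(layers_being_forwarded(log))  # ['emb_uipf']
```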

jacquesqiao commented 6 years ago

@zhangweijiqn So training works fine, but prediction fails?

zhangweijiqn commented 6 years ago

@jacquesqiao Yes.

jacquesqiao commented 6 years ago

@zhangweijiqn Could you post your code? For reference, see https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/matrix.h#L48

/**
 * @brief paddle_matrix_create_sparse Create a sparse matrix.
 * @param height the matrix height.
 * @param width the matrix width.
 * @param nnz the number of non-zero elements.
 * @param isBinary is binary (either 1 or 0 in matrix) or not.
 * @param useGpu is using GPU or not.
 * @return paddle_matrix.
 * @note Mobile inference does not support this interface.
 */
PD_API paddle_matrix paddle_matrix_create_sparse(
    uint64_t height, uint64_t width, uint64_t nnz, bool isBinary, bool useGpu);

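For this interface, pay attention to the isBinary parameter: for sparse_vector it must be set to false.

There is also this one: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/capi/matrix.h#L122 When copying the data, valueArray must be set.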

/**
 * @brief paddle_matrix_sparse_copy_from Copy from a CSR format matrix
 * @param [out] mat output matrix
 * @param [in] rowArray row array. The array slices in column array.
 * @param [in] rowSize length of row array.
 * @param [in] colArray the column array. It means the non-zero element indices
 * in each row.
 * @param [in] colSize length of column array.
 * @param [in] valueArray the value array. It means the non-zero element values.
 * NULL if the matrix is binary.
 * @param [in] valueSize length of value array. Zero if the matrix is binary.
 * @return paddle_error
 * @note Mobile inference does not support this interface.
 */
PD_API paddle_error paddle_matrix_sparse_copy_from(paddle_matrix mat,
                                                   int* rowArray,
                                                   uint64_t rowSize,
                                                   int* colArray,
                                                   uint64_t colSize,
                                                   float* valueArray,
                                                   uint64_t valueSize);
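
To make the CSR layout described in the comment above concrete, the sketch below builds the three arrays from rows of (column, value) pairs in plain Python. It is not Paddle code and the function name to_csr is hypothetical. In standard CSR the row array holds one start offset per row plus a final end offset, so it has height + 1 entries; whether rowSize should be passed as that length is worth checking against the CAPI examples.

```python
def to_csr(rows):
    """Convert rows of (column, value) pairs into CSR arrays.

    Returns (row_offsets, col_indices, values). row_offsets has
    len(rows) + 1 entries; row r occupies the half-open range
    row_offsets[r]:row_offsets[r + 1] of the other two arrays.
    """
    row_offsets = [0]
    col_indices = []
    values = []
    for row in rows:
        for col, val in sorted(row):
            col_indices.append(col)
            values.append(float(val))
        row_offsets.append(len(col_indices))
    return row_offsets, col_indices, values


# One sample with two weighted features, as in "37:0.768 43:0.656".
print(to_csr([[(37, 0.768), (43, 0.656)]]))
# ([0, 2], [37, 43], [0.768, 0.656])
```

For a binary matrix the values array would simply be dropped, which matches the "NULL if the matrix is binary" note in the header comment.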

zhangweijiqn commented 6 years ago

@jacquesqiao Both isBinary and valueArray are set. The code is as follows:

int _set_sparse_float_slot(
        paddle_arguments& in_args,
        int slot_idx,
        int max_sz,
        IdxMapping& mapping,
        SparsefFeat& ffeat) {

    int row_sz = 1;
    int col_sz = ffeat.size();
    if (col_sz == 0) {
        col_sz = 1; // make sure element exists
    }
    paddle_matrix mat = paddle_matrix_create_sparse(row_sz,
                                                    max_sz,
                                                    col_sz, false, false);
    paddle_real* array;
    int colBuf[col_sz];
    int rowBuf[row_sz+1];
    float valueArray[col_sz];
    rowBuf[0] = 0;
    for (int i = 0; i< col_sz; ++i){
        colBuf[i] = i;
        valueArray[i] = 0.0;
    }
    // map string feature to id feature
    int cnt = 0;
    for (auto f_iter = ffeat.begin(); f_iter != ffeat.end(); ++f_iter) {
        if (mapping.find(f_iter->first) != mapping.end()) {
            colBuf[cnt] = mapping[f_iter->first];
            valueArray[cnt] = f_iter->second;
            MS_LOG_DEBUG("zwj_debug_sparse_f:[%s][%d][%f]", f_iter->first.c_str(),
                         colBuf[cnt], valueArray[cnt]);
        } else {
            MS_LOG_DEBUG("feat not found id [%s]", f_iter->first.c_str());
        }
        cnt++;
    }
    if (cnt == 0) {
        ++cnt;
    }
    rowBuf[1] = rowBuf[0] + cnt;
    paddle_matrix_sparse_copy_from(mat,
                                   rowBuf,
                                   row_sz,
                                   colBuf,
                                   col_sz,
                                   valueArray,
                                   col_sz);
    int err_code = paddle_arguments_set_value(in_args, slot_idx, mat);
    if (err_code != kPD_NO_ERROR) {
        MS_LOG_DEBUG("_set_sparse_f_slot error: [%d]", err_code);
        return -2;
    }
    MS_LOG_DEBUG("dump slot_index[%d],size[%d]", slot_idx, col_sz);
    if (ffeat.size() == 0){
        return -1;
    }
    return 0;
}

jacquesqiao commented 6 years ago

Could you post your model configuration?

zhangweijiqn commented 6 years ago

The problem is with the emb_uipf feature. The offline code is as follows:

uipf = paddle.layer.data("uipf_slot",
            paddle.data_type.sparse_vector(conf['input']['uipf_sz']))
emb_uipf = paddle.layer.fc(name='emb_uipf', input = uipf, size = conf['uipf']['sz'],
            act = paddle.activation.Linear())
''' Concat Layers '''
concat = paddle.layer.concat(name = "concat",
            input = [emb_ubu, emb_ubg, emb_ubj, emb_ubt, emb_uba,
                emb_uipf, emb_uirf, emb_uisd, emb_urpf, emb_usfr,
                emb_ch, emb_cw, emb_co, emb_cn,
                emb_mftw, emb_msw,
                emb_w0mftw, emb_w0msw, emb_w1mftw, emb_w1msw])

''' Stage 1: Full Connected Layers '''
fc1_1 = paddle.layer.fc(name='fc1_1', input = concat, size = conf['fc1_1']['sz'],
            act = paddle.activation.Linear())
...

jacquesqiao commented 6 years ago

Where does the uipf input come from?

zhangweijiqn commented 6 years ago

It is already included in the code posted above.

zhangweijiqn commented 6 years ago

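The problem is solved. It was caused by the slot order in the generated bin config not matching the slot order in the online conversion code, so a sequence-type slot was passed to the sparse_vector input. I suspect the check then found valueArr to be null.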
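
A mismatch like this can be detected early by comparing the two slot orders position by position. The sketch below is one way to do that in plain Python, assuming both sides can be listed as (slot name, slot type) pairs. The function name is hypothetical, and so is the slot name ubu_slot in the example; only uipf_slot appears in this thread.

```python
def find_slot_mismatches(config_slots, online_slots):
    """Compare two ordered lists of (name, type) pairs.

    Returns a list of (position, expected, actual) for every position where
    the model config and the online feeding code disagree.
    """
    mismatches = []
    for pos in range(max(len(config_slots), len(online_slots))):
        expected = config_slots[pos] if pos < len(config_slots) else None
        actual = online_slots[pos] if pos < len(online_slots) else None
        if expected != actual:
            mismatches.append((pos, expected, actual))
    return mismatches


config_slots = [("uipf_slot", "sparse_vector"),
                ("ubu_slot", "integer_value_sequence")]   # ubu_slot is hypothetical
online_slots = [("ubu_slot", "integer_value_sequence"),
                ("uipf_slot", "sparse_vector")]

for pos, expected, actual in find_slot_mismatches(config_slots, online_slots):
    print("slot %d: config expects %s, online code feeds %s" % (pos, expected, actual))
```

Running such a check once at service start-up would fail fast instead of surfacing later as a check failure inside the fc layer.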

PaddlePaddle / Paddle

The input of 'fc' layer must be matrix #2889