Jittor / jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
https://cg.cs.tsinghua.edu.cn/jittor/
Apache License 2.0
3.08k stars 311 forks

Unable to get repr for <class 'jittor.jittor_core.Var'> #446

Closed ader47 closed 6 months ago

ader47 commented 1 year ago

When the model parameters are float16, the output raises the error

Unable to get repr for <class 'jittor.jittor_core.Var'>

Full Log

After this error occurs, no further computation is possible; the log of the subsequent computation error is as follows:

step 0, loss = 0.35731029510498047
Traceback (most recent call last):
  File "/home/****/pycharm/plugins/python/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/****/pycharm/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/**************/main.py", line 44, in <module>
    optim.step (loss_mean)
  File "/home/*****/lib/python3.8/site-packages/jittor/optim.py", line 305, in step
    self.pre_step(loss, retain_graph=False)
  File "/home/******/lib/python3.8/site-packages/jittor/optim.py", line 220, in pre_step
    self.backward(loss, retain_graph)
  File "/home/*******/lib/python3.8/site-packages/jittor/optim.py", line 170, in backward
    jt.sync(params_has_grad)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.sync)).

Types of your inputs are:
 self   = module,
 args   = (list, ),

The function declarations are:
 void sync(const vector<VarHolder*>& vh=vector<VarHolder*>(), bool device_sync=false, bool weak_sync=true)

Failed reason:[f 0601 21:10:21.606758 04 parallel_compiler.cc:330] Error happend during compilation:
 [Error] source file location:/home/*****/.cache/jittor/jt1.3.7/g++11.3.0/py3.8.16/Linux-5.19.0-4xd5/AMDRyzen76800Hx3a/default/cu12.1.105_sm_86/jit/__opkey0_broadcast_to__Tx_float16__DIM_3__BCAST_2__opkey1_broadcast_to__Tx_float32__DIM_3____hash_6479ac4efca2eda2_op.cc
Compile fused operator(12/16)failed:[Op(100:1:1:1:i1:o1:s0,broadcast_to->101),Op(199:0:1:1:i1:o1:s0,broadcast_to->200),Op(203:1:1:1:i2:o1:s0,binary.multiply->204),Op(205:0:1:1:i1:o1:s0,reduce.add->206),]

Reason: [f 0601 21:10:21.574365 12:C0 mkl_matmul_op.cc:27] Check failed: a->dtype().dsize() == 4 && b->dtype().dsize() == 4  Something wrong... Could you please report this issue?
 support float32 only now.

CondaError: KeyboardInterrupt

Process finished with exit code 1
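The failure reason in the log comes from Jittor's CPU MKL matmul kernel (mkl_matmul_op.cc), which checks that both operands have a 4-byte dtype, i.e. float32, and rejects float16. Below is a hedged sketch, in plain NumPy rather than Jittor internals, of what that guard amounts to and of the obvious workaround of upcasting half-precision operands before the matmul; `mkl_matmul_guard` and `matmul_with_upcast` are illustrative names, not Jittor APIs:

```python
import numpy as np

def mkl_matmul_guard(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Mirrors the check reported in mkl_matmul_op.cc: both operands
    # must be 4-byte dtypes (float32); float16 (itemsize == 2) fails.
    if a.dtype.itemsize != 4 or b.dtype.itemsize != 4:
        raise RuntimeError("support float32 only now.")
    return a @ b

def matmul_with_upcast(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Workaround sketch: upcast half-precision inputs to float32
    # before they reach the float32-only CPU matmul path.
    return mkl_matmul_guard(a.astype(np.float32), b.astype(np.float32))
```

This suggests the issue is specific to running a float16 model on the CPU backend; the matmul itself is fine once the operands are float32.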

Minimal Reproduce

Code:

import jittor as jt
import numpy as np
from jittor import nn, Module, init

class Model(Module):
    def __init__(self):
        self.layer1 = nn.Linear(1, 10)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(10, 1)
    def execute(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.layer2(x)
        return x

def convert_module_to_f16(l):
    l.weight = l.weight.half()
    if l.bias is not None:
        l.bias = l.bias.half()

if __name__ == '__main__':
    np.random.seed(0)
    jt.set_seed(3)
    n = 1000
    batch_size = 50

    def get_data(n):
        for i in range(n):
            x = np.random.rand(batch_size, 1)
            y = x*x
            yield jt.float32(x), jt.float32(y)

    model = Model()
    model.layer1.apply(convert_module_to_f16)
    model.layer2.apply(convert_module_to_f16)
    learning_rate = 0.1
    optim = nn.SGD(model.parameters(), learning_rate)

    for i,(x,y) in enumerate(get_data(n)):
        pred_y = model(x.half())
        ## Set a breakpoint here: pred_y will very often show
        ## "Unable to get repr for <class 'jittor.jittor_core.Var'>",
        ## and no further computation is possible
        loss = jt.sqr(pred_y - y)
        loss_mean = loss.mean()
        optim.step(loss_mean)
        print(f"step {i}, loss = {loss_mean.numpy().sum()}")

    assert loss_mean.numpy() < 0.005
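The repro converts both Linear layers to float16 and feeds half-precision activations, which then hit the float32-only CPU matmul. One common mixed-precision pattern that sidesteps this is to keep float32 "master" weights and cast to float16 only for ops that support it, returning float32 for everything downstream. A minimal sketch of that pattern in NumPy (illustrative only; `linear_forward` is a hypothetical helper, not part of Jittor):

```python
import numpy as np

def linear_forward(x, weight, bias, use_half=False):
    # Master parameters stay float32; optionally run the compute in
    # float16, then hand float32 back so float32-only kernels still work.
    if use_half:
        y = (x.astype(np.float16) @ weight.astype(np.float16).T
             + bias.astype(np.float16))
        return y.astype(np.float32)
    return x @ weight.T + bias
```

Whether Jittor's optimizer and autodiff can be driven this way on CPU is exactly what the issue is asking; on the CUDA backend (sm_86 appears in the cache path) float16 kernels may behave differently.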
ader47 commented 1 year ago

So only float32 is supported for now?

TimenoLong commented 5 months ago

Why was this issue closed without answering the question? @JittorRepos