Open xqdan opened 4 years ago
Thanks for your feedback.
class BatchNorm in nn.py. Thanks, that makes sense. More questions:
I notice that for registered ops, only shape inference is provided. Do we need type inference as well? Have you considered employing a type system to support inference and graph optimizations?
How do you auto-schedule unsupported ops? Auto-scheduling is non-trivial, so I'd like to know the solution Jittor uses.
About vectorization: the capability of traditional compilers seems limited here. Is it enough to only emit pragmas for LLVM?
Thanks
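To make the earlier BatchNorm point concrete: the idea is that nn.py composes BatchNorm from element-wise/broadcast/reduce meta ops rather than lowering a monolithic op. Here is a rough, framework-free Python sketch of that decomposition for a single channel (plain lists instead of tensors; this is an illustration, not Jittor's actual code):

```python
import math

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars for one channel using only
    reduce (sum) and element-wise (sub, mul, rsqrt) primitives,
    mirroring how BatchNorm can be built from meta ops."""
    n = len(xs)
    mean = sum(xs) / n                             # reduce: mean over batch
    var = sum((x - mean) ** 2 for x in xs) / n     # broadcast sub + reduce
    inv_std = 1.0 / math.sqrt(var + eps)           # element-wise rsqrt
    return [(x - mean) * inv_std for x in xs]      # broadcast normalize

out = batch_norm([1.0, 2.0, 3.0, 4.0])
```

Because every step is a broadcast or a reduce, a fusion-oriented JIT can compile the whole thing into one loop nest without ever needing a "big" BatchNorm op in the IR.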
@xqdan shape could be inferred like this: y = create_output(nullptr, x->dtype());
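To make the shape/type-inference question above concrete, here is a hypothetical Python sketch (all names invented, not Jittor's actual API) of an op registry where each registered op carries both a shape rule and a dtype rule, which is essentially the "type system" being asked about:

```python
# Hypothetical op registry: each op provides both inference rules.
OP_RULES = {}

def register(name, infer_shape, infer_dtype):
    OP_RULES[name] = (infer_shape, infer_dtype)

# Binary element-wise op: shapes must match, dtypes promote to float.
register(
    "add",
    infer_shape=lambda a, b: a if a == b else None,
    infer_dtype=lambda a, b: "float32" if "float32" in (a, b) else "int32",
)

def infer(name, shapes, dtypes):
    """Run both rules for one op, as a graph builder would per node."""
    infer_shape, infer_dtype = OP_RULES[name]
    return infer_shape(*shapes), infer_dtype(*dtypes)

shape, dtype = infer("add", [(2, 3), (2, 3)], ["int32", "float32"])
```

With rules like these attached to every op, a graph optimizer can propagate shapes and dtypes through the whole graph before any code is generated.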
@Gword Where is resnet50's conv op set to use cudnn? I can only find the nn.py implementation. How is the extern implementation selected?
@xmyqsh I have the same confusion. I saw some examples and tests in python/jittor/test/test_cudnn_op.py, but I cannot find how they trigger cudnn; the only flag in use is use_cuda=1.
@jackmsye I have got it!
string relay_conv_name = fop->flags.get(NodeFlags::_cpu) ?
    "mkl_conv" : "cudnn_conv";
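The snippet above is the dispatch point: when use_cuda=1, fused ops lose the _cpu flag, so the conv tuner relays to the cuDNN extern op; otherwise it relays to MKL. A Python paraphrase of that logic (the function name here is made up for illustration):

```python
def select_conv_backend(use_cuda):
    """Paraphrase of the C++ ternary above: with use_cuda=1 the op is
    not a CPU op, so conv relays to "cudnn_conv"; on CPU it relays to
    the MKL extern conv instead."""
    is_cpu = not use_cuda
    return "mkl_conv" if is_cpu else "cudnn_conv"
```

This is why no explicit "use cudnn" setting appears in nn.py: flipping the single use_cuda flag is enough to change which extern implementation the tuner picks.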
@xmyqsh thx, they just put it in conv_tuner
Some tuners use TunerManager, some use PassManager, while others use both. Do you know why?
First the JIT compiler runs; when it compiles ops, it invokes TunerManager (you can see the code in src/ops_compiler.cc). Then inside TunerManager, the run_tuner function works through a PassManager member. See the code in tuner_manager.cc.
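The relationship described above (tuner manager owns a pass manager, each tuner runs against it) can be sketched in a few lines of Python. This is a toy model with invented names, not Jittor's classes:

```python
# Toy model: TunerManager drives tuners, each tuner schedules passes
# on a shared PassManager, matching the run_tuner(&pm) calls above.
class PassManager:
    def __init__(self):
        self.log = []          # record of passes that were run

    def run_pass(self, name):
        self.log.append(name)  # a real pass would transform the IR here

class TunerManager:
    def __init__(self, tuners):
        self.tuners = tuners

    def run_tuner(self, pm):
        for tuner in self.tuners:
            tuner(pm)          # each tuner may schedule passes on pm

pm = PassManager()
TunerManager([lambda pm: pm.run_pass("reorder"),
              lambda pm: pm.run_pass("matmul")]).run_tuner(pm)
```

So a tuner that only inspects the op graph touches just the TunerManager, while a tuner that rewrites loops also needs the PassManager it was handed, which is why some components appear in one, the other, or both.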
@jackmsye Exactly! Can you give a brief summary of TunerManager and PassManager?
run_tuner<ReorderTuner>(&pm);
run_tuner<BroadcastTuner>(&pm);
run_tuner<ReduceTuner>(&pm);
run_tuner<MatmulTuner>(&pm);
run_tuner<ConvTuner>(&pm);
run_pass<MarkRawPass>();
run_pass<ReplaceForNumPass>();
run_pass<LoopVarAnalyzePass>();
run_pass<RemoveLoopPass>();
run_pass<RenameLoopIndexPass>();
run_pass<CompileShapesPass>();
....
run_pass<SplitLoopPass>();
run_pass<ReorderLoopPass>();
run_pass<MergeLoopPass>();
run_pass<ExpandEmptyBlockPass>();
run_pass<SolveConflictDefinePass>();
run_pass<RemoveIntermediatePass>();
....
run_pass<SolveConflictDefinePass>();
run_pass<RestridePass>();
....
if (cc_type == "icc") {
// only icc supports pragma
run_pass<VectorizePass>();
run_pass<UnrollPass>();
}
run_pass<UseMovntPass>();
run_pass<CheckCachePass>();
run_pass<LoopToFuncPass>();
run_pass<AssumeAlignedPass>();
run_pass<ParallelPass>();
run_pass<AtomicTunerPass>();
run_pass<FloatAtomicFixPass>();
....
run_pass<InsertProfileLoopPass>();
....
run_pass<SolveConflictDefinePass>();
....
run_pass<FakeMainPass>();
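The pipeline listed above can be modeled simply: each pass is a function from IR to IR, applied in order, with some passes (like VectorizePass) guarded by the compiler type. A minimal sketch, with invented pass bodies operating on the generated source as a string:

```python
# Toy pass pipeline: each pass maps IR -> IR; the manager applies
# them in order, with a cc_type guard like the icc check above.
def rename_loop_index(ir):
    # stand-in for RenameLoopIndexPass: give loop vars stable names
    return ir.replace("i0", "id0")

def vectorize(ir):
    # stand-in for VectorizePass: emit a vectorization pragma
    return "#pragma vector\n" + ir

def run_pipeline(ir, cc_type):
    passes = [rename_loop_index]
    if cc_type == "icc":           # mirrors: only icc supports pragma
        passes.append(vectorize)
    for p in passes:
        ir = p(ir)
    return ir

vectorized = run_pipeline("for i0 in range(n): body", "icc")
```

The ordering in the real list matters for the same reason it does here: a pass like RenameLoopIndexPass must run before passes that match on loop-variable names, and backend-specific passes are appended only when the target compiler supports them.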
Thanks @xmyqsh, we are polishing our backend documentation and it will be released soon~
Hi,
Nice work! After going through the code, I have some questions:
How are ops like batchnorm lowered to meta ops? TF XLA has a tf2xla phase to lower big ops into groups of meta ops, but I don't see the related code in Jittor. Am I missing anything here?
How many ops are implemented with meta ops, and how many are supported by extern libraries? Take resnet50 as an example.
How do you auto-schedule ops like conv and gemm? Could you elaborate with a specific case?
Thanks!