Open xqdan opened 4 years ago
Thanks for your feedback.
class BatchNorm in nn.py. Thanks, that makes sense. More questions:
I notice that for registered ops, only shape inference is provided. Do we need type inference as well? Have you considered employing a type system to support inference and graph optimizations?
How do you auto-schedule unsupported ops? Auto-scheduling is non-trivial, so I'd like to know the solution Jittor uses.
About vectorization: the capability of traditional compilers seems limited here. Is it enough to only emit pragmas for LLVM?
Thanks
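To make the earlier BatchNorm point concrete: the idea is that nn.py composes BatchNorm from element-wise/broadcast/reduce meta ops rather than lowering a monolithic op. Here is a rough, framework-free Python sketch of that decomposition for a single channel (plain lists instead of tensors; this is an illustration, not Jittor's actual code):

```python
import math

def batch_norm(xs, eps=1e-5):
    """Normalize a batch of scalars for one channel using only
    reduce (sum) and element-wise (sub, mul, rsqrt) primitives,
    mirroring how BatchNorm can be built from meta ops."""
    n = len(xs)
    mean = sum(xs) / n                             # reduce: mean over batch
    var = sum((x - mean) ** 2 for x in xs) / n     # broadcast sub + reduce
    inv_std = 1.0 / math.sqrt(var + eps)           # element-wise rsqrt
    return [(x - mean) * inv_std for x in xs]      # broadcast normalize

out = batch_norm([1.0, 2.0, 3.0, 4.0])
```

Because every step is a broadcast or a reduce, a fusion-oriented JIT can compile the whole thing into one loop nest without ever needing a "big" BatchNorm op in the IR.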
@xqdan shape could be inferred like this: y = create_output(nullptr, x->dtype());
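To make the shape/type-inference question above concrete, here is a hypothetical Python sketch (all names invented, not Jittor's actual API) of an op registry where each registered op carries both a shape rule and a dtype rule, which is essentially the "type system" being asked about:

```python
# Hypothetical op registry: each op provides both inference rules.
OP_RULES = {}

def register(name, infer_shape, infer_dtype):
    OP_RULES[name] = (infer_shape, infer_dtype)

# Binary element-wise op: shapes must match, dtypes promote to float.
register(
    "add",
    infer_shape=lambda a, b: a if a == b else None,
    infer_dtype=lambda a, b: "float32" if "float32" in (a, b) else "int32",
)

def infer(name, shapes, dtypes):
    """Run both rules for one op, as a graph builder would per node."""
    infer_shape, infer_dtype = OP_RULES[name]
    return infer_shape(*shapes), infer_dtype(*dtypes)

shape, dtype = infer("add", [(2, 3), (2, 3)], ["int32", "float32"])
```

With rules like these attached to every op, a graph optimizer can propagate shapes and dtypes through the whole graph before any code is generated.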
@Gword Where is resnet50's conv op set to use cudnn? I can only find the nn.py implementation. How is the extern implementation selected?
@xmyqsh I have the same confusion. I saw some examples and tests in python/jittor/test/test_cudnn_op.py, but I cannot find how they trigger cudnn; the only flag in use is use_cuda=1.
@jackmsye I have got it!
string relay_conv_name = fop->flags.get(NodeFlags::_cpu) ?
    "mkl_conv" : "cudnn_conv";
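The snippet above is the dispatch point: when use_cuda=1, fused ops lose the _cpu flag, so the conv tuner relays to the cuDNN extern op; otherwise it relays to MKL. A Python paraphrase of that logic (the function name here is made up for illustration):

```python
def select_conv_backend(use_cuda):
    """Paraphrase of the C++ ternary above: with use_cuda=1 the op is
    not a CPU op, so conv relays to "cudnn_conv"; on CPU it relays to
    the MKL extern conv instead."""
    is_cpu = not use_cuda
    return "mkl_conv" if is_cpu else "cudnn_conv"
```

This is why no explicit "use cudnn" setting appears in nn.py: flipping the single use_cuda flag is enough to change which extern implementation the tuner picks.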
@xmyqsh thx, they just put it in conv_tuner
Some tuners use TunerManager, some use PassManager, while others use both. Do you know why?
First the JIT compiler runs; when it compiles ops, it invokes TunerManager (you can see the code in src/ops_compiler.cc). Then inside TunerManager, the run_tuner function works through a PassManager member. See the code in tuner_manager.cc.
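The relationship described above (tuner manager owns a pass manager, each tuner runs against it) can be sketched in a few lines of Python. This is a toy model with invented names, not Jittor's classes:

```python
# Toy model: TunerManager drives tuners, each tuner schedules passes
# on a shared PassManager, matching the run_tuner(&pm) calls above.
class PassManager:
    def __init__(self):
        self.log = []          # record of passes that were run

    def run_pass(self, name):
        self.log.append(name)  # a real pass would transform the IR here

class TunerManager:
    def __init__(self, tuners):
        self.tuners = tuners

    def run_tuner(self, pm):
        for tuner in self.tuners:
            tuner(pm)          # each tuner may schedule passes on pm

pm = PassManager()
TunerManager([lambda pm: pm.run_pass("reorder"),
              lambda pm: pm.run_pass("matmul")]).run_tuner(pm)
```

So a tuner that only inspects the op graph touches just the TunerManager, while a tuner that rewrites loops also needs the PassManager it was handed, which is why some components appear in one, the other, or both.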
@jackmsye Exactly! Can you give a brief summary of TunerManager and PassManager?
run_tuner<ReorderTuner>(&pm);
run_tuner<BroadcastTuner>(&pm);
run_tuner<ReduceTuner>(&pm);
run_tuner<MatmulTuner>(&pm);
run_tuner<ConvTuner>(&pm);
run_pass<MarkRawPass>();
run_pass<ReplaceForNumPass>();
run_pass<LoopVarAnalyzePass>();
run_pass<RemoveLoopPass>();
run_pass<RenameLoopIndexPass>();
run_pass<CompileShapesPass>();
....
run_pass<SplitLoopPass>();
run_pass<ReorderLoopPass>();
run_pass<MergeLoopPass>();
run_pass<ExpandEmptyBlockPass>();
run_pass<SolveConflictDefinePass>();
run_pass<RemoveIntermediatePass>();
....
run_pass<SolveConflictDefinePass>();
run_pass<RestridePass>();
....
if (cc_type == "icc") {
// only icc supports pragma
run_pass<VectorizePass>();
run_pass<UnrollPass>();
}
run_pass<UseMovntPass>();
run_pass<CheckCachePass>();
run_pass<LoopToFuncPass>();
run_pass<AssumeAlignedPass>();
run_pass<ParallelPass>();
run_pass<AtomicTunerPass>();
run_pass<FloatAtomicFixPass>();
....
run_pass<InsertProfileLoopPass>();
....
run_pass<SolveConflictDefinePass>();
....
run_pass<FakeMainPass>();
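The pipeline listed above can be modeled simply: each pass is a function from IR to IR, applied in order, with some passes (like VectorizePass) guarded by the compiler type. A minimal sketch, with invented pass bodies operating on the generated source as a string:

```python
# Toy pass pipeline: each pass maps IR -> IR; the manager applies
# them in order, with a cc_type guard like the icc check above.
def rename_loop_index(ir):
    # stand-in for RenameLoopIndexPass: give loop vars stable names
    return ir.replace("i0", "id0")

def vectorize(ir):
    # stand-in for VectorizePass: emit a vectorization pragma
    return "#pragma vector\n" + ir

def run_pipeline(ir, cc_type):
    passes = [rename_loop_index]
    if cc_type == "icc":           # mirrors: only icc supports pragma
        passes.append(vectorize)
    for p in passes:
        ir = p(ir)
    return ir

vectorized = run_pipeline("for i0 in range(n): body", "icc")
```

The ordering in the real list matters for the same reason it does here: a pass like RenameLoopIndexPass must run before passes that match on loop-variable names, and backend-specific passes are appended only when the target compiler supports them.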
Thanks @xmyqsh, we are polishing our backend documentation and it will be released soon~
Hi,
Nice work! After going through the code, I have some questions:
How are ops like batchnorm lowered to meta ops? TF XLA has a tf2xla phase to lower big ops into groups of meta ops, but I don't see the related code in Jittor. Am I missing anything here?
How many ops are implemented with meta ops, and how many are supported by extern libraries? Take resnet50 as an example.
How do you auto-schedule ops like conv and gemm? Could you elaborate with a specific case?
Thanks!