aws-neuron / aws-neuron-samples

Example code for AWS Neuron SDK developers building inference and training applications

Unable to trace SDXL VAE decoder with a different dimension #34

Open neo opened 9 months ago

neo commented 9 months ago

In torch-neuronx/inference/hf_pretrained_sdxl_1024_inference.ipynb, I tried changing the latent shape from [1, 4, 128, 128] to [1, 4, 104, 152] (an 832x1216 image instead of 1024x1024), and it didn't work. More specifically, I was able to trace the unet and post_quant_conv with that shape, but not the decoder.
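For reference, the latent shape is the output image size divided by the VAE's spatial downsampling factor. A minimal sketch, assuming the SDXL VAE's factor of 8 and 4 latent channels (consistent with [1, 4, 128, 128] for the notebook's 1024x1024 images and with the `sdxl_compile_dir_832x1216` directory in the compiler log); the helper name `sdxl_latent_shape` is hypothetical:

```python
def sdxl_latent_shape(height: int, width: int, batch: int = 1,
                      latent_channels: int = 4, vae_scale_factor: int = 8) -> list:
    """Return the [N, C, H, W] latent shape for a given output image size.

    Assumes the VAE downsamples by `vae_scale_factor` in each spatial dimension.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError(
            f"{height}x{width} is not divisible by the VAE scale factor {vae_scale_factor}"
        )
    return [batch, latent_channels, height // vae_scale_factor, width // vae_scale_factor]


print(sdxl_latent_shape(1024, 1024))  # [1, 4, 128, 128]
print(sdxl_latent_shape(832, 1216))   # [1, 4, 104, 152]
```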

Here's the error I got:

```
2023-09-08T21:17:33Z Too many instructions after unroll for function sg0000 !
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File <timed exec>:10

File /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:323, in trace(func, example_inputs, states, input_output_aliases, compiler_workdir, compiler_args, options)
    320     compiler_workdir = context.name
    322 with context:
--> 323     neff_filename, metaneff, flattener, packer = _trace(
    324         func,
    325         example_inputs,
    326         states,
    327         input_output_aliases,
    328         compiler_workdir,
    329         compiler_args,
    330         options,
    331     )
    332     return create_neuron_model(
    333         neff_filename,
    334         metaneff,
   (...)
    338         input_output_aliases,
    339     )

File /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:416, in _trace(func, example_inputs, states, input_output_aliases, compiler_workdir, compiler_args, options)
    413     handle.write(hlo.SerializeToString())
    415 # Compile HLO to NEFF
--> 416 neff_filename = hlo_compile(model_dir, compiler_workdir, compiler_args)
    418 metaneff = hlo_metaneff(hlo, input_parameter_names, updated_input_output_aliases)
    420 return neff_filename, metaneff.SerializeToString(), flattener, packer

File /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/torch_neuronx/xla_impl/trace.py:281, in hlo_compile(filename, compiler_workdir, compiler_args)
    274     elif status == -11:
    275         logger.warning(
    276             "The neuronx-cc (neuron compiler) crashed (SEGFAULT). "
    277             "This is likely due to a bug in the compiler.  "
    278             "Please lodge an issue at 'https://github.com/aws/aws-neuron-sdk/issues'"
    279         )
--> 281     raise RuntimeError(f"neuronx-cc failed with {status}")
    283 return neff_filename

RuntimeError: neuronx-cc failed with 70
```
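A side note on the exit status: `hlo_compile` treats `status == -11` as a segfault, which suggests `status` follows Python's `subprocess` convention, where a negative return code `-N` means the child was killed by signal `N`. Under that assumption, `70` is an ordinary (non-signal) exit, so the compiler most likely stopped on its own after reporting `Too many instructions after unroll for function sg0000 !` rather than crashing. A small hypothetical helper (`describe_returncode` is not part of torch-neuronx) to decode such codes:

```python
import signal


def describe_returncode(status: int) -> str:
    """Describe a return code using Python's subprocess conventions."""
    if status == 0:
        return "success"
    if status < 0:
        # subprocess reports "killed by signal N" as returncode -N (POSIX).
        try:
            return f"killed by {signal.Signals(-status).name}"
        except ValueError:
            return f"killed by unknown signal {-status}"
    # Positive codes are the tool's own exit status; their meaning is tool-specific.
    return f"exited with status {status}"


print(describe_returncode(-11))  # killed by SIGSEGV
print(describe_returncode(70))   # exited with status 70
```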
And here is the text printed out before the error:
```
2023-09-08T21:17:23Z Running DoNothing
2023-09-08T21:17:23Z DoNothing finished after 0.000 seconds
2023-09-08T21:17:23Z Running CanonicalizeIR
2023-09-08T21:17:23Z CanonicalizeIR finished after 0.018 seconds
2023-09-08T21:17:23Z Running ExpandBatchNorm
2023-09-08T21:17:23Z ExpandBatchNorm finished after 0.057 seconds
2023-09-08T21:17:23Z Running ResolveComplicatePredicates
2023-09-08T21:17:23Z ResolveComplicatePredicates finished after 0.017 seconds
2023-09-08T21:17:23Z Running AffinePredicateResolution
2023-09-08T21:17:23Z AffinePredicateResolution finished after 0.019 seconds
2023-09-08T21:17:23Z Running EliminateDivs
2023-09-08T21:17:23Z EliminateDivs finished after 0.018 seconds
2023-09-08T21:17:23Z Running PerfectLoopNest
2023-09-08T21:17:23Z PerfectLoopNest finished after 0.016 seconds
2023-09-08T21:17:23Z Running Simplifier
2023-09-08T21:17:24Z Simplifier finished after 0.223 seconds
2023-09-08T21:17:24Z Running GenericAccessSimplifier
2023-09-08T21:17:24Z GenericAccessSimplifier finished after 0.015 seconds
2023-09-08T21:17:24Z Running TCTransform
2023-09-08T21:17:24Z TCTransform finished after 0.027 seconds
2023-09-08T21:17:24Z Running CommuteConcat
2023-09-08T21:17:24Z CommuteConcat finished after 0.016 seconds
2023-09-08T21:17:24Z Running TensorOpFusion
2023-09-08T21:17:24Z TensorOpFusion finished after 0.018 seconds
2023-09-08T21:17:24Z Running TensorOpTransform
2023-09-08T21:17:24Z TensorOpTransform finished after 0.060 seconds
2023-09-08T21:17:24Z Running LowerTensorOp
2023-09-08T21:17:24Z LowerTensorOp finished after 0.017 seconds
2023-09-08T21:17:24Z Running MemcpyElimination
2023-09-08T21:17:25Z MemcpyElimination finished after 1.058 seconds
2023-09-08T21:17:25Z Running LoopFusion
2023-09-08T21:17:26Z LoopFusion finished after 1.182 seconds
2023-09-08T21:17:26Z Running Simplifier
2023-09-08T21:17:26Z Simplifier finished after 0.112 seconds
2023-09-08T21:17:26Z Running Delinearization
2023-09-08T21:17:26Z Delinearization finished after 0.052 seconds
2023-09-08T21:17:26Z Running DeadStoreElimination
2023-09-08T21:17:28Z DeadStoreElimination finished after 1.288 seconds
2023-09-08T21:17:28Z Running Simplifier
2023-09-08T21:17:28Z Simplifier finished after 0.116 seconds
2023-09-08T21:17:28Z Running LICM
2023-09-08T21:17:28Z LICM finished after 0.064 seconds
2023-09-08T21:17:28Z Running Delinearization
2023-09-08T21:17:28Z Delinearization finished after 0.019 seconds
2023-09-08T21:17:28Z Running LoopFusion
2023-09-08T21:17:28Z LoopFusion finished after 0.224 seconds
2023-09-08T21:17:28Z Running SimplifySlice
2023-09-08T21:17:28Z SimplifySlice finished after 0.007 seconds
2023-09-08T21:17:28Z Running LICM
2023-09-08T21:17:28Z LICM finished after 0.019 seconds
2023-09-08T21:17:28Z Running Simplifier
2023-09-08T21:17:28Z Simplifier finished after 0.114 seconds
2023-09-08T21:17:28Z Running ValueNumbering
2023-09-08T21:17:28Z ValueNumbering finished after 0.036 seconds
2023-09-08T21:17:28Z Running LICM
2023-09-08T21:17:28Z LICM finished after 0.018 seconds
2023-09-08T21:17:28Z Running PadElimination
2023-09-08T21:17:28Z PadElimination finished after 0.001 seconds
2023-09-08T21:17:28Z Running Delinearization
2023-09-08T21:17:28Z Delinearization finished after 0.058 seconds
2023-09-08T21:17:28Z Running LoopFusion
2023-09-08T21:17:29Z LoopFusion finished after 0.218 seconds
2023-09-08T21:17:29Z Running GenericAccessSimplifier
2023-09-08T21:17:29Z GenericAccessSimplifier finished after 0.007 seconds
2023-09-08T21:17:29Z Running Simplifier
2023-09-08T21:17:29Z Simplifier finished after 0.111 seconds
2023-09-08T21:17:29Z Running LICM
2023-09-08T21:17:29Z LICM finished after 0.018 seconds
2023-09-08T21:17:29Z Running ValueNumbering
2023-09-08T21:17:29Z ValueNumbering finished after 0.024 seconds
2023-09-08T21:17:29Z Running TCTransform
2023-09-08T21:17:29Z TCTransform finished after 0.010 seconds
2023-09-08T21:17:29Z Running CommuteConcat
2023-09-08T21:17:29Z CommuteConcat finished after 0.008 seconds
2023-09-08T21:17:29Z Running RecognizeOpIdiom
2023-09-08T21:17:29Z RecognizeOpIdiom finished after 0.047 seconds
2023-09-08T21:17:29Z Running MaskPropagation
2023-09-08T21:17:29Z MaskPropagation finished after 0.023 seconds
2023-09-08T21:17:29Z Running Recompute
2023-09-08T21:17:29Z Recompute finished after 0.001 seconds
2023-09-08T21:17:29Z Running DeadCodeElimination
2023-09-08T21:17:29Z DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z Running DoNothing
2023-09-08T21:17:29Z DoNothing finished after 0.000 seconds
2023-09-08T21:17:29Z Running MutateDataType
2023-09-08T21:17:29Z MutateDataType finished after 0.006 seconds
2023-09-08T21:17:29Z Running AutoCastTCInputs
2023-09-08T21:17:29Z AutoCastTCInputs finished after 0.015 seconds
2023-09-08T21:17:29Z Running GenericAccessSimplifier
2023-09-08T21:17:29Z GenericAccessSimplifier finished after 0.009 seconds
2023-09-08T21:17:29Z Running Simplifier
2023-09-08T21:17:29Z Simplifier finished after 0.114 seconds
2023-09-08T21:17:29Z Running LegalizeCCOpLayout
2023-09-08T21:17:29Z LegalizeCCOpLayout finished after 0.008 seconds
2023-09-08T21:17:29Z Running DelinearIndices
2023-09-08T21:17:29Z DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z Running Delinearization
2023-09-08T21:17:29Z Delinearization finished after 0.017 seconds
2023-09-08T21:17:29Z Running DelinearIndices
2023-09-08T21:17:29Z DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z Running DeadCodeElimination
2023-09-08T21:17:29Z DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z Running InferIntrinsicOnCC
2023-09-08T21:17:29Z InferIntrinsicOnCC finished after 0.099 seconds
2023-09-08T21:17:29Z Running ResolveAccessConflict
2023-09-08T21:17:29Z ResolveAccessConflict finished after 0.065 seconds
2023-09-08T21:17:29Z Running LICM
2023-09-08T21:17:29Z LICM finished after 0.056 seconds
2023-09-08T21:17:29Z Running LocalLayoutOpt
2023-09-08T21:17:29Z LocalLayoutOpt finished after 0.053 seconds
2023-09-08T21:17:29Z Running DelinearIndices
2023-09-08T21:17:29Z DelinearIndices finished after 0.019 seconds
2023-09-08T21:17:29Z Running OrigLayoutTilingPipeline
2023-09-08T21:17:29Z Running GlobalLayoutOpt
2023-09-08T21:17:31Z GlobalLayoutOpt finished after 1.704 seconds
2023-09-08T21:17:31Z Running CanonicalizeDAG
2023-09-08T21:17:31Z CanonicalizeDAG finished after 0.082 seconds
2023-09-08T21:17:31Z Running FlattenAxesForTiling
2023-09-08T21:17:31Z FlattenAxesForTiling finished after 0.075 seconds
2023-09-08T21:17:31Z Running SundaSizeTiling
2023-09-08T21:17:33Z SundaSizeTiling finished after 1.930 seconds
2023-09-08T21:17:33Z OrigLayoutTilingPipeline finished after 3.809 seconds
2023-09-08T21:17:33Z Running TilingProfiler
2023-09-08T21:17:33Z TilingProfiler finished after 0.094 seconds
2023-09-08T21:17:33Z
2023-09-08T21:17:33Z Diagnostic information:
2023-09-08T21:17:33Z NeuronX Compiler version 2.9.0.40+07376825f
2023-09-08T21:17:33Z
2023-09-08T21:17:33Z Python version 3.8.10
2023-09-08T21:17:33Z HWM version 2.9.0.2-f79d59e7b
2023-09-08T21:17:33Z NumPy version 1.21.6
2023-09-08T21:17:33Z
2023-09-08T21:17:33Z Running on AMI ami-0d08bfe808787640a
2023-09-08T21:17:33Z Running in region use1-az5
2023-09-08T21:17:33Z
2023-09-08T21:17:33Z Diagnostic logs stored in /home/ubuntu/log-neuron-cc.txt
```
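When comparing compiler runs (for example 1024x1024 vs. 832x1216), it can help to pull per-pass timings out of output like this. A small hypothetical helper (the names `pass_timings` and `slowest_passes` are illustrative), assuming only the `<Pass> finished after <N> seconds` record format shown in the log:

```python
import re
from typing import List, Tuple

# Matches records such as "LoopFusion finished after 1.182 seconds".
_FINISHED = re.compile(r"(\w+) finished after (\d+(?:\.\d+)?) seconds")


def pass_timings(log_text: str) -> List[Tuple[str, float]]:
    """Return (pass_name, seconds) for every completed pass, in log order."""
    return [(m.group(1), float(m.group(2))) for m in _FINISHED.finditer(log_text)]


def slowest_passes(log_text: str, n: int = 3) -> List[Tuple[str, float]]:
    """Return the n longest-running passes; nested pipelines include their children."""
    return sorted(pass_timings(log_text), key=lambda item: item[1], reverse=True)[:n]


excerpt = """\
2023-09-08T21:17:26Z LoopFusion finished after 1.182 seconds
2023-09-08T21:17:31Z GlobalLayoutOpt finished after 1.704 seconds
2023-09-08T21:17:33Z SundaSizeTiling finished after 1.930 seconds
2023-09-08T21:17:33Z OrigLayoutTilingPipeline finished after 3.809 seconds
"""
print(slowest_passes(excerpt, 2))
# [('OrigLayoutTilingPipeline', 3.809), ('SundaSizeTiling', 1.93)]
```

Note that `OrigLayoutTilingPipeline` wraps `GlobalLayoutOpt`, `CanonicalizeDAG`, `FlattenAxesForTiling`, and `SundaSizeTiling` in this log, so its 3.809 s overlaps their individual timings rather than adding to them.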
Lastly, here is log-neuron-cc.txt:
```
2023-09-08T21:17:22Z INFO 238269 [root]: /opt/aws_neuron_venv_pytorch/bin/neuronx-cc compile sdxl_compile_dir_832x1216/vae_decoder/model --framework XLA --target trn1 --output sdxl_compile_dir_832x1216/vae_decoder/graph.neff
2023-09-08T21:17:22Z INFO 238334 [root]: TVM/Relay detected
2023-09-08T21:17:22Z INFO 238334 [root]: Pipeline: Frontend HHChecker WalrusDriver BIRLinker Kelper
2023-09-08T21:17:22Z INFO 238334 [root]: Intermediate files stored in /home/ubuntu/neuronxcc-5l2tcm31, output in /home/ubuntu
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Processing input #0
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
2023-09-08T21:17:22Z INFO 238334 [pipeline.Pipeline.0]: Starting job job.Frontend.0
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Job Frontend len(in_states) 1
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Processing input #0
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Start model loading
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: IR signature: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 for model
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Executing: /opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/sdxl_compile_dir_832x1216/vae_decoder/model --out-dir ./ --output penguin.py --layers-per-module=1 --coalesce-all-gathers=false --coalesce-reduce-scatters=false --coalesce-all-reduces=false --emit-tensor-level-dropout-ops --emit-tensor-level-rng-ops
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Histogram before graph level optimizations:
total HLO instructions: 1614
broadcast 452 28.00% ################################################################
reshape 364 22.55% ###################################################
constant 294 18.22% #########################################
multiply 167 10.35% #######################
add 113 7.00% ################
transpose 57 3.53% ########
convolution 35 2.17% ####
batch-norm-training 30 1.86% ####
get-tuple-element 30 1.86% ####
tanh 29 1.80% ####
divide 16 0.99% ##
call 15 0.93% ##
dot 6 0.37%
reduce 2 0.12%
exponential 1 0.06%
parameter 1 0.06%
subtract 1 0.06%
tuple 1 0.06%
Histogram before graph level optimizations:
total HLO instructions: 1614
broadcast 452 28.00% ################################################################
reshape 364 22.55% ###################################################
constant 294 18.22% #########################################
multiply 167 10.35% #######################
add 113 7.00% ################
transpose 57 3.53% ########
convolution 35 2.17% ####
batch-norm-training 30 1.86% ####
get-tuple-element 30 1.86% ####
tanh 29 1.80% ####
divide 16 0.99% ##
call 15 0.93% ##
dot 6 0.37%
reduce 2 0.12%
exponential 1 0.06%
parameter 1 0.06%
subtract 1 0.06%
tuple 1 0.06%
INFO: IoStatistics: total inputs: 1
INFO: IoStatistics: total outputs: 1
INFO: IoStatistics: total passthrough tensors: 0
INFO: IoStatistics: total outputs read from: 0
INFO: IoStatistics: total redundant outputs: 0
Replaced 0 dropout sequences with OffloadedDropout
INFO: HloMacCount has found 5025528358400
INFO: Traffic has found 12393472
INFO: AIF 810996.04
Histogram after graph level optimizations:
total HLO instructions: 758
constant 143 18.87% ################################################################
multiply 118 15.57% ####################################################
add 113 14.91% ##################################################
broadcast 110 14.51% #################################################
reshape 73 9.63% ################################
transpose 49 6.46% #####################
convolution 35 4.62% ###############
batch-norm-training 30 3.96% #############
get-tuple-element 30 3.96% #############
tanh 29 3.83% ############
custom-call 15 1.98% ######
dot 6 0.79% ##
reduce 2 0.26%
exponential 1 0.13%
parameter 1 0.13%
divide 1 0.13%
subtract 1 0.13%
tuple 1 0.13%
HLO Ops used in computation: add batch-norm-training broadcast constant convolution custom-call divide dot exponential get-tuple-element multiply parameter reduce reshape subtract tanh transpose tuple
Invoking RemoveOptimizationBarriers pass
Invoking NeuronInstCombine pass.
Total SqrtMul sequences deleted = 0
2023-09-08T21:17:22Z INFO 238334 [job.Frontend.0]: Start tensorization
2023-09-08T21:17:22Z WARNING 238334 [job.Frontend.0]: TVM not detected.
2023-09-08T21:17:23Z INFO 238334 [job.Frontend.0]: Num parallel jobs: 1
2023-09-08T21:17:23Z INFO 238334 [root/Tensorizer/All]: Enter time region
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Frontend found a single CU. Switching to flat flow.
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Building model from Penguin script "penguin.py"...
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Tensorizer options: --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=matmult-bf16 --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --sunda-batchnorm --enable-tritium-loopfusion --keep-remat-dma-transpose --enable-softmax-kernel
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Building model from Penguin script "penguin.py"...
2023-09-08T21:17:23Z INFO 238334 [Tensorizer]: Successfully built model.
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2023-09-08T21:17:23Z INFO 238334 [DoNothing]: Finished (changed=True)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR
2023-09-08T21:17:23Z INFO 238334 [CanonicalizeIR]: Finished (changed=True)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.018 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm
2023-09-08T21:17:23Z INFO 238334 [ExpandBatchNorm]: Finished (changed=True)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.057 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates
2023-09-08T21:17:23Z INFO 238334 [ResolveComplicatePredicates]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.017 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution
2023-09-08T21:17:23Z INFO 238334 [AffinePredicateResolution]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.019 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs
2023-09-08T21:17:23Z INFO 238334 [EliminateDivs]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.018 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest
2023-09-08T21:17:23Z INFO 238334 [PerfectLoopNest]: Finished (changed=False)
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.016 seconds
2023-09-08T21:17:23Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:24Z INFO 238334 [Simplifier]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.223 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2023-09-08T21:17:24Z INFO 238334 [GenericAccessSimplifier]: Finished (changed=False)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.015 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2023-09-08T21:17:24Z INFO 238334 [TCTransform]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.027 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2023-09-08T21:17:24Z INFO 238334 [CommuteConcat]: Finished (changed=False)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.016 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpFusion]: Running TensorOpFusion
2023-09-08T21:17:24Z INFO 238334 [TensorOpFusion]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpFusion]: TensorOpFusion finished after 0.018 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform
2023-09-08T21:17:24Z INFO 238334 [TensorOpTransform]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.060 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp
2023-09-08T21:17:24Z INFO 238334 [LowerTensorOp]: Finished (changed=True)
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.017 seconds
2023-09-08T21:17:24Z USER 238334 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination
2023-09-08T21:17:25Z INFO 238334 [MemcpyElimination]: Finished (changed=True)
2023-09-08T21:17:25Z USER 238334 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 1.058 seconds
2023-09-08T21:17:25Z USER 238334 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2023-09-08T21:17:26Z INFO 238334 [LoopFusion]: Finished (changed=True)
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 1.182 seconds
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:26Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.112 seconds
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:26Z INFO 238334 [Delinearization]: Finished (changed=True)
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.052 seconds
2023-09-08T21:17:26Z USER 238334 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination
2023-09-08T21:17:28Z INFO 238334 [DeadStoreElimination]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 1.288 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:28Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.116 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:28Z INFO 238334 [LICM]: Finished (changed=True)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.064 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:28Z INFO 238334 [Delinearization]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.019 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2023-09-08T21:17:28Z INFO 238334 [LoopFusion]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.224 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice
2023-09-08T21:17:28Z INFO 238334 [SimplifySlice]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.007 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:28Z INFO 238334 [LICM]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.019 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:28Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.114 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering
2023-09-08T21:17:28Z INFO 238334 [ValueNumbering]: Finished (changed=True)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.036 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:28Z INFO 238334 [LICM]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.018 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/PadElimination]: Running PadElimination
2023-09-08T21:17:28Z INFO 238334 [PadElimination]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:28Z INFO 238334 [Delinearization]: Finished (changed=False)
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.058 seconds
2023-09-08T21:17:28Z USER 238334 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion
2023-09-08T21:17:29Z INFO 238334 [LoopFusion]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.218 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2023-09-08T21:17:29Z INFO 238334 [GenericAccessSimplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.007 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:29Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.111 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:29Z INFO 238334 [LICM]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.018 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering
2023-09-08T21:17:29Z INFO 238334 [ValueNumbering]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.024 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/TCTransform]: Running TCTransform
2023-09-08T21:17:29Z INFO 238334 [TCTransform]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.010 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat
2023-09-08T21:17:29Z INFO 238334 [CommuteConcat]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.008 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom
2023-09-08T21:17:29Z INFO 238334 [RecognizeOpIdiom]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.047 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation
2023-09-08T21:17:29Z INFO 238334 [MaskPropagation]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.023 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Recompute]: Running Recompute
2023-09-08T21:17:29Z INFO 238334 [Recompute]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.001 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination
2023-09-08T21:17:29Z INFO 238334 [DeadCodeElimination]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z INFO 238334 [Tensorizer]: After optimization: 138 statements
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DoNothing]: Running DoNothing
2023-09-08T21:17:29Z INFO 238334 [DoNothing]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType
2023-09-08T21:17:29Z INFO 238334 [MutateDataType]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.006 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/AutoCastTCInputs]: Running AutoCastTCInputs
2023-09-08T21:17:29Z INFO 238334 [AutoCastTCInputs]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/AutoCastTCInputs]: AutoCastTCInputs finished after 0.015 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier
2023-09-08T21:17:29Z INFO 238334 [GenericAccessSimplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.009 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Running Simplifier
2023-09-08T21:17:29Z INFO 238334 [Simplifier]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.114 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout
2023-09-08T21:17:29Z INFO 238334 [LegalizeCCOpLayout]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.008 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Delinearization]: Running Delinearization
2023-09-08T21:17:29Z INFO 238334 [Delinearization]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.017 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.018 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination
2023-09-08T21:17:29Z INFO 238334 [DeadCodeElimination]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.008 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC
2023-09-08T21:17:29Z INFO 238334 [InferIntrinsicOnCC]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.099 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict
2023-09-08T21:17:29Z INFO 238334 [ResolveAccessConflict]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.065 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: Running LICM
2023-09-08T21:17:29Z INFO 238334 [LICM]: Finished (changed=True)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LICM]: LICM finished after 0.056 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt
2023-09-08T21:17:29Z INFO 238334 [LocalLayoutOpt]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.053 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.019 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/OrigLayoutTilingPipeline]: Running OrigLayoutTilingPipeline
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GlobalLayoutOpt]: Running GlobalLayoutOpt
2023-09-08T21:17:31Z INFO 238334 [GlobalLayoutOpt]: Finished (changed=True)
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/GlobalLayoutOpt]: GlobalLayoutOpt finished after 1.704 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/CanonicalizeDAG]: Running CanonicalizeDAG
2023-09-08T21:17:31Z INFO 238334 [CanonicalizeDAG]: Finished (changed=True)
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/CanonicalizeDAG]: CanonicalizeDAG finished after 0.082 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/FlattenAxesForTiling]: Running FlattenAxesForTiling
2023-09-08T21:17:31Z INFO 238334 [FlattenAxesForTiling]: Finished (changed=True)
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/FlattenAxesForTiling]: FlattenAxesForTiling finished after 0.075 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/SundaSizeTiling]: Running SundaSizeTiling
```
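The file version of the log stops at `Running SundaSizeTiling`, while the console output shows that pass finishing and `TilingProfiler` running before the `Too many instructions after unroll` error, possibly because the file was not flushed before the compiler exited. A hypothetical helper for spotting passes that were started but never reported as finished, assuming the `[sg0000/Tensorizer/<Pass>]` record format above; treat its result as a hint about where logging stopped, not as proof of where compilation failed:

```python
import re
from typing import List

# Matches Tensorizer pass records in log-neuron-cc.txt, e.g.
#   "[sg0000/Tensorizer/SundaSizeTiling]: Running SundaSizeTiling"
#   "[sg0000/Tensorizer/LICM]: LICM finished after 0.064 seconds"
# The backreference \1 requires the message to name the same pass as the bracket.
_EVENT = re.compile(r"/Tensorizer/(\w+)\]: (Running \1|\1 finished after)")


def open_passes(log_text: str) -> List[str]:
    """Return passes that were started but never reported finished, outermost first."""
    stack: List[str] = []
    for match in _EVENT.finditer(log_text):
        name, event = match.group(1), match.group(2)
        if event.startswith("Running"):
            stack.append(name)
        else:
            # Close the most recent start of this pass (passes such as LICM repeat).
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] == name:
                    del stack[i]
                    break
    return stack


tail = """\
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices
2023-09-08T21:17:29Z INFO 238334 [DelinearIndices]: Finished (changed=False)
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.019 seconds
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/OrigLayoutTilingPipeline]: Running OrigLayoutTilingPipeline
2023-09-08T21:17:29Z USER 238334 [sg0000/Tensorizer/GlobalLayoutOpt]: Running GlobalLayoutOpt
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/GlobalLayoutOpt]: GlobalLayoutOpt finished after 1.704 seconds
2023-09-08T21:17:31Z USER 238334 [sg0000/Tensorizer/SundaSizeTiling]: Running SundaSizeTiling
"""
print(open_passes(tail))  # ['OrigLayoutTilingPipeline', 'SundaSizeTiling']
```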
jeffhataws commented 9 months ago

Hi @neo , thank you for raising the issue. We are aware of compilation issues with different input shapes and are working to fix them in an upcoming release.

neo commented 9 months ago

Not sure how related this is, but I also tried the same thing with stabilityai/stable-diffusion-xl-refiner-1.0, and it consistently fails while tracing the unet, almost immediately after starting (before the log txt file is even created). There was no error message; the kernel just died...

neo commented 9 months ago

And while I have you, a semi-related question: the previous SD samples all include a step that also compiles the text encoder, but this SDXL example doesn't. Is that step no longer needed, or is it just not included because you haven't gotten there yet?

aws-mvaria commented 7 months ago

Apologies for the late reply to your latest question: it was not included because we hadn't gotten there yet. In an upcoming release, we'll be tracing the text encoder in our samples for improved performance.

neo commented 7 months ago

Thank you for the response!

I chatted with one of the Neuron team members earlier and got the suggestion to use optimum-neuron from Hugging Face, which traces every component and has been working quite well for us 😊

However, it would still be great to see the samples do this at a lower level so people can learn what's going on under the hood.

neo commented 7 months ago

One more request: when you make the sample for the SDXL text encoder, could you include examples covering `attention_mask`, `output_hidden_states`, and `return_dict`? They're all required and set explicitly by compel: https://github.com/damian0815/compel/blob/v2.0.2/src/compel/embeddings_provider.py#L390-L393

Thanks!
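Because a trace records one fixed graph, keyword flags like `output_hidden_states` and `return_dict` generally can't be toggled per call on the compiled model. One common approach is to trace once with the most complete outputs and emulate the flags in a thin Python wrapper. A minimal, hypothetical sketch: the class names and the `(last_hidden_state, *hidden_states)` output layout are assumptions, not the torch-neuronx or transformers API, and `fake_traced` stands in for the compiled encoder:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass
class EncoderOutput:
    # Hypothetical stand-in for a transformers ModelOutput; only the field
    # names are borrowed, it is not the real class.
    last_hidden_state: Any
    hidden_states: Optional[Tuple[Any, ...]] = None


class TracedTextEncoderAdapter:
    """Hypothetical wrapper giving a traced text encoder a keyword interface.

    Assumes the encoder was traced once with hidden states enabled, takes
    (input_ids, attention_mask) positionally, and returns the flat tuple
    (last_hidden_state, *hidden_states).
    """

    def __init__(self, traced_fn: Callable[[Any, Any], Sequence[Any]]):
        self.traced_fn = traced_fn

    def __call__(self, input_ids, attention_mask,
                 output_hidden_states: bool = False, return_dict: bool = True):
        outputs = tuple(self.traced_fn(input_ids, attention_mask))
        last_hidden_state = outputs[0]
        # The compiled graph always produces hidden states; drop them unless asked.
        hidden_states = outputs[1:] if output_hidden_states else None
        if return_dict:
            return EncoderOutput(last_hidden_state, hidden_states)
        if hidden_states is None:
            return (last_hidden_state,)
        return (last_hidden_state, hidden_states)


# Usage with a stand-in for the compiled model:
def fake_traced(input_ids, attention_mask):
    return ("last", "h0", "h1")


encoder = TracedTextEncoderAdapter(fake_traced)
out = encoder([1, 2], [1, 1], output_hidden_states=True, return_dict=True)
print(out.hidden_states[-2])  # h0
```

The design choice here is to keep the compiled graph's signature static and push all per-call variation into plain Python, so a caller that sets these flags explicitly can use the traced model without retracing.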