eshankn opened this issue 2 months ago
Hi,
First of all, thanks for your detailed explanations and for providing the necessary files.
We have been able to reproduce the errors in the scenarios you provided on our end, though we do not have a definite solution at the moment.
Our initial theory for the cause of the problem was the use of the `Flatten` layer right after `MaxPool1D` without a convolution operation, since flattening cannot be used in the same layer as pooling, as described here. However, in your case, your network has a standalone `MaxPool1D` layer, which is allowed even without a fused convolution operation. Nevertheless, as a sanity check, we explicitly added a "fake passthrough" layer after the max pooling, and it indeed did not solve the problem. (Note: the `add_fake_passthrough.py` utility adds an identity layer after the specified layer of a model checkpoint to circumvent certain limitations. See example usage here.)
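For reference, a minimal sketch of the layer pattern under discussion (a standalone pooling layer feeding `Flatten` and `Linear`, with no fused convolution). The layer names, channel sizes, and the standalone `ai8x.MaxPool1d` wrapper are illustrative assumptions, not taken from the checkpoint in question:

```python
import torch.nn as nn
import ai8x  # from ai8x-training; ai8x.set_device(...) is normally called before building layers

class PoolThenFlatten(nn.Module):
    """Illustrative pattern: standalone 1D max pooling into a Flatten + Linear head."""
    def __init__(self):
        super().__init__()
        # Standalone pooling layer, i.e. no convolution fused into the same hardware layer
        self.pool = ai8x.MaxPool1d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()
        self.fc = ai8x.Linear(in_features=32, out_features=2)

    def forward(self, x):
        x = self.pool(x)      # pooling without convolution
        x = self.flatten(x)   # flatten before the fully connected layer
        return self.fc(x)
```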
We will continue looking into this problem. In the meantime, I hope you are not blocked in your work and can deploy your network on the hardware thanks to your workarounds.
@alicangok thank you for providing the update.
At the moment, I am able to deploy my end-application network on the hardware. Given the hardware's capabilities, I have been curious about the energy consumption of variations of my network. In addition, I have a few related questions.
```python
self.block1_conv1d_bn_relu_1 = ai8x.FusedConv1dBNReLU(in_channels=1, out_channels=4, kernel_size=2, stride=1, padding=0)
self.block6_flatten = nn.Flatten()
self.block6_dense = ai8x.Linear(in_features=124, out_features=2)
```
I can measure energy through the PMON and an external device for input data dimensions of `1 x 64` and higher. However, that is not the case for input data dimensions of `1 x 32` and smaller. The serial output is stuck at `Measuring input load + inference...` because the `CNN_COMPLETE` trigger is not generated. I understand that the PMON measurement repeats each operation 100 times to accumulate enough energy, and my initial guess was that the low number of network operations is the cause. Therefore, I increased the complexity of the network by gradually adding multiple convolution and pooling layers in between, but that did not change the result either.
Hello again @eshankn,
Regarding your first question, while there is no direct method to measure the energy consumed by each layer, you may use the `--stop-after` argument with consecutive layers and subtract the consumed energies to get an estimate.
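For example, something along these lines (a sketch only, reusing the generation command quoted later in this thread; the layer indices 2 and 3 are placeholders, and other flags may need adjusting when stopping at an intermediate layer):

```
# Two generation runs that differ only in where inference stops.
python ai8xize.py --verbose --test-dir demos --prefix ep_demo_upto2 --checkpoint-file ep_demo_qat_best-q.pth.tar --device MAX78000 --softmax --compact-data --sample-input sample_ep_demo.npy --config-file energy_profiling_kat_pass.yaml --energy --stop-after 2
python ai8xize.py --verbose --test-dir demos --prefix ep_demo_upto3 --checkpoint-file ep_demo_qat_best-q.pth.tar --device MAX78000 --softmax --compact-data --sample-input sample_ep_demo.npy --config-file energy_profiling_kat_pass.yaml --energy --stop-after 3
# Measure each build with the PMON; the energy of layer 3 is then roughly
# E(stop-after 3) - E(stop-after 2).
```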
As for your second question, I am not aware of such an explicit constraint. However, with very small and fast networks, the hardware may hang if the inference finishes before the main code has had a chance to enter sleep mode. I would suggest you try the `--no-wfi` argument to disable sleep mode for your testing. More details are provided here.
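Concretely, this just means appending the flag to the existing generation command, e.g. (sketch based on the command quoted later in this thread):

```
# Same generation command as in the original report, with --no-wfi appended so the
# generated firmware does not enter sleep (WFI) while waiting for inference to complete.
python ai8xize.py --verbose --test-dir demos --prefix ep_demo --checkpoint-file ep_demo_qat_best-q.pth.tar --device MAX78000 --softmax --compact-data --sample-input sample_ep_demo.npy --config-file energy_profiling_kat_pass.yaml --energy --no-wfi
```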
P.S. We will let you know once we have a better understanding of your earlier issue. We have been continuing our investigation in different scenarios and on both MAX78000 and MAX78002 hardware.
@alicangok thank you for your response. Using `--no-wfi` does the trick and the inference is completed for the case mentioned!
Regarding the KAT error, I have another scenario. According to Limitations of MAX78000 Networks, the maximum dimension (number of rows or columns) for input or output data is 1023. In theory, a 1D-shaped input of size `1 x 1019` with a padding size of `2` is acceptable, and the synthesis tool also generates the C code. But the KAT fails with the same `Data mismatch` error. Input data dimensions of `1 x 1018` and `1 x 1017` also return the same error, while an input of size `1 x 1016` passes the KAT, which I am unable to comprehend.

Additionally, an input of size `1 x 1021` makes the synthesis tool throw an error, as expected, due to exceeding the dimension limit. However, the synthesis tool can generate the C code when an input of size `1 x 1020` is provided. Although it gives the same KAT error, this conflicts with the constraint mentioned above, considering the effective dimension is 1024 (input dimension of 1020 plus padding of 2 on either side). I assume the line decrements the input dimension, thus allowing a maximum dimension of 1024. Please correct me if I am wrong in my understanding.
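To make the arithmetic above explicit, here is a small sketch of the effective-dimension calculation being assumed (it only restates the reasoning in this comment; it is not the synthesizer's actual check):

```python
# Effective 1D dimension as reasoned above: input length plus padding on both sides.
DOC_LIMIT = 1023  # maximum dimension per "Limitations of MAX78000 Networks"
PADDING = 2

for length in (1016, 1017, 1018, 1019, 1020, 1021):
    effective = length + 2 * PADDING
    print(f"input 1 x {length}: effective {effective}, "
          f"{'within' if effective <= DOC_LIMIT else 'exceeds'} documented limit of {DOC_LIMIT}")
# Observed behaviour from the experiments above: 1016 passes the KAT, 1017-1020 generate code
# but fail the KAT with "Data mismatch", and 1021 is rejected by the synthesis tool.
```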
ep_demo_input_size.zip contains the necessary files to replicate the above scenario.
This issue has been marked stale because it has been open for over 30 days with no activity. It will be closed automatically in 10 days unless a comment is added or the "Stale" label is removed.
Thanks for reporting this issue. In this case, the network fails not because of the convolution in the initial layer but because of the maxpool operation in the second layer. For maxpool, the input length plus the kernel length must be smaller than 1026, so 1017 is the maximum input length for a pooling layer with a kernel of length 8. Until the network input length is reduced to 1016, the input length of the second layer remains greater than 1017. We will update the documentation accordingly and/or add proper assertions to the synthesis code.
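As a sketch of the stated constraint (illustrative only, not the synthesizer's actual assertion; how the first layer's output length is derived depends on the specific model in the attached zip):

```python
# Constraint described above: for a pooling layer, input_length + pool_kernel_length < 1026.
def pool_input_allowed(input_length: int, pool_kernel_length: int = 8) -> bool:
    return input_length + pool_kernel_length < 1026

print(pool_input_allowed(1017))  # True  -> 1017 is the largest allowed input for a kernel of length 8
print(pool_input_allowed(1018))  # False -> anything larger violates the constraint
```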
Hello, for my application I have been able to generate the C code using the synthesizer tool. But while testing the code on the hardware, the KAT fails with the `Data mismatch` error. The sample input provided is 1D-shaped data of size `1 x 768`, and the model is as follows.

I am unable to interpret the error, but I believe the KAT failure is due to the processor mapping in the YAML file. After multiple rounds of trial and error with the processor configurations, the code could pass the KAT. Though I have a fair understanding of creating the YAML file from the provided documentation, I am unable to understand certain configurations when compared with `energy_profiling_kat_pass.yaml`:

1. `processors` (as the previous layers) fails the KAT.
2. `processors` (as layer 3) also fails the KAT.
3. `output_processors` for the last layer 5 fails the KAT.
4. `output_processors` fails the KAT.
5. `output_processors: 0x0006.0000.0000.0000` or `output_processors: 0x0009.0000.0000.0000` or `output_processors: 0x0011.0000.0000.0000` for layer 5 also fails the KAT.

ep_demo.zip contains the necessary files to replicate the above specific scenarios.
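As a side note on the masks listed above, here is a small generic helper for seeing which processors a mask enables (my own sketch, not part of the ai8x tools; it only does bit counting and makes no claim about what the hardware requires):

```python
def parse_processor_mask(mask: str) -> list[int]:
    """Return the bit positions (processor numbers) enabled in a mask such as '0x0006.0000.0000.0000'."""
    value = int(mask.replace(".", ""), 16)  # the dots are only for readability
    return [bit for bit in range(64) if value & (1 << bit)]

for mask in ("0x0006.0000.0000.0000", "0x0009.0000.0000.0000", "0x0011.0000.0000.0000"):
    enabled = parse_processor_mask(mask)
    print(mask, "->", enabled, f"({len(enabled)} processors enabled)")
# e.g. 0x0006.0000.0000.0000 -> [49, 50] (2 processors enabled)
```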
The C code was generated using:

```
python ai8xize.py --verbose --test-dir demos --prefix ep_demo --checkpoint-file ep_demo_qat_best-q.pth.tar --device MAX78000 --softmax --compact-data --sample-input sample_ep_demo.npy --config-file energy_profiling_kat_pass.yaml --energy
```
I was also unable to use `--stop-after` to debug the problematic layer.

EDIT: Using `out_channels=2` or `out_channels=4` after the first layer while training and then generating the code passes the KAT for the first four scenarios above. The fifth scenario still fails the KAT.