analogdevicesinc / ai8x-synthesis

Quantization and Synthesis (Device Specific Code Generation) for ADI's MAX78000 and MAX78002 Edge AI Devices
Apache License 2.0

Synthesis accuracy issues #298

Closed isztldav closed 1 year ago

isztldav commented 1 year ago

Issue: for the same input in Q8 mode [-128;127]: ai8x.set_device(87, True, False) (PyTorch prediction) =/= Synthesis (KAT)

## PyTorch step

args = Args(act_mode_8bit=True)
ai8x.set_device(87, True, False)  # True to simulate the device
checkpoint = torch.load('qat_best_q8.pth.tar')
state_dict = checkpoint['state_dict']
ai8x.fuse_bn_layers(model)
model.load_state_dict(state_dict, strict=True)
model = model.to(device)


- This code successfully simulates the device with full 8-bit integers; the output is in range [-128;127] and the accuracy **is** maintained.
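For what it's worth, a minimal sanity check of this property could look like the following (a hypothetical helper, not part of ai8x):

```python
import torch

# Hypothetical check: in act_mode_8bit the simulated model should emit
# integer-valued outputs bounded to the signed 8-bit range.
def check_q8_range(output: torch.Tensor) -> bool:
    """Return True if every value is an integer within [-128, 127]."""
    return bool(
        torch.all(output == output.round())
        and output.min().item() >= -128
        and output.max().item() <= 127
    )
```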

## Synthesis step
I observed that the synthesis step does not generate the KAT using PyTorch; instead, it runs the input through custom code. That code produces the expected final output, which the device then recomputes (KAT), and this test passes successfully. Therefore the device computes exactly what the synthesis tool expects.

## Issue encountered
The main problem arises when analyzing the KAT output. The synthesis produces an output that is worse by a large factor than the output (prediction) of PyTorch in Q8 mode (for the same input).
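For reference, this is roughly how the gap can be quantified (the arrays below are dummies; in practice one would come from the Q8 PyTorch simulation and the other from the values in sampleoutput.h):

```python
import numpy as np

# Dummy stand-ins for the real outputs (names and values assumed).
pytorch_out = np.array([12, -5, 127, -128, 0], dtype=np.int8)
kat_out = np.array([12, -4, 120, -128, 3], dtype=np.int8)

# Element-wise absolute difference (cast up to avoid int8 overflow)
# and the fraction of mismatching elements.
diff = np.abs(pytorch_out.astype(np.int32) - kat_out.astype(np.int32))
mismatch_rate = np.mean(pytorch_out != kat_out)
print(diff.max(), mismatch_rate)
```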

Was something similar ever observed?
I can provide more information if needed. Thank you for your help.
nikky4D commented 1 year ago

Were you able to resolve this issue? I'm noticing something similar in object detection models.

MaximGorkem commented 1 year ago

Hi,

Thanks for reporting the issue. Could you please give us some clarifications?

Do you notice pixel class prediction differences between simulate.py and PyTorch?

How do you test the PyTorch operation? Are you running the QAT model? Could you let me know if you follow the steps below for QAT models in your PyTorch test?

# Fuse the BN parameters into conv layers before Quantization Aware Training (QAT)
ai8x.fuse_bn_layers(model)

# Switch model from unquantized to quantized for QAT
ai8x.initiate_qat(model, qat_policy)

model = apputils.load_lean_checkpoint(model, checkpoint_path, model_device=device)
ai8x.update_model(model)
isztldav commented 1 year ago

Thank you for your reply @MaximGorkem .

In short

This is the expected output according to PyTorch in 8-bit mode:

(image: expected_predicted_pytorch_8bit)

Instead, the "sampleoutput.h" looks like this:

(image: sampleoutput)

In depth

All files required for you to replicate my setup are in: report_files.zip

Contents of the zip file:

Other notes:

When the code is built and tested on the device, this passes the KAT test.
So my feeling is that accuracy is lost somewhere, and thus the segmentation doesn't match that of PyTorch in 8-bit mode. This pretty much makes the device unusable for me, as I cannot obtain the expected accuracy.

ermanok commented 1 year ago

Hi,

Thanks for sharing the resources about the issue. I went over them, and it seems the issue is due to a minor problem in the yaml file. At Layer 7 (up3_3), the input sequence is given as [6, 3], but in the model definition (in max78_eval_unet.ipynb) the input of up3_3 is defined as 'torch.cat((conv2_2, up3), dim=1)'. Therefore, line 81 of the yaml file should be changed to 'in_sequences: [3, 6]'.

Note that this change alone is not enough: the 'output_processors' fields of Layer 3 and Layer 6 must be swapped, and the 'processors' field of Layer 4 must be changed accordingly.
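To make the ordering issue concrete, here is a minimal PyTorch sketch (shapes are made up, not taken from the actual UNet) of why the order in 'in_sequences' must match the torch.cat call:

```python
import torch

# Illustrative tensors only; the real conv2_2/up3 come from the UNet.
conv2_2 = torch.ones(1, 8, 4, 4)    # first operand of torch.cat in the model
up3 = torch.zeros(1, 8, 4, 4)       # second operand

correct = torch.cat((conv2_2, up3), dim=1)  # matches in_sequences: [3, 6]
swapped = torch.cat((up3, conv2_2), dim=1)  # what in_sequences: [6, 3] implies

# Same values overall, but channels 0..7 and 8..15 are exchanged, so the
# next layer's per-channel weights are applied to the wrong feature maps.
assert not torch.equal(correct, swapped)
```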

isztldav commented 1 year ago

Wow thank you so much for finding this! Indeed now it works. Thank you!