The header seems to be completely wrong (except the rank; it would be good to print the extents item by item). Where do you get the item_count from? It does not seem to come from the header that is read, or do you calculate it from there?
By the way, did you check that the path is correct and the istream is opened correctly?
What throws the error, the validate_tensor_header call?
Hi Viktor,
The stream seems to be ok.
item_count is the one used in validate_tensor_header; I've added the cout just before the exception gets thrown:
const uint32_t item_count = std::accumulate(header.extents, header.extents + header.rank,
                                            (uint32_t)1, std::multiplies<uint32_t>());
if ( header.data_length != (item_count * header.bits_per_item + 7) / 8 )
{
    std::cout << "Data length " << header.data_length << " Item count " << item_count
              << " bits_per_item " << header.bits_per_item
              << " Calculated len " << (item_count * header.bits_per_item + 7) / 8 << std::endl;
    std::cout << "extents : " << header.extents << " Rank " << header.rank << std::endl;
    throw Error("data length is not compatible with extents and bits per item");
}
So it looks like the extents are correct; however, the rest is not (data_length and bits_per_item).
Here is what I get in gdb:
$1 = (const nnef::TensorHeader &) @0x7fffffffc6d0: {magic = <incomplete sequence \357>, version = "\001", data_length = 32, rank = 4, extents = {64, 3, 7, 7, 8192, 1035248823, 1025065579, 1028203718}, bits_per_item = 1042716301, quant_code = 1042363606, quant_params = "\374\003\221<ܜ\177\273\017\017a=\226\344\255<\037\377\337=+\201[>#\363\031<\031\234F\276m<\017\276\373\273\242<ܥh=\n\360\034>\027\265[=?Τ\276\243\214ɾ\"\366L\276Hc\227=\271ޘ=[\215\200\274"}
The file on disk is 37664 bytes, which doesn't match item_count * 4 + header = 9408 x 4 + 128 = 37760.
My guess is that only 32 header bytes are being written instead of the full 128.
Will investigate further.
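For reference, a rough way to double-check what actually ends up on disk is to decode the first 128 bytes of the .dat file directly; below is a minimal Python sketch, assuming the field layout of the C++ TensorHeader shown in the gdb output above (the file path is just a placeholder):

import struct

with open('googlenet/conv1_w.dat', 'rb') as f:       # placeholder path
    raw = f.read(128)                                 # the final format uses a 128-byte header

magic = raw[0:2]                                      # magic number (gdb above shows it contains 0xEF)
major, minor = raw[2], raw[3]                         # version bytes
data_length, rank = struct.unpack('<II', raw[4:12])
extents = struct.unpack('<8I', raw[12:44])            # 8 x uint32, zero padded beyond 'rank'
bits_per_item, quant_code = struct.unpack('<II', raw[44:52])

print('magic:', magic, 'version:', (major, minor))
print('data_length:', data_length, 'rank:', rank, 'extents:', extents[:rank])
print('bits_per_item:', bits_per_item, 'quant_code:', quant_code)

# Expected file size for a 64x3x7x7 float32 tensor: 128 + 9408*4 = 37760 bytes.
# The file on disk is 37664, i.e. 96 bytes short, which matches a header of only
# 32 bytes being written instead of 128.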
Ok, so looking a bit deeper I think there are a few differences between the way the headers are handled in Python and C++:
For the data_length field the spec says "A 4-byte unsigned integer indicating the length of the actual tensor data in bytes (starting at offset 128)", but the Python exporter seems to use this field for the size of the header instead, and shrinks the header accordingly:
f.write(self._b_magic_number)             # 2 bytes: OK
f.write(self._b_version_major)            # 1 byte: OK
f.write(self._b_version_minor)            # 1 byte: OK
f.write(self._b_data_offset)              # 4 bytes: OK (but the content is the header length rather than the data length)
f.write(self._b_tensor_rank)              # 4 bytes: OK
f.write(self._b_tensor_dimensions)        # 16 bytes: wrong (only 'rank' dimensions, not MaxRank = 32 bytes)
f.write(self._b_data_type)                # 1 byte: doesn't exist in the spec & C++
f.write(self._b_bit_width)                # 1 byte: wrong --> 4 bytes in the spec & C++
f.write(self._b_len_quantize_algo_string) # 2 bytes: wrong --> 4 bytes in the spec & C++
f.write(self._b_quantize_algo_string)     # 0 bytes: wrong, should be 76 bytes
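For comparison, a minimal sketch of what a spec-compliant 128-byte header write could look like, using the field sizes listed above (the function and argument names are illustrative, not the converter's actual attributes):

import struct

def pack_nnef_header(extents, bits_per_item, data_length, quant_algo=b'', max_rank=8):
    header = b'N\xef'                                # 2-byte magic number
    header += struct.pack('<BB', 1, 0)               # 1-byte major + 1-byte minor version
    header += struct.pack('<I', data_length)         # length of the tensor data in bytes
    header += struct.pack('<I', len(extents))        # rank
    for d in list(extents) + [0] * (max_rank - len(extents)):
        header += struct.pack('<I', d)               # extents, zero padded to max_rank entries
    header += struct.pack('<I', bits_per_item)       # 4 bytes, not 1
    header += struct.pack('<I', len(quant_algo))     # 4 bytes, not 2
    header += quant_algo.ljust(128 - len(header), b'\x00')  # quantisation string padded so the header is exactly 128 bytes
    assert len(header) == 128
    return header

Here data_length should be (item_count * bits_per_item + 7) / 8, which is exactly what validate_tensor_header checks on the C++ side.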
The following patch seems to make it work for me (although it's quite dirty, which is why I haven't created a pull request):
diff --git a/au-zone/nnef_converter/common/nnef_data.py b/au-zone/nnef_converter/common/nnef_data.py
index 0022918..bd61849 100644
--- a/au-zone/nnef_converter/common/nnef_data.py
+++ b/au-zone/nnef_converter/common/nnef_data.py
@@ -199,6 +199,8 @@ class Header:
             self._b_tensor_dimensions = b''
             for dim in value:
                 self._b_tensor_dimensions += struct.pack('<I', dim)
+            while len(self._b_tensor_dimensions) < 8*4:
+                self._b_tensor_dimensions += struct.pack('<I',0)
         else:
             self._tensor_dimensions.append(struct.unpack('<I', value)[0]) #int.from_bytes(value, byteorder='little', signed=False))
             self._b_tensor_dimensions += value
@@ -250,6 +252,7 @@ class Header:
         self._len_quantize_algo_string = ''
         self._quantize_algo_string= ''
         self._modified = False
+        self._data_length = 0
         self.header_field_sizes = collections.OrderedDict()
         self.header_field_sizes['magic_number']=2
@@ -274,6 +277,9 @@ class Header:
         assert isinstance(nnef_dtype, NNEFDType), "Header requires nnef_dtype to be of type NNEFDType"
         assert isinstance(quantize_algo_string, str), "Header requires quantize_algo_string to be of type str"
+        self._data_length = nnef_dtype.get_nnef_bit_width() / 8
+        for d in tensor_shape:
+            self._data_length = self._data_length * d
         self.set_tensor_rank(len(tensor_shape))
         self.set_tensor_dimensions(tensor_shape)
         self.set_data_type(nnef_dtype.get_nnef_base_type_value()) #data_type.value
@@ -299,14 +305,14 @@ class Header:
         f.write(self._b_magic_number)
         f.write(self._b_version_major)
         f.write(self._b_version_minor)
-        f.write(self._b_data_offset)
+        f.write(struct.pack('I', self._data_length))
         f.write(self._b_tensor_rank)
         f.write(self._b_tensor_dimensions)
-        f.write(self._b_data_type)
-        f.write(self._b_bit_width)
-        f.write(self._b_len_quantize_algo_string)
+        #f.write(self._b_data_type)
+        f.write(struct.pack('I',int(struct.unpack('B',self._b_bit_width)[0])))
+        f.write(struct.pack('I',int(struct.unpack('H',self._b_len_quantize_algo_string)[0])))
         f.write(self._b_quantize_algo_string)
-        return len(f.getvalue()), f.getvalue()
+        return 128, f.getvalue()
     def read_content(self, f):
         try:
Is the commit 65b8ec4b1eccfbce30e6ab70492c674aafbe007c for NNEF-Tools or for GoogleNet? I can't find it in this repo.
Anyway, it seems to me that the Python code exports in the provisional binary format, which was completely changed in the final version; that's why it does not match. I wonder why we never caught this during testing; probably because the tests included the same kind of import code, which masked the problem.
I'll let Au-Zone know this and discuss fixing it. Actually, the exporter should be using the IO code available in the nnef parser package instead of duplicating it.
That's the hash from the Caffe2 Zoo. The NNEF-Tools hash is 6825938ba466ead23be5e43a470c918fbd7d78ab.
> I'll let Au-Zone know this and discuss fixing it. Actually, the exporter should be using the IO code available in the nnef parser package instead of duplicating it.
Agreed, I assume that's what you've done for the Caffe parser, and I haven't seen any issues with it.
Maybe you could try to fix it based on the Caffe code, invoking the same binary export method (it should be nnef.write_tensor with a file name and a numpy array argument). I am not working during the holidays, and I am not sure about the Au-Zone guys either, so if you need something quickly, you might be better off not waiting for them.
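For what it's worth, a minimal usage sketch of that route, assuming nnef.write_tensor takes a file name and a numpy array as described above (the path and the array are just placeholders):

import numpy as np
import nnef

# Placeholder array with the same shape as the conv1 weights discussed above (64x3x7x7, float32)
weights = np.zeros((64, 3, 7, 7), dtype=np.float32)

# Let the nnef package write the tensor in the final binary format (128-byte header + data)
# instead of hand-packing the header in the converter.
nnef.write_tensor('googlenet/conv1_w.dat', weights)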
We have pushed some updates to the Caffe converters, and reading the binaries should be fixed. Could you check whether it works now?
Created from https://github.com/caffe2/models/tree/master/googlenet
Commit: 65b8ec4b1eccfbce30e6ab70492c674aafbe007c
Command used to convert the model:
python -m nnef_converter.convert --input-framework caffe2 --output-framework NNEF --input-model predict_net.pb --data-model init_net.pb --value-info value_info.json --output-model googlenet/graph.nnef
I added the following debug code in nnef/common/binary.h:
Exception:
The tensor is 64x3x7x7, so the number of items is correct; however, bits_per_item and data_length are wrong.
nnef_convert.log