Manufacturing Data - Company Identifiers

askpatrickw commented 4 years ago

I really want to be able to parse out the mfg_data, the format of which is:

Size: 2 or more octets
The first 2 octets contain the Company Identifier Code 
followed by additional manufacturer specific data

I can handle grabbing out that company id, but I wanted to run by you all how to do the lookup, as there are currently 2181 valid company ids. Is a dictionary fine for this, {'company_id': 'company_name', ... }, in its own constants_companies.py, or is there some other preferred way to handle a large lookup such as this?
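A minimal sketch of what such a constants file and lookup could look like. The names COMPANY_IDENTIFIERS and company_name are hypothetical, and only the two company IDs that come up later in this thread are shown; integer keys are an assumption that keeps the lookup independent of how the ID is formatted.

```python
# Hypothetical contents of the constants file; the real table would
# hold every company ID assigned by the Bluetooth SIG.
COMPANY_IDENTIFIERS = {
    0x004C: "Apple",
    0x05A7: "Sonos",
}


def company_name(company_id):
    """Return the company name, or a hex placeholder for unknown IDs."""
    return COMPANY_IDENTIFIERS.get(company_id, f"Unknown (0x{company_id:04X})")


print(company_name(0x004C))  # Apple
print(company_name(0xFFFF))  # Unknown (0xFFFF)
```

Using dict.get with a default means an ID missing from the table never raises, which matters because the list keeps growing.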

I believe all of these Assigned Numbers are going to have similar lookup problems, but my guess is that this is the largest one.

Let me know what you all think.

ukBaz commented 4 years ago

For my 2 cents worth... and thinking out loud... There are three things I would consider: ease of inputting (and maintaining) the data, speed of lookup, and memory size.

The Bluetooth SIG used to make available an XML file of the various assigned numbers, but that doesn't seem to be there any more. If the XML were still available, it would be the easiest format to keep the data in, although it would be slower to look up than some of the other options. Having the data as a Python dictionary would be faster to look up but would require some conversion from the SIG XML data. The downside of a Python dictionary is that it would take up more memory. An SQLite3 database would be fairly fast to look up and memory efficient, but it is a binary format, so it adds some maintenance overhead. There is also the option of storing the data in pickle format. That is probably better than SQLite, as the data will only be read by this library.
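To put a rough number on the memory concern, here is a sketch that builds a dictionary of the size mentioned above and sums the sizes of the table, keys, and values. The company names are placeholders (real names vary in length), and sys.getsizeof figures are specific to CPython, so treat the result as an order-of-magnitude estimate only.

```python
import sys

ENTRY_COUNT = 2181  # number of company IDs quoted earlier in the thread

# Placeholder names; the real table would hold the SIG-assigned names.
companies = {
    company_id: f"Company {company_id:#06x}" for company_id in range(ENTRY_COUNT)
}

# getsizeof() on the dict covers only its hash table, so the keys and
# values are added separately.
table_bytes = sys.getsizeof(companies)
content_bytes = sum(
    sys.getsizeof(key) + sys.getsizeof(value) for key, value in companies.items()
)
total_kib = (table_bytes + content_bytes) / 1024

print(f"approx. {total_kib:.0f} KiB for {len(companies)} entries")
```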

So after all that, I think your suggestion of having a python dictionary in a separate file is probably the best compromise. I would possibly go for a file name that is more descriptive. Maybe something like: bluetooth_sig_company_identifiers.py

askpatrickw commented 4 years ago

Thanks for thinking out loud. I had some similar thoughts and ultimately arrived at the same conclusion. I could see in the future deciding this is too big and making it an optional module, but I'm waiting to see if there is an explicit request for that (looking at you, MicroPython). I don't know how to make it optional, but I know it's possible.
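One common way to make the table optional is to guard the import and fall back to an empty dictionary. This is only a sketch: the module name follows the file name suggested above, and COMPANY_IDENTIFIERS and lookup_company are hypothetical names, not part of the library.

```python
try:
    # Hypothetical optional module holding the full company table.
    from bluetooth_sig_company_identifiers import COMPANY_IDENTIFIERS
except ImportError:
    # Table not installed (for example on a memory-constrained target):
    # fall back to an empty table so lookups still work.
    COMPANY_IDENTIFIERS = {}


def lookup_company(company_id):
    """Return the company name if the table is available, else None."""
    return COMPANY_IDENTIFIERS.get(company_id)


print(lookup_company(0x004C))
```

Callers then only need to handle None, whether the ID is unknown or the table was left out.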

As for the maintenance of the list, I assume new companies are added sequentially, so it shouldn't be too hard for even a very inexperienced developer to add to the file as needed. I'll put a fairly detailed set of comments in the file.

Thanks again, I'll run with your suggestions.

askpatrickw commented 4 years ago

Hitting a bit of a roadblock... the payloads seem to be malformed. I think they are C string literals represented as Python bytes(). Here are prints of four bad payloads and one that is correct. Notice that the bad payloads start with "u", "L", "W", and "SN" rather than hex escapes as in the last example.

Payload + Type: b'u\x00B\x04\x01\x80`T\xbdy\xb7\xe6\xb3V\xbdy\xb7\xe6\xb2\x013\x00\x00\x00\x00\x00' + <class 'bytes'>
Payload + Type: b'L\x00\x10\x05\x01\x181\x14\xbb' + <class 'bytes'>
Payload + Type: b'W\x00\xd1\x1e\x00\x00\xb7\x95' + <class 'bytes'>
Payload + Type: b'SN\x93' + <class 'bytes'>
Payload + Type: b'\xa7\x05\x03\x138vWAZj3wfr\x00' + <class 'bytes'>
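For reference, a short sketch showing that the second payload above contains the same bytes whichever way it is displayed. Python's repr for bytes shows printable ASCII bytes as characters, so printing the hex form removes the ambiguity. bytes.hex with a separator argument needs Python 3.8 or later.

```python
payload = b'L\x00\x10\x05\x01\x181\x14\xbb'

# Every byte as two hex digits, regardless of whether it is printable.
hex_view = payload.hex(' ')
print(hex_view)  # 4c 00 10 05 01 18 31 14 bb

# Indexing a bytes object yields an int: the "L" is simply the byte 0x4c.
print(payload[0], hex(payload[0]))  # 76 0x4c

# Building the first two bytes from integers gives the same repr.
print(bytes([0x4C, 0x00]))  # b'L\x00'
```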

I'm looking into this and posted on SO as well. I'm surprised they are all over the place.

askpatrickw commented 4 years ago

Capturing some notes before bed...

This is one example I've seen where the payload looks good.
And you can see that the payload has the word Sonos in it.
The Company ID for Sonos is not 0xA705, it's 0x05A7... which is the first two bytes reversed (huh?)

Payload + Type: b'\xa7\x05\x03\x11Sonos_DaLjp' + <class 'bytes'>
hex_string: a7 05 03 11 53 6f 6e 6f 73 5f 44 61 4c 6a 70 
hexstring_to_bytearray: bytearray(b'\xa7\x05\x03\x11Sonos_DaLjp')
{'flags': 6, 'type': 'ADV_IND', 'address_type': 'RANDOM', 'address': BDAddress('E7:E7:CB:66:A4:AD'), '_name': None, 'name_is_complete': False, 'tx_pwr_lvl': 0, 'appearance': 0, 'uuid16s': [UUID16(0xfe07)], 'uuid32s': [], 'uuid128s': [], 'service_data': None, 'svc_data_uuid16': None, 'public_tgt_addr': None, 'adv_itvl': None, 'svc_data_uuid32': None, 'svc_data_uuid128': None, 'uri': None, 'mfg_data': b'\xa7\x05\x03\x11Sonos_DaLjp', 'rssi': 0, 'raw_data': None}

This is one of the "bad" entries with the "L" in the beginning.
Reversing the first two bytes in the bytearray (as we observed above) gives you 0x004c, which is Apple. If I look up that PUBLIC MAC address, it is also associated with Apple.
https://macaddresschanger.com/bluetooth-mac-lookup/D0%3A03%3A4B%3A38%3A0A%3A45

NOTE: That lookup won't work for RANDOM addresses.

Payload + Type: b'L\x00\x0f\x08\xc0\nP\x89\x1d\x00D\x0b\x10\x02\x01\x04' + <class 'bytes'>
hex_string: 4c 00 0f 08 c0 0a 50 89 1d 00 44 0b 10 02 01 04 
hexstring_to_bytearray: bytearray(b'L\x00\x0f\x08\xc0\nP\x89\x1d\x00D\x0b\x10\x02\x01\x04')
{'flags': 26, 'type': 'ADV_IND', 'address_type': 'PUBLIC', 'address': BDAddress('D0:03:4B:38:0A:45'), '_name': None, 'name_is_complete': False, 'tx_pwr_lvl': 12, 'appearance': None, 'uuid16s': [], 'uuid32s': [], 'uuid128s': [], 'service_data': None, 'svc_data_uuid16': None, 'public_tgt_addr': None, 'adv_itvl': None, 'svc_data_uuid32': None, 'svc_data_uuid128': None, 'uri': None, 'mfg_data': b'L\x00\x0f\x08\xc0\nP\x89\x1d\x00D\x0b\x10\x02\x01\x04', 'rssi': 4, 'raw_data': None}

Maybe the data is fine and I'm wrong about how this works. Reversing the first two bytes is very confusing to me...

ukBaz commented 4 years ago

What you are talking about sounds similar to issues I've had working with Bluetooth. This may not be your problem (or you may be aware of these issues already).

The Company ID for Sonos is not 0xA705, it's 0x05A7... which is the first two bytes reversed (huh?)

This is because all (or at least most) Bluetooth data is sent in little-endian byte order, so multi-byte numbers will look like their bytes are reversed.

This is one of the "bad" entries with the "L" in the beginning. Reversing the two bytes in the bytearray (as we observed above) gives you 0x004c which is Apple.

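I think the L being displayed is just an artefact of how you are printing the value. 0x4c is the ASCII (and UTF-8) value for capital L, and Python shows printable bytes as characters. I suspect that if you did an int.from_bytes with little-endian byte order, you would get the correct integer values.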
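A minimal sketch of that suggestion, using the two payloads posted above. split_mfg_data is a hypothetical helper name, not part of the library; the struct line at the end is just an equivalent way to read the same 16-bit little-endian value.

```python
import struct


def split_mfg_data(mfg_data):
    """Split manufacturer data into (company_id, manufacturer-specific bytes)."""
    if len(mfg_data) < 2:
        raise ValueError("manufacturer data needs at least 2 octets")
    # The Company Identifier is the first two octets, least significant first.
    company_id = int.from_bytes(mfg_data[:2], byteorder="little")
    return company_id, mfg_data[2:]


sonos_id, sonos_rest = split_mfg_data(b'\xa7\x05\x03\x11Sonos_DaLjp')
apple_id, _ = split_mfg_data(
    b'L\x00\x0f\x08\xc0\nP\x89\x1d\x00D\x0b\x10\x02\x01\x04'
)

print(f"{sonos_id:#06x}", sonos_rest)  # 0x05a7 b'\x03\x11Sonos_DaLjp'
print(f"{apple_id:#06x}")  # 0x004c

# struct reads the same value with the "<H" (little-endian uint16) format.
assert struct.unpack("<H", b'L\x00')[0] == apple_id
```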

Hope that helps.

TheCellule / python-bleson

Manufacturing Data - Company Identifiers #65