Description of the issue:

When trying to get Twin devices, I get the exception java.lang.IllegalArgumentException: parameter contains non UTF-8 character

Code sample exhibiting the issue:

QueryCollection twinQuery = deviceTwinClient.queryTwinCollection("SELECT * FROM devices");

        while (deviceTwinClient.hasNext(twinQuery)) {
            try {
                QueryCollectionResponse<DeviceTwinDevice> response = deviceTwinClient.next(twinQuery);
                response.getCollection().forEach(twin -> devicesId.add(twin.getDeviceId()));
            } catch (IllegalArgumentException e) {
                LOGGER.warn("Error while getting twin device", e);
            }
        }

I checked to method and I thing the comparison is wrong: if(str.getBytes("UTF-8").length != str.length()) whereas it should be if(str.getBytes("UTF-8").length != str..getBytes().length)

Maybe I am wrong so if so, please could you help me solve this?

I already checked my devices data and there is no non-UTF-8 characters.

Thank you.

timtay-microsoft commented 6 years ago

It looks like you are correct about this. If you'd like to provide a PR for this, I'd be happy to merge it in for you.

JMayrbaeurl commented 6 years ago

I don't think that

if (str.getBytes("UTF-8").length != str.getBytes().length)

is the correct way of checking for the existence of valid characters in the device ID, since str.getBytes() will always use the default encoding that's probably different on the different machines. According to the documentation a device ID is a case-sensitive string (up to 128 characters long) of ASCII 7-bit alphanumeric characters plus certain special characters: - . + % _ # * ? ! ( ) , = @ $ '. This means only 1 byte per character and that's what the original check in the source code is doing. Right? Btw. On my machine the provided source code snippet works correctly. What device IDs have you registered on your IoT Hub?

timtay-microsoft commented 6 years ago

I'm pretty curious about the device ids involved here, too. @gabiSRC Can you provide a list of your device ids?

gabiSRC commented 6 years ago

@JMayrbaeurl @timtay-microsoft here are my IDs: So only authorized characters

I thing that because QueryCollectionResponse also contains the reportes/desired properties content it may have some characters that take more than 1 byte to be encoded in UTF-8 because when I am on debug mode, I see that the 2 strings are identicals

JMayrbaeurl commented 6 years ago

Yes. You are right. In the meantime I'm able to reproduce the bug. Simply add this to the desired properties section in the device twin "testwithDiacr": "§".

According to the documentation this is valid, since there's no restriction on the value of properties (see here : 'All values in JSON objects can be of the following JSON types: boolean, number, string, object. Arrays are not allowed.')

This means that's definetly a bug, since the complete read json string of the device twin is checked against single byte characters. But the real bug is in com.microsoft.azure.sdk.iot.deps.serializer.ParserUtility.validateStringUTF8(String), that uses the wrong UTF-8 character check: if(str.getBytes("UTF-8").length != str.length()). And to be honest. There's simply no way to check a String! You'll have to do this on the byte array. And this is in com.microsoft.azure.sdk.iot.service.devicetwin.QueryCollection.sendQueryRequest(QueryOptions)

        //Codes_SRS_QUERYCOLLECTION_34_017: [This function shall send an HTTPS request using DeviceOperations.]
        HttpResponse httpResponse = DeviceOperations.request(this.iotHubConnectionString, this.url, this.httpMethod, payload, null, this.timeout);

        //Codes_SRS_QUERYCOLLECTION_34_018: [The method shall read the continuation token (x-ms-continuation) and response type (x-ms-item-type) from the HTTP Headers and save it.]
        handleQueryResponse(httpResponse);

        //Codes_SRS_QUERYCOLLECTION_34_021: [The method shall create a QueryResponse object with the contents from the response body and its continuation token and return it.]
        this.isInitialQuery = false;
        return new QueryCollectionResponse(new String(httpResponse.getBody()), this.responseContinuationToken);

In the first line in DeviceOperations.request a HTTP request with UTF-8 charset is set up. But the conversion from a byte array to a String happens in the last line. Unfortunately completly wrong, since it's using the default encoding in new String(byte[]). Must be fixed as well!

Anyway. Checking the valid format of the device twin has to be done com.microsoft.azure.sdk.iot.deps.serializer.QueryResponseParser.QueryResponseParser(String) on Json attribute level, checking keys and values differently.

JMayrbaeurl commented 6 years ago

PR with fix is now available

gabiSRC commented 6 years ago

Great! Thank you :)

timtay-microsoft commented 6 years ago

The fix for this issue has been checked in, and should be released within the week. Thanks for the detailed writeup, @gabiSRC and thanks for the PR @JMayrbaeurl!

gabiSRC commented 6 years ago

Great, so, waiting for the new version's release then :)

I am closing the issue as all is OK.

Azure / azure-iot-sdk-java

validateStringUTF8 issue in #296

Description of the issue:

Code sample exhibiting the issue: