ian-liao-databricks opened this issue 4 days ago
The behavior is currently as intended when schemas conflict. Customers who want to customize this can always define their own schema, or disable schema inference. With inference disabled, the exact JSON payload is returned in the _rawBody column, and they can apply any custom schema inference on top of it.
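To make "custom schema inference on top of _rawBody" concrete, here is a minimal Python sketch. It assumes the raw payloads have already been collected as JSON object strings (for example, from the _rawBody column). The helper names json_type and infer_top_level_types are hypothetical. The rule that conflicting types fall back to a string is an illustrative assumption, not the connector's own inference algorithm. Only top-level fields are considered.

```python
import json
from typing import Dict, Iterable, Optional


def json_type(value) -> str:
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_top_level_types(raw_bodies: Iterable[str]) -> Dict[str, str]:
    """Infer one type per top-level field across all documents.

    Assumption (hypothetical policy): if a field is seen with two different
    non-null types, it falls back to "string" so it can be kept as raw JSON text.
    JSON nulls do not contribute a type; a field that is only ever null stays "null".
    """
    seen: Dict[str, Optional[str]] = {}
    for body in raw_bodies:
        doc = json.loads(body)
        for field, value in doc.items():
            t = json_type(value)
            if t == "null":
                seen.setdefault(field, None)  # record the field without a type
                continue
            prev = seen.get(field)
            if prev is None:
                seen[field] = t
            elif prev != t:
                seen[field] = "string"  # conflicting types: fall back to string
    return {f: (t if t is not None else "null") for f, t in seen.items()}
```

Fields that come out as "string" can then be read as raw JSON text, while consistently typed fields keep their inferred type.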
Disabling schema inference and providing a custom-defined schema doesn't help. I tried defining the schema as StructField('Data', StringType(), True) and still got nulls for arrays. This looks more like a bug, because the user specifically asked for a string.
Disabling schema inference and NOT providing a custom schema works as a workaround: _rawBody returns the whole item as a string, and the user can parse it further in Spark.
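The per-row logic of this workaround can be sketched in plain Python. The sketch assumes each _rawBody value is a JSON object. field_as_json_string is a hypothetical helper: it returns string values unchanged and re-serializes arrays, objects, numbers, and booleans as compact JSON text. Inside Spark, pyspark.sql.functions.get_json_object(col("_rawBody"), "$.Data") is one built-in way to pull a field out of the raw body as a string without writing a user-defined function.

```python
import json
from typing import Optional


def field_as_json_string(raw_body: str, field: str) -> Optional[str]:
    """Return `field` from a raw JSON document as a string (hypothetical helper).

    - JSON strings are returned unchanged (without surrounding quotes).
    - Arrays, objects, numbers and booleans are re-serialized as compact JSON.
    - A missing field or a JSON null gives None.
    """
    doc = json.loads(raw_body)
    value = doc.get(field)
    if value is None:
        return None
    if isinstance(value, str):
        return value
    # Compact separators avoid the default ", " and ": " spacing.
    return json.dumps(value, separators=(",", ":"))
```

For example, an item whose Data field is an array yields the array's JSON text instead of null.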
(Email reply to Fabian Meiswinkel's comment of Wed, Jun 26, 2024 at 2:00 PM.)
Thanks - the custom schema with StringType should work even for an array. I will reactivate this GitHub issue to track investigating and fixing that part.
Sounds good, thanks!
(Email reply to Fabian Meiswinkel's comment of Wed, Jun 26, 2024 at 2:34 PM.)
Describe the bug
In a Cosmos container, the same field can have a different structure in different items. One of the best approaches is to read such a field as a string, as schema inference would do. However, the connector cannot read a JSON array as a string.
Exception or Stack Trace
NA
To Reproduce
Create a Cosmos container with two items:
and
Query this container from Spark using schema inference. Null is returned for the first item's Data column.
Code Snippet
Expected behavior
The Data column should return the JSON array below as a string.
Screenshots
Setup (please complete the following information):
Additional context
NA