MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International
10.21k stars 21.37k forks source link

Synapse openrowset for json does not return separate rows for each object #100187

Closed PhilKoay closed 4 months ago

PhilKoay commented 1 year ago

When I execute the code provided on this page in a synapse query image

It returns a single line, Not "seperate line" per object in the array. Why is that!?!? Thanks for your help!


Document Details

Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

YashikaTyagii commented 1 year ago

@PhilKoay Thanks for your feedback! We will investigate and update as appropriate.

Naveenommi-MSFT commented 1 year ago

Hi @PhilKoay you need to write additional query, please go through this below link for your reference.

Query JSON data in SQL Server and Synapse Analytics - diangermishuizen.com

Disclaimer: This response contains a reference to a third-party World Wide Web site. Microsoft is providing this information as convenient to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there.

PhilKoay commented 1 year ago

Could you tell me what the query should be? I tried your documented query, and looked through this post already, and I don't why the original query in your own documentation does not work in my synapse query.

How should I add to the query?

Phil Koay

On Thu, 20 Oct 2022, 9:06 pm Naveenommi-MSFT, @.***> wrote:

Hi @PhilKoay https://github.com/PhilKoay you need to write additional query, please go through this below link for your reference.

Query JSON data in SQL Server and Synapse Analytics - diangermishuizen.com https://diangermishuizen.com/query-json-data-in-sql-server-and-synapse-analytics/

Disclaimer: This response contains a reference to a third-party World Wide Web site. Microsoft is providing this information as convenient to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there.

— Reply to this email directly, view it on GitHub https://github.com/MicrosoftDocs/azure-docs/issues/100187#issuecomment-1285338948, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCWIF62V7UAN25OQUOCPQLWEERSJANCNFSM6AAAAAARJXCZWA . You are receiving this because you were mentioned.Message ID: @.***>

PhilKoay commented 1 year ago

Also note only one line is produced in my original post. Not one line per object. Which is said will be returned in your documentation.

Phil Koay

On Thu, 20 Oct 2022, 9:06 pm Naveenommi-MSFT, @.***> wrote:

Hi @PhilKoay https://github.com/PhilKoay you need to write additional query, please go through this below link for your reference.

Query JSON data in SQL Server and Synapse Analytics - diangermishuizen.com https://diangermishuizen.com/query-json-data-in-sql-server-and-synapse-analytics/

Disclaimer: This response contains a reference to a third-party World Wide Web site. Microsoft is providing this information as convenient to you. Microsoft does not control these sites and has not tested any software or information found on these sites; therefore, Microsoft cannot make any representations regarding the quality, safety, or suitability of any software or information found there.

— Reply to this email directly, view it on GitHub https://github.com/MicrosoftDocs/azure-docs/issues/100187#issuecomment-1285338948, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCWIF62V7UAN25OQUOCPQLWEERSJANCNFSM6AAAAAARJXCZWA . You are receiving this because you were mentioned.Message ID: @.***>

Naveenommi-MSFT commented 1 year ago

Hi @PhilKoay

Please Note, GitHub forum is dedicated for docs related issues. For any technical queries or clarifications, we encourage to utilize Microsoft Q & A platform. Kindly raise your query on Microsoft Q&A Platform

PhilKoay commented 1 year ago

This is a documentation problem surely if it does not work?

Can you explain if it doesn't work why would you document it that way?

Phil Koay

On Fri, 21 Oct 2022, 9:06 pm Naveenommi-MSFT, @.***> wrote:

Hi @PhilKoay https://github.com/PhilKoay

Please Note, GitHub forum is dedicated for docs related issues. For any technical queries or clarifications, we encourage to utilize Microsoft Q & A platform. Kindly raise your query on Microsoft Q&A Platform https://docs.microsoft.com/en-us/answers/index.html

— Reply to this email directly, view it on GitHub https://github.com/MicrosoftDocs/azure-docs/issues/100187#issuecomment-1286806688, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCWIF2MM75XP773KXPZ75TWEJ2M3ANCNFSM6AAAAAARJXCZWA . You are receiving this because you were mentioned.Message ID: @.***>

Naveenommi-MSFT commented 1 year ago

@azaricstefan Could you please review this and update as appropriate.

Naveenommi-MSFT commented 1 year ago

@PhilKoay Thanks for your feedback! I've assigned this issue to the author who will investigate and update as appropriate.

azaricstefan commented 1 year ago

@PhilKoay by one JSON document in the documentation we meant one JSON file.

With this query you can read JSON text file (jsonl) separated by new line (\n character)

select top 10 *
from openrowset(
        bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b'
    ) with (doc nvarchar(max)) as rows

You mentioned a query that reads one JSON document/file and it is read as 1 row in results.

If you want to extract the fields from JSON you can use queries like this:

select
    JSON_VALUE(doc, '$.date_rep') AS date_reported,
    JSON_VALUE(doc, '$.countries_and_territories') AS country,
    CAST(JSON_VALUE(doc, '$.deaths') AS INT) as fatal,
    JSON_VALUE(doc, '$.cases') as cases,
    doc
from openrowset(
        bulk 'latest/ecdc_cases.jsonl',
        data_source = 'covid',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b'
    ) with (doc nvarchar(max)) as rows
order by JSON_VALUE(doc, '$.geo_id') desc

Referenced queries are in this article

cc: @jovanpop-msft as author of this article.

PhilKoay commented 1 year ago

Hi Stefan,

Can you try running this code from the technical documentation. You will need to run it from mssms for it to work ( not directly from synapse)

select top 10 *

from openrowset(

    bulk '

https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl ',

    format = 'csv',

    fieldterminator ='0x0b',

    fieldquote = '0x0b'

) with (doc nvarchar(max)) as rows

I have tried running this on three seperate tenants. They all return one row. Not one row per object. If it returns one row I believe this is not working as documented

Cheers

Phil

On Thu, 27 Oct 2022, 12:11 am Stefan Azarić, @.***> wrote:

@PhilKoay https://github.com/PhilKoay by one JSON document in the documentation we meant one JSON file.

With this query you can read JSON text file (jsonl) separated by new line (\n character)

select top 10 * from openrowset( bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl', format = 'csv', fieldterminator ='0x0b', fieldquote = '0x0b' ) with (doc nvarchar(max)) as rows

You mentioned a query that reads one JSON document/file and it is read as 1 row in results.

If you want to extract the fields from JSON you can use queries like this:

select JSON_VALUE(doc, '$.date_rep') AS date_reported, JSON_VALUE(doc, '$.countries_and_territories') AS country, CAST(JSON_VALUE(doc, '$.deaths') AS INT) as fatal, JSON_VALUE(doc, '$.cases') as cases, doc from openrowset( bulk 'latest/ecdc_cases.jsonl', data_source = 'covid', format = 'csv', fieldterminator ='0x0b', fieldquote = '0x0b' ) with (doc nvarchar(max)) as rows order by JSON_VALUE(doc, '$.geo_id') desc

Referenced queries are in this article https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-json-files#query-json-files-using-json_value

cc: @jovanpop-msft https://github.com/jovanpop-msft as author of this article.

— Reply to this email directly, view it on GitHub https://github.com/MicrosoftDocs/azure-docs/issues/100187#issuecomment-1292115899, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCWIF4CW32MGQDIS7ULF5DWFE32BANCNFSM6AAAAAARJXCZWA . You are receiving this because you were mentioned.Message ID: @.***>

Phil Koay

azaricstefan commented 1 year ago

Hi Stefan, Can you try running this code from the technical documentation. You will need to run it from mssms for it to work ( not directly from synapse) select top 10 * from openrowset( bulk ' https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl ', format = 'csv', fieldterminator ='0x0b', fieldquote = '0x0b' ) with (doc nvarchar(max)) as rows I have tried running this on three seperate tenants. They all return one row. Not one row per object. If it returns one row I believe this is not working as documented Cheers Phil On Thu, 27 Oct 2022, 12:11 am Stefan Azarić, @.> wrote: @PhilKoay https://github.com/PhilKoay by one JSON document in the documentation we meant one JSON file. With this query you can read JSON text file (jsonl) separated by new line (\n character) select top 10 from openrowset( bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl', format = 'csv', fieldterminator ='0x0b', fieldquote = '0x0b' ) with (doc nvarchar(max)) as rows You mentioned a query that reads one JSON document/file and it is read as 1 row in results. If you want to extract the fields from JSON you can use queries like this: select JSON_VALUE(doc, '$.date_rep') AS date_reported, JSON_VALUE(doc, '$.countries_and_territories') AS country, CAST(JSON_VALUE(doc, '$.deaths') AS INT) as fatal, JSON_VALUE(doc, '$.cases') as cases, doc from openrowset( bulk 'latest/ecdc_cases.jsonl', data_source = 'covid', format = 'csv', fieldterminator ='0x0b', fieldquote = '0x0b' ) with (doc nvarchar(max)) as rows order by JSON_VALUE(doc, '$.geo_id') desc Referenced queries are in this article https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-json-files#query-json-files-using-json_value cc: @jovanpop-msft https://github.com/jovanpop-msft as author of this article. — Reply to this email directly, view it on GitHub <#100187 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCWIF4CW32MGQDIS7ULF5DWFE32BANCNFSM6AAAAAARJXCZWA . You are receiving this because you were mentioned.Message ID: **@.***> Phil Koay

I tried, it returns the same in Synapse Studio and SSMS.

image image

That is what this query returns. You can use JSON functions to extract values if you would like: JSON_VALUE. More info on this link that I shared above.

You have 2 queries in an example that you shared.

First query does return multiple rows:

select top 10 *
from openrowset(
        bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',
        format = 'csv',
        fieldterminator ='0x0b',
        fieldquote = '0x0b'
    ) with (doc nvarchar(max)) as rows
go
bandersmsft commented 4 months ago

Thanks for your dedication to our documentation. Unfortunately, at this time we have been unable to review your issue in a timely manner and we sincerely apologize for the delayed response. We are closing this issue for now, but if you feel that it's still a concern, please respond and let us know. If you determine another possible update to our documentation, please don't hesitate to reach out again. #please-close