fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0
5.7k stars, 1.55k forks

Tail not recognizing structured JSON #8965

Open: gleesonpj opened this issue 2 months ago

gleesonpj commented 2 months ago

Bug Report

Describe the bug

JSON input read via Tail appears to be processed as unstructured text instead of as JSON keys and values. Each output record has a single "log" key whose value is one line of the JSON file. Using the Expect filter confirms that Fluent Bit does not see the JSON keys in the stream.

To Reproduce

Conf file:

[SERVICE]
    Flush        5
    Daemon       Off
    Log_Level    trace
    parsers_file parsers.conf
    plugins_file plugins.conf

[INPUT]
    Name         tail
    Path         D:\_test\GitHub-CMS-in-network-ffs-sample.json
    Tag          CPT
    parser  json
    read_from_head   true   
    exit_on_eof  true
    Buffer_Chunk_Size  6400K
    Buffer_Max_Size 64000000K
    Static_Batch_Size 4G
    Skip_Long_Lines On

[OUTPUT]
    Name  file
    Path  D:\_test\
    Format csv   
    csv_column_names  On

[FILTER]
    Name   record_modifier
    Match *
    Allowlist_key in_network

#[FILTER]
#    Name nest
#    Match *
#    Operation nest
#    Wildcard billing_*
#    Nest_under CPT
#    Remove_prefix billing_

parsers.conf extract:

[PARSER]
    Name   json
    Format json
    Time_Key time
    Time_Format %d/%b/%Y:%H:%M:%S %z
    # Command      |  Decoder | Field | Optional Action
    # =============|==================|=================
    Decode_Field_As   json       log
    Decode_Field_As   escaped_utf8    log    do_next

Input Extract:

{
"reporting_entity_name": "medicare",
"reporting_entity_type": "medicare",
"reporting_plans": [{
"plan_name": "medicaid",
"plan_id_type": "hios",
"plan_id": "11111111111",
"plan_market_type": "individual"
},{
"plan_name": "medicare",
"plan_id_type": "hios",
"plan_id": "0000000000",
"plan_market_type": "individual"
}],
"last_updated_on": "2020-08-27",
"version": "1.0.0",
"in_network": [{
"negotiation_arrangement": "ffs",
"name": "Knee Replacement",
....

GitHub-CMS-in-network-ffs-sample.json

Example log message:

c:\Program Files\fluent-bit\bin>fluent-bit -c ..\conf\GitHub_InNetwork_CPT.conf
Fluent Bit v3.0.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

___________.__                        __    __________.__  __          ________
\_   _____/|  |  __ __   ____   _____/  |_  \______   \__|/  |_  ___  _\_____  \
 |    __)  |  | |  |  \_/ __ \ /    \   __\  |    |  _/  \   __\ \  \/ / _(__  <
 |     \   |  |_|  |  /\  ___/|   |  \  |    |    |   \  ||  |    \   / /       \
 \___  /   |____/____/  \___  >___|  /__|    |______  /__||__|     \_/ /______  /
     \/                     \/     \/               \/                        \/

[2024/06/16 00:36:36] [ info] Configuration:
[2024/06/16 00:36:36] [ info]  flush time     | 5.000000 seconds
[2024/06/16 00:36:36] [ info]  grace          | 5 seconds
[2024/06/16 00:36:36] [ info]  daemon         | 0
[2024/06/16 00:36:36] [ info] ___________
[2024/06/16 00:36:36] [ info]  inputs:
[2024/06/16 00:36:36] [ info]      tail
[2024/06/16 00:36:36] [ info] ___________
[2024/06/16 00:36:36] [ info]  filters:
[2024/06/16 00:36:36] [ info]      record_modifier.0
[2024/06/16 00:36:36] [ info] ___________
[2024/06/16 00:36:36] [ info]  outputs:
[2024/06/16 00:36:36] [ info]      file.0
[2024/06/16 00:36:36] [ info] ___________
[2024/06/16 00:36:36] [ info]  collectors:
[2024/06/16 00:36:36] [ info] [fluent bit] version=3.0.3, commit=3529bbb132, pid=22468
[2024/06/16 00:36:36] [debug] [engine] coroutine stack size: 98302 bytes (96.0K)
[2024/06/16 00:36:36] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/06/16 00:36:36] [ info] [cmetrics] version=0.9.0
[2024/06/16 00:36:36] [ info] [ctraces ] version=0.5.1
[2024/06/16 00:36:36] [ info] [input:tail:tail.0] initializing
[2024/06/16 00:36:36] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/06/16 00:36:36] [debug] [tail:tail.0] created event channels: read=828 write=832
[2024/06/16 00:36:36] [ info] [output:file:file.0] worker #0 started
[2024/06/16 00:36:36] [ info] [input:tail:tail.0] multiline core started
[2024/06/16 00:36:36] [debug] [input:tail:tail.0] flb_tail_fs_stat_init() initializing stat tail input
[2024/06/16 00:36:36] [debug] [input:tail:tail.0] inode=281474976711759 with offset=0 appended as D:\_test\GitHub-CMS-in-network-ffs-sample.json
[2024/06/16 00:36:36] [debug] [input:tail:tail.0] 1 new files found on path 'D:\_test\GitHub-CMS-in-network-ffs-sample.json'
[2024/06/16 00:36:36] [debug] [file:file.0] created event channels: read=864 write=868
[2024/06/16 00:36:36] [ info] [sp] stream processor started
[2024/06/16 00:36:36] [debug] [input:tail:tail.0] [static files] processed 3.4K
[2024/06/16 00:36:36] [ info] [input:tail:tail.0] inode=281474976711759 file=D:\_test\GitHub-CMS-in-network-ffs-sample.json ended, stop
[2024/06/16 00:36:36] [debug] [input:tail:tail.0] inode=281474976711759 file=D:\_test\GitHub-CMS-in-network-ffs-sample.json promote to TAIL_EVENT
[2024/06/16 00:36:36] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2024/06/16 00:36:36] [debug] [task] created task=000001A1ABF56EC0 id=0 OK
[2024/06/16 00:36:36] [debug] [output:file:file.0] task_id=0 assigned to thread #0
[2024/06/16 00:36:36] [ warn] [engine] service will shutdown in max 5 seconds
[2024/06/16 00:36:36] [ info] [input] pausing tail.0
[2024/06/16 00:36:36] [debug] [out flush] cb_destroy coro_id=0
[2024/06/16 00:36:36] [debug] [task] destroy task=000001A1ABF56EC0 (task_id=0)
[2024/06/16 00:36:37] [ info] [engine] service has stopped (0 pending tasks)
[2024/06/16 00:36:37] [ info] [input] pausing tail.0
[2024/06/16 00:36:37] [ info] [output:file:file.0] thread worker #0 stopping...
[2024/06/16 00:36:37] [debug] [input:tail:tail.0] inode=281474976711759 removing file name D:\_test\GitHub-CMS-in-network-ffs-sample.json
[2024/06/16 00:36:37] [ info] [output:file:file.0] thread worker #0 stopped

Extract of Output

timestamp,"log"
1718510520.518025900,"{"
1718510520.518039900,""reporting_entity_name": "medicare","
1718510520.518041800,""reporting_entity_type": "medicare","
1718510520.518042800,""reporting_plans": [{"
1718510520.518044200,""plan_name": "medicaid","
1718510520.518045500,""plan_id_type": "hios","
1718510520.518046800,""plan_id": "11111111111","
1718510520.518048100,""plan_market_type": "individual""
1718510520.518048900,"},{"
1718510520.518050200,""plan_name": "medicare","
1718510520.518051400,""plan_id_type": "hios","
1718510520.518052600,""plan_id": "0000000000","
1718510520.518053900,""plan_market_type": "individual""
1718510520.518054800,"}],"
1718510520.518056000,""last_updated_on": "2020-08-27","
1718510520.518057300,""version": "1.0.0","
1718510520.518058100,""in_network": [{"
1718510520.518059300,""negotiation_arrangement": "ffs","
1718510520.518060700,""name": "Knee Replacement","
1718510520.518061900,""billing_code_type": "CPT","
.........

Attachments: [GitHub-CMS-in-network-ffs-sample.json](https://github.com/user-attachments/files/15859426/GitHub-CMS-in-network-ffs-sample.json), [CPT1.txt](https://github.com/user-attachments/files/15859428/CPT1.txt)

Expected behavior

We expect Fluent Bit to filter out all keys before "in_network" and output only the value (including the nested values) of that key. If an Expect filter is placed after record_modifier, it should not detect "reporting_entity_name" or any other key before "in_network", but it should detect "in_network".

Instead:
- All keys/values are listed, starting with "reporting_entity_name".
- The output is in Tail's "log" format for unstructured messages.
- Adding Expect to the config does not detect any keys, including "in_network".

Rough expected output (not fully formatted, but much shorter):

1718510520.518058100,""in_network": [{"
1718510520.518059300,""negotiation_arrangement": "ffs","
1718510520.518060700,""name": "Knee Replacement","
1718510520.518061900,""billing_code_type": "CPT","

Your Environment

Additional context

Note: We tested the installation using the "mem.local" example from the manual, and it produced properly nested winstat keys: "mem.local: [1718506661.988717100, {"CPUstats":{"user":2968750,"idle":116250000,"kernel":1093750,"utilization":3.376623392105103}}]". This suggests the issue lies in how Tail parses the input in our setup.

gleesonpj commented 1 month ago

I just installed the latest Fluent Bit release; the issue remains.

arapaho commented 1 month ago

Fluent Bit's tail input reads the provided file line by line, and each line becomes a separate record.

When the file GitHub-CMS-in-network-ffs-sample.json you linked is used as input, each line is read as is. Because the file contains formatted (pretty-printed) JSON, no single line is a complete JSON document: "reporting_entity_name": "medicare", on its own is not valid JSON, so the json parser cannot decode it and the line is kept as an unstructured "log" value.
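To make the line-by-line behavior concrete, here is a minimal Python sketch. It is a simplified model, not Fluent Bit code: it assumes each line becomes one record and that a line the json parser cannot decode is kept unparsed under the "log" key, which is what the CSV output in the report shows.

```python
import json

# A few lines of a pretty-printed document, similar to the reporter's file.
PRETTY = """{
  "reporting_entity_name": "medicare",
  "version": "1.0.0"
}"""


def parse_line_by_line(text):
    """Simplified model of tail + the json parser (an assumption, not Fluent Bit code):
    every line becomes its own record; a line that is not a complete JSON object
    is kept unparsed under the "log" key."""
    records = []
    for line in text.splitlines():
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            parsed = None
        records.append(parsed if isinstance(parsed, dict) else {"log": line})
    return records


if __name__ == "__main__":
    # Pretty-printed input: one record per line, each with only a "log" key.
    for record in parse_line_by_line(PRETTY):
        print(record)
    # The same document on a single line: one structured record.
    one_line = json.dumps(json.loads(PRETTY))
    print(parse_line_by_line(one_line))
```

Running it prints four records that each carry only a "log" key, followed by a single structured record for the one-line version of the same document.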

There are two ways to handle this kind of content.

The first is to flatten the content so that each JSON structure sits on a single line. For instance, the structure in the file GitHub-CMS-in-network-ffs-sample.json looks like this once flattened:

input-file:

{ "reporting_entity_name": "medicare", "reporting_entity_type": "medicare", "reporting_plans": [{ "plan_name": "medicaid", "plan_id_type": "hios", "plan_id": "11111111111", "plan_market_type": "individual" },{ "plan_name": "medicare", "plan_id_type": "hios", "plan_id": "0000000000", "plan_market_type": "individual" }], "last_updated_on": "2020-08-27", "version": "1.0.0", "in_network": [{ "negotiation_arrangement": "ffs", "name": "Knee Replacement", "billing_code_type": "CPT", "billing_code_type_version": "2020", "billing_code": "27447", "description": "Arthroplasty, knee condyle and plateau, medial and lateral compartments", "negotiated_rates": [{ "provider_groups": [{ "npi":[1111111111, 2222222222, 3333333333, 4444444444, 5555555555], "tin":{ "type": "ein", "value": "11-1111111" } },{ "npi": [1111111111, 2222222222, 3333333333, 4444444444, 5555555555], "tin":{ "type": "ein", "value": "22-2222222" } }], "negotiated_prices": [{ "negotiated_type": "negotiated", "negotiated_rate": 123.45, "expiration_date": "2022-01-01", "service_code": ["18", "19", "11"], "billing_class": "professional" },{ "negotiated_type": "negotiated", "negotiated_rate": 1230.45, "expiration_date": "2022-01-01", "billing_class": "institutional" }] },{ "provider_groups": [{ "npi": [6666666666, 7777777777, 8888888888, 9999999999], "tin":{ "type": "ein", "value": "22-2222222" } }], "negotiated_prices": [{ "negotiated_type": "negotiated", "negotiated_rate": 120.45, "expiration_date": "2022-01-01", "service_code": ["05", "06", "07"], "billing_class": "professional" }] }] },{ "negotiation_arrangement": "ffs", "name": "Femur and Knee Joint Repair", "billing_code_type": "CPT", "billing_code_type_version": "2020", "billing_code": "27448", "description": "Under Repair, Revision, and/or Reconstruction Procedures on the Femur (Thigh Region) and Knee Joint", "negotiated_rates": [{ "provider_groups": [{ "npi": [1111111111, 2222222222, 3333333333, 4444444444, 5555555555], "tin":{ "type": "ein", "value": "11-1111111" } },{ "npi": [1111111111, 2222222222, 3333333333, 4444444444, 5555555555], "tin":{ "type": "ein", "value": "22-2222222" } }], "negotiated_prices": [{ "negotiated_type": "negotiated", "negotiated_rate": 12003.45, "expiration_date": "2022-01-01", "service_code": ["18", "19", "11"], "billing_class": "professional" }] },{ "provider_groups": [{ "npi": [6666666666], "tin":{ "type": "npi", "value": "6666666666" } }], "negotiated_prices": [{ "negotiated_type": "negotiated", "negotiated_rate": 12.45, "expiration_date": "2022-01-01", "service_code": ["18", "19", "11"], "billing_class": "institutional" }] }] }] }
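If the files arrive pre-formatted, the flattening can be done before Fluent Bit reads them. A minimal Python sketch, assuming each input file holds exactly one JSON document; the script name flatten.py and its arguments are hypothetical:

```python
import json
import sys


def flatten_json(text):
    """Re-serialize one JSON document onto a single line (compact separators)."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def flatten_file(src_path, dst_path):
    """Write the document in src_path to dst_path as one newline-terminated line.

    json.loads holds the whole document in memory, which may not suit very
    large files.
    """
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(flatten_json(text) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python flatten.py <pretty.json> <flattened.json>
    flatten_file(sys.argv[1], sys.argv[2])
```

For example, `python flatten.py GitHub-CMS-in-network-ffs-sample.json GitHub-CMS-in-network-ffs-sample.FLATTENED.json` would produce a file like the one above (with compact separators instead of spaces), ready for `parser json`. Keep in mind that each flattened document becomes one, possibly very long, line, so tail's Buffer_Max_Size must be large enough to hold it.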

fluent-bit.conf:

[SERVICE]
  flush 1
  log_level info
  parsers_file ./parsers.conf

[INPUT]
  name tail
  tag tail
  read_from_head true
  refresh_interval 1
  skip_long_lines off
  path GitHub-CMS-in-network-ffs-sample.FLATTENED.json
  parser json

[OUTPUT]
  name stdout
  match *
  format json_lines
  json_date_key @timestamp
  json_date_format iso8601

Result:

Fluent Bit v3.1.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/07/18 22:47:13] [ info] [fluent bit] version=3.1.3, commit=, pid=208899
[2024/07/18 22:47:13] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/07/18 22:47:13] [ info] [cmetrics] version=0.9.1
[2024/07/18 22:47:13] [ info] [ctraces ] version=0.5.2
[2024/07/18 22:47:13] [ info] [input:tail:tail.0] initializing
[2024/07/18 22:47:13] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/07/18 22:47:13] [ info] [sp] stream processor started
[2024/07/18 22:47:13] [ info] [input:tail:tail.0] inotify_fs_add(): inode=13265153 watch_fd=1 name=GitHub-CMS-in-network-ffs-sample.json
[2024/07/18 22:47:13] [ info] [output:stdout:stdout.0] worker #0 started
[2024/07/18 22:47:30] [ info] [input:tail:tail.0] inode=13265153 handle rotation(): GitHub-CMS-in-network-ffs-sample.json => /home/pmalamy/tmp/fluent/GitHub-CMS-in-network-ffs-sample.json~
[2024/07/18 22:47:30] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=13265153 watch_fd=1
[2024/07/18 22:47:30] [ info] [input:tail:tail.0] inotify_fs_add(): inode=13265153 watch_fd=2 name=/home/pmalamy/tmp/fluent/GitHub-CMS-in-network-ffs-sample.json~
[2024/07/18 22:47:30] [ info] [input:tail:tail.0] inotify_fs_add(): inode=13265148 watch_fd=3 name=GitHub-CMS-in-network-ffs-sample.json
[2024/07/18 22:47:30] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=13265153 watch_fd=2
{"@timestamp":"2024-07-18T20:47:30.723476Z","reporting_entity_name":"medicare","reporting_entity_type":"medicare","reporting_plans":[{"plan_name":"medicaid","plan_id_type":"hios","plan_id":"11111111111","plan_market_type":"individual"},{"plan_name":"medicare","plan_id_type":"hios","plan_id":"0000000000","plan_market_type":"individual"}],"last_updated_on":"2020-08-27","version":"1.0.0","in_network":[{"negotiation_arrangement":"ffs","name":"Knee Replacement","billing_code_type":"CPT","billing_code_type_version":"2020","billing_code":"27447","description":"Arthroplasty, knee condyle and plateau, medial and lateral compartments","negotiated_rates":[{"provider_groups":[{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"11-1111111"}},{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"22-2222222"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":123.45,"expiration_date":"2022-01-01","service_code":["18","19","11"],"billing_class":"professional"},{"negotiated_type":"negotiated","negotiated_rate":1230.45,"expiration_date":"2022-01-01","billing_class":"institutional"}]},{"provider_groups":[{"npi":[6666666666,7777777777,8888888888,9999999999],"tin":{"type":"ein","value":"22-2222222"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":120.45,"expiration_date":"2022-01-01","service_code":["05","06","07"],"billing_class":"professional"}]}]},{"negotiation_arrangement":"ffs","name":"Femur and Knee Joint Repair","billing_code_type":"CPT","billing_code_type_version":"2020","billing_code":"27448","description":"Under Repair, Revision, and/or Reconstruction Procedures on the Femur (Thigh Region) and Knee Joint","negotiated_rates":[{"provider_groups":[{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"11-1111111"}},{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"22-2222222"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":12003.45,"expiration_date":"2022-01-01","service_code":["18","19","11"],"billing_class":"professional"}]},{"provider_groups":[{"npi":[6666666666],"tin":{"type":"npi","value":"6666666666"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":12.45,"expiration_date":"2022-01-01","service_code":["18","19","11"],"billing_class":"institutional"}]}]}]}

The second solution is to use a multiline parser that reassembles the pretty-printed JSON into a single record, which the json parser can then decode.

parsers.conf:

[MULTILINE_PARSER]
    name formatted-json
    type regex
    rule "start_state" "/^{/" "message1"
    rule "message1,message2" "/^\s+/" "message2"
    rule "message2" "/^}/" "message1"
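To see what these three rules do, here is a simplified Python model of the state machine. It is a hypothetical transcription, not Fluent Bit's implementation, and it assumes the semantics as I understand them: a line matching start_state opens a new record, the first rule that applies to the current state and matches the line appends it and moves to that rule's next state, and a line matching no continuation rule flushes the buffered record. Flush timeouts are ignored.

```python
import json
import re

# Hypothetical transcription of the "formatted-json" rules above.
START_STATE = re.compile(r"^{")
# (states where the rule applies, regex, next state), checked in order.
RULES = [
    ({"message1", "message2"}, re.compile(r"^\s+"), "message2"),
    ({"message2"}, re.compile(r"^}"), "message1"),
]


def group_lines(lines):
    """Group raw lines into multiline records (simplified model, see lead-in)."""
    records, buf, state = [], [], None
    for line in lines:
        if state is not None:
            nxt = next((to for states, rx, to in RULES
                        if state in states and rx.search(line)), None)
            if nxt is not None:  # continuation line: append and change state
                buf.append(line)
                state = nxt
                continue
            records.append("\n".join(buf))  # no rule matched: flush the record
            buf, state = [], None
        if START_STATE.search(line):  # start of a new record
            buf, state = [line], "message1"
        else:  # stray line becomes a record on its own
            records.append(line)
    if buf:  # flush whatever is still buffered at end of input
        records.append("\n".join(buf))
    return records


if __name__ == "__main__":
    pretty = ['{', '  "version": "1.0.0",', '  "in_network": [', '  ]', '}']
    for record in group_lines(pretty):
        print(json.loads(record))  # roughly the parser filter's job
```

Note that with this rule set every line between the opening `{` and the closing `}` must start with whitespace (the `^\s+` rule); in the model, a file whose inner lines are not indented comes out line by line again.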

fluent-bit.conf:

[SERVICE]
  flush 1
  log_level info
  parsers_file ./parsers.conf

[INPUT]
  name tail
  tag tail
  read_from_head true
  refresh_interval 1
  skip_long_lines off
  path GitHub-CMS-in-network-ffs-sample.json
  key log

[FILTER]
  name multiline
  match *
  buffer on
  multiline.key_content log
  multiline.parser formatted-json

[FILTER]
  name parser
  match *
  parser json
  key_name log

[OUTPUT]
  name stdout
  match *
  format json_lines
  json_date_key @timestamp
  json_date_format iso8601

Result:

Fluent Bit v3.1.3
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/07/18 22:52:36] [ info] [fluent bit] version=3.1.3, commit=, pid=209278
[2024/07/18 22:52:36] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/07/18 22:52:36] [ info] [cmetrics] version=0.9.1
[2024/07/18 22:52:36] [ info] [ctraces ] version=0.5.2
[2024/07/18 22:52:36] [ info] [input:tail:tail.0] initializing
[2024/07/18 22:52:36] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/07/18 22:52:36] [ info] [filter:multiline:multiline.0] created emitter: emitter_for_multiline.0
[2024/07/18 22:52:36] [ info] [input:emitter:emitter_for_multiline.0] initializing
[2024/07/18 22:52:36] [ info] [input:emitter:emitter_for_multiline.0] storage_strategy='memory' (memory only)
[2024/07/18 22:52:36] [ info] [sp] stream processor started
[2024/07/18 22:52:36] [ info] [input:tail:tail.0] inotify_fs_add(): inode=13265153 watch_fd=1 name=GitHub-CMS-in-network-ffs-sample.json
[2024/07/18 22:52:36] [ info] [output:stdout:stdout.0] worker #0 started

[2024/07/18 22:52:38] [ info] [input:tail:tail.0] inode=13265153 handle rotation(): GitHub-CMS-in-network-ffs-sample.json => /home/pmalamy/tmp/fluent/GitHub-CMS-in-network-ffs-sample.json~
[2024/07/18 22:52:38] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=13265153 watch_fd=1
[2024/07/18 22:52:38] [ info] [input:tail:tail.0] inotify_fs_add(): inode=13265153 watch_fd=2 name=/home/pmalamy/tmp/fluent/GitHub-CMS-in-network-ffs-sample.json~
[2024/07/18 22:52:38] [ info] [input:tail:tail.0] inotify_fs_add(): inode=13258289 watch_fd=3 name=GitHub-CMS-in-network-ffs-sample.json
[2024/07/18 22:52:38] [ info] [filter:multiline:multiline.0] created new multiline stream for tail.0_tail
[2024/07/18 22:52:38] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=13265153 watch_fd=2
{"@timestamp":"2024-07-18T20:52:38.677660Z","reporting_entity_name":"medicare","reporting_entity_type":"medicare","reporting_plans":[{"plan_name":"medicaid","plan_id_type":"hios","plan_id":"11111111111","plan_market_type":"individual"},{"plan_name":"medicare","plan_id_type":"hios","plan_id":"0000000000","plan_market_type":"individual"}],"last_updated_on":"2020-08-27","version":"1.0.0","in_network":[{"negotiation_arrangement":"ffs","name":"Knee Replacement","billing_code_type":"CPT","billing_code_type_version":"2020","billing_code":"27447","description":"Arthroplasty, knee condyle and plateau, medial and lateral compartments","negotiated_rates":[{"provider_groups":[{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"11-1111111"}},{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"22-2222222"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":123.45,"expiration_date":"2022-01-01","service_code":["18","19","11"],"billing_class":"professional"},{"negotiated_type":"negotiated","negotiated_rate":1230.45,"expiration_date":"2022-01-01","billing_class":"institutional"}]},{"provider_groups":[{"npi":[6666666666,7777777777,8888888888,9999999999],"tin":{"type":"ein","value":"22-2222222"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":120.45,"expiration_date":"2022-01-01","service_code":["05","06","07"],"billing_class":"professional"}]}]},{"negotiation_arrangement":"ffs","name":"Femur and Knee Joint Repair","billing_code_type":"CPT","billing_code_type_version":"2020","billing_code":"27448","description":"Under Repair, Revision, and/or Reconstruction Procedures on the Femur (Thigh Region) and Knee Joint","negotiated_rates":[{"provider_groups":[{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"11-1111111"}},{"npi":[1111111111,2222222222,3333333333,4444444444,5555555555],"tin":{"type":"ein","value":"22-2222222"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":12003.45,"expiration_date":"2022-01-01","service_code":["18","19","11"],"billing_class":"professional"}]},{"provider_groups":[{"npi":[6666666666],"tin":{"type":"npi","value":"6666666666"}}],"negotiated_prices":[{"negotiated_type":"negotiated","negotiated_rate":12.45,"expiration_date":"2022-01-01","service_code":["18","19","11"],"billing_class":"institutional"}]}]}]}

In my opinion, flattening the JSON structures is the way to go. Multiline parsing can hurt performance, and the regular-expression rules may be tedious to write so that they cover every case.