Open jo-ale35 opened 6 years ago
@jo-ale35 : which version of fio are you on and does your problem change if you use the git master version of fio?
I am using "fio version" : "fio-3.8-5-g464b-dirty" and "fio version" : "fio-3.8".
@jo-ale35 are you able to deduce which part of the JSON is malformed?
The output we get is like the following:
The problem is that you get multiple sets of independently valid json, but smashed together it is not. You'd have to cut and paste each "section" (the one that starts with "fio version" etc) into separate files, then it would validate fine.
Not sure how we can improve that, to be honest. Status interval dumps valid json output for each time period specified.
I don't know json well enough to guess if there's some way we can separate streams nicely in the file. If that was possible, then we could just do that before dumping output. For instance, have a comma separator between them. I'll try and google a bit, but in lieu of that being possible, then I don't see how we can feasibly fix this. You'd simply need to manually separate the streams after the run is over - which kind of sucks, but isn't the end of the world.
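For anyone who needs to separate the streams after the run, here is a minimal sketch of one way to do it in Python without editing the file by hand. It relies on `json.JSONDecoder.raw_decode` from the standard library, which parses one value and reports where it stopped. The function name `split_concatenated_json` and the abbreviated sample text are made up for illustration; real fio output has many more fields.

```python
import json


def split_concatenated_json(text):
    """Return a list of every top-level JSON value found in text."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Parse one value starting at pos; pos moves to just past it.
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return docs


# Hypothetical, heavily abbreviated stand-in for two status dumps.
sample = (
    '{"fio version": "fio-3.8", "jobs": []}\n'
    '{"fio version": "fio-3.8", "jobs": []}\n'
)
docs = split_concatenated_json(sample)
print(len(docs))
```

Each element of `docs` is then an ordinary dict, and `json.dumps(docs)` would give a single valid JSON array if one file is wanted.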
We are creating an array and putting the different outputs inside it, with each one separated by a comma.
I'd be fine with something like that. Do you have a patch?
As of now I am parsing the JSON generated by fio and turning it into valid JSON.
@axboe I do not have a patch for the code, but I wrote a workaround as a Python snippet that makes the output valid JSON. It wraps the content in an array of objects, which is valid JSON.
from pathlib import Path
import sys

p = Path('.')
# Collect the .json files matching this glob pattern for custom loading
# (the list is not used further in this snippet).
s = list(p.glob('[a-zA-Z0-9]*_DCPMEM_FIO_PMEM[a-zA-Z0-9]*.json'))
filedata = None
# The file to fix is passed as the first command-line argument.
a = sys.argv[1]
# Open the file in read mode.
with open(a, 'r') as file:
    filedata = file.read()
# Replace the pattern }\n{ between consecutive top-level objects with },\n{
# so the objects become comma-separated.
filedata = filedata.replace('}\n{', '},\n{')
# Wrap everything in [ and ] to make it an array of objects.
filedata = "[" + filedata + "]"
# Write the data back into the same file.
with open(a, 'w') as file:
    file.write(filedata)
I'd like to look into tackling this patch.
The "problem" is that when fio prints multiple statuses (through use of the "--status-interval" option), the resulting output is multiple top-level JSON "objects" in one file. The proposed solution is to make this a top-level array of JSON objects. Is this a fair restatement of the situation?
There are some nuances to cover though.
In option 1 above the JSON output is consistent: it's always an array at the top level. This is a positive because an application can use the same parsing logic whether it is a single status or multiple statuses. Option 2 has the downside that any existing applications that parse the JSON output today will break. (Option 1 really has a lesser version of the same problem, as any application that has a built-in workaround like the one @suprajamayya posted above would break.)
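To illustrate the "same parsing logic" point from the consumer side, here is a rough sketch of how an application could accept either shape: a single top-level object (today's output without status intervals) or a top-level array (the proposed output). The helper name `load_statuses` is hypothetical and the sample strings are abbreviated placeholders, not real fio output.

```python
import json


def load_statuses(text):
    """Parse text and always return a list of status objects."""
    parsed = json.loads(text)
    # A lone object is wrapped so callers can always iterate over a list.
    return parsed if isinstance(parsed, list) else [parsed]


single = '{"fio version": "fio-3.8"}'
array = '[{"fio version": "fio-3.8"}, {"fio version": "fio-3.8"}]'
print(len(load_statuses(single)), len(load_statuses(array)))
```

A wrapper like this would not help applications that already patch the concatenated output themselves, which is the breakage described above.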
Does it make sense to have a new option in --output-format for "json_array" and "json+_array" that will act as a switch for this new code rather than changing the existing behavior?
> Is this a fair restatement of the situation?
Sounds like it. It's a bit tough on people who are wrapping the existing format but I would think they would quickly spot the change but I see you've already thought of that...
> Does it make sense to have a new option in --output-format for "json_array" and "json+_array" that will act as a switch for this new code rather than changing the existing behavior?
That sounds like a very sensible way to solve this. Half of me would want it to be called "json++" but maybe that's a step too far and will give people the wrong idea... I'm already in agreement with the main idea and me bikeshedding the name doesn't further your work :-).
FWIW looking at https://en.wikipedia.org/wiki/JSON_streaming the solutions there don't really solve the original problem which seems to be a desire to have strictly well formed json that can be read with "any" JSON tool (which is what the JSON array proposal does). To me the only issue with the JSON array solution will be processing the output "online". I'd argue if you're processing online you can write a custom tool to cope with the current json/json+ modes (which basically offer "Concatenated JSON") and the JSON array option will be useful to those processing offline (so people can pick).
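As a rough idea of what such a custom tool for "online" processing could look like, here is a minimal sketch that consumes concatenated JSON in arbitrary chunks and hands back each document as soon as it is complete. The class name `ConcatenatedJSONReader` is made up for illustration. It assumes every top-level value is an object, so an incomplete document always fails to parse, and it treats any parse error as "need more input", which means genuinely malformed input would just sit in the buffer.

```python
import json


class ConcatenatedJSONReader:
    """Incrementally extract top-level JSON objects from a text stream."""

    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk of text and return the documents completed so far."""
        self._buffer += chunk
        docs = []
        while True:
            # raw_decode does not accept leading whitespace.
            self._buffer = self._buffer.lstrip()
            if not self._buffer:
                break
            try:
                doc, end = self._decoder.raw_decode(self._buffer)
            except json.JSONDecodeError:
                # Assume the document is incomplete and wait for more input.
                break
            docs.append(doc)
            self._buffer = self._buffer[end:]
        return docs


reader = ConcatenatedJSONReader()
first = reader.feed('{"a": 1}\n{"b"')   # second object is still incomplete
second = reader.feed(': 2}\n')          # now it can be parsed
print(first, second)
```

In practice the chunks would come from reading fio's output file or pipe while the job is still running.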
--output-format=json not giving output in proper json format when used with parameter --status-interval.