Closed pshao25 closed 2 years ago
This is our CI that meets this problem: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=1877589&view=logs&j=ca395085-040a-526b-2ce8-bdc85f692774&t=3763df9f-aac5-5292-209f-31e3fd069cd1 This is the published coverage report: https://dev.azure.com/azure-sdk/internal/_build/results?buildId=1877589&view=artifacts&pathAsName=false&type=publishedArtifacts
You will see the coverage json file was broken and check-coverage fails at parseJSON
hhm, could also just wait until the previous write is done before starting a new one.
The problem with writing at the end is: when is the end? The server being the one writing the coverage it's hard to know. Could do it when asked to shutdown but that would not work if the process get forced killed(Would be something to note that you have to make sure to not do when using the test server). You also would have to make sure to stop the server before continuing
Ok ended setting a few hooks to make sure it is written on exit. If it doesn't work let me know and how you are killing the processs so we can find a solution.
Maybe can also provide a /.admin/save-coverage
endpoint to manually force write.
Thanks @timotheeguerin for the quick fix! Personally, I think providing a /.admin/save-coverage
endpoint to manually force write is more straightforward. Because as a customer, I need to first realize I have to kill the process, before I can save the coverage file. Secondly, If I want to kill the process in CI, I need to first find the process ID running cadl-ranch serve
, and then kill the process, it's a bit complex. (Currently in CI, we don't manually kill the process, just let CI finish and kill the running processes automatically)
PS: we can also provide a stop server
command like testserver to shutdown the server and save the file, like: cadl-ranch stop-server --coverageFile xxx
. This is also very straight forward.
I guess I talked to Izzy today about it and python had a different workflow where the fix I made worked well.
What do you think about having a cadl-ranch server start
and cadl-ranch server stop
command that automatically start the process in the background.
Can always also do the admin endpoint with yet another command cadl-ranch server save-coverage
Thanks Tim for the quick response. Having cadl-ranch server stop
works.
Does it mean we still provide --coverageFile
when start the server, and use the stop
command to stop the server and the file will be saved?
@timotheeguerin are you saying we could only start one instance of cadl-ranch in one machine? Otherwise, which one is cadl-ranch server stop
going to stop?
Are you running multiple instance of the test server on the same machine at once? How are you using it?
Correct. We would run test in parellel and we start cadl-ranch by such command You could see two instances of cadl ranch are started at port 64683 and 64686 separately.
Why are you not starting a single instance if you are running test on parallel with on the same machine?
And then if you are doing this wouldn’t this be the original cause of the corrupted file? 2 process trying to write together?
Why are you not starting a single instance if you are running test on parallel with on the same machine?
We could. It just costs and has low priority. So I don't want to do this currently. But that is a good idea and we could do that in the future.
And then if you are doing this wouldn’t this be the original cause of the corrupted file? 2 process trying to write together?
No, we .net haven't face this issue currently. It is Jave facing this and Java only start one process.
Actually my point is: we could start two instances of cadl ranch, so theoretically cadl-ranch server stop
doesn't know which one to stop?
I am not proposing replacing the current serve. I am suggesting to add start and stop command as an alternative. The way I would see this is by allowing the port to be passed in the stop command which I guess you must know.
Hi @timotheeguerin, may I know when could start and stop command be available, because currently our CI pipeline is pending on this stop command to stop the server and save the coverage file on exit (currently our pipeline can't save and publish coverage file)?
I can get that done today.
Hi @timotheeguerin, thanks for the quick fix. I saw your pull request, but I found the command cadl-ranch server stop
command is not recognized in cadl-ranch cli.
$ npx cadl-ranch server stop
npm WARN config global `--global`, `--local` are deprecated. Use `--location=global` instead.
cadl-ranch <command>
Commands:
cadl-ranch validate-scenarios <scenarios Compile and validate all the Cadl sc
Path> enarios.
cadl-ranch generate-scenarios-summary <s Compile and validate all the Cadl sc
cenariosPath> enarios.
cadl-ranch serve <scenariosPath> Serve the mock api at the given path
s.
cadl-ranch check-coverage <scenariosPath Serve the mock api at the given path
> s.
cadl-ranch validate-mock-apis <scenarios Validate mock apis have all the scen
Path> arios specified
cadl-ranch upload-manifest <scenariosPat Upload the scenario manifest. DO NOT
h> CALL in generator.
cadl-ranch upload-coverage Upload the coverage report.
Options:
--version Show version number [boolean]
--help Show help [boolean]
--debug Output debug log messages. [boolean] [default: false]
Unknown arguments: server, stop
I checked the release of cadl ranch: https://github.com/Azure/cadl-ranch/actions/workflows/release.yml, finding the release for ‘Start and stop server commands’ failed. Not sure if this is reason of the command failure. Please kindly take a look. Thanks.
yeah seems like the publish pipeline didn't create the PR bumping the versions, should be publihsed now.
Symptom: @haolingdong-msft found the printed coverage report is broken as See coverage file here.
Root cause: This a random issue so there is a guess of how this happen. We save coverage data for every request by await writeFile. Take a look at the comments from writeFile:
A potential issue happens like:
This is the issue we might face. For the specific broken issue @haolingdong-msft met, maybe somehow "coverage data2" is longer than "coverage data1", and the "coverage data1" just replace the data in the buffer (which already has "coverage data2"). Then the json brokes.
Anyway, we might need to reconsider the implementation of saving the coverage report. Make it sync (which costs) or save file only once at the end.