Closed awanlin closed 2 years ago
@OrkoHunter here is the 2nd issue and I've included a sample archive from ADOS 2020 so that you can see what gets generated
I was struggling with this today. Glad to find out it wasn't just me. It is a weird bug, after expanding the archive, not all of the files are there.
@backstage/maintainers for visibility - general readTree problem? As in, not techdocs specific
My initial suspicion is at this piece of code, which hints that downloaded files will be flattened out intentionally if scopePath
is defined.
hey @OrkoHunter, are you working on it or should I support you somehow?
I'm experimenting with what you suggested and unfortunately by removing the scopePath
, nothing changes - still the same error
Hey @nadworny ! Sorry yeah, all my debugging attempts led me no where with this. Removing my assignment since I'm not working on this.
it's strange as when using exactly the same HTTP request, I'm getting the correct zip structure. It must be in this case unzipping that somehow messes it up?
@freben @OrkoHunter - it is indeed the stripping as suggested by @awanlin. Why exactly are we stripping the first directory while unzipping the files? If I remove the stripping, everything works as expected. https://github.com/backstage/backstage/blob/f8a93ec1ae8dc211729968b294dec29245217653/packages/backend-common/src/reading/tree/ZipArchiveResponse.ts#L98 https://github.com/backstage/backstage/blob/f8a93ec1ae8dc211729968b294dec29245217653/packages/backend-common/src/reading/tree/ZipArchiveResponse.ts#L147
@nadworny Good find! Yeah the thing is, we had been trying this out with various providers and they all had an inner zip file structure where there was a single root dir that had ... I forget, but like the git hash or branch name or something, and the actual contents inside. Maybe it's the case here, that we need to make this behavior optional, and somehow know when to apply it or not. Just outright removing it, will likely break other sources
Although honestly ... at this point it seems the AzureUrlReader is the only one actually using it. So if we could somehow conclude that this dir-stripping is NEVER used by ANY version of Azure, we could just remove it, or make it configurable with a boolean and then leaving it as default false.
My main problem then, is that I do not have the setup to get that tested toward several remotes to see that we don't introduce breakage. We'd love to get some help with that from the community.
I get it. Yeah, i.e. for GitHub techdocs it isn't invoked.
At the moment only version 6.0 of Azure git API is being supported by the AzureUrlReader so I don't think we should worry about other versions of Azure as, if I understand it correctly, this is the only way to download something from Azure... I guess it's your choice if we remove it or make it configurable. The latter will probably take me a bit more time to implement ;)
@freben what do you prefer me to do? :)
@nadworny I'd vote for this being configurable as the zip file output is different between Azure DevOps Services (Cloud version) and Azure DevOps Server 2020 (On-premise version). We use Azure DevOps Server 2020 and the supported versions for the REST API are always lagging behind some what randomly depending on the feature.
If any testing is needed I'm more than happy to help with that :)
Oof, but that means that we'd have to use some form of feature detection logic essentially, right?
Or could we in some other way just look at a single zip file's contents and deduce that "ok this is the non-nested format"? Just having a single root folder isn't in and of itself perfectly secure right - could it be that the format for the folder is for example orgname-reponame-hash
or something? I've seen that somewhere, if that's the case then that might be good enough.
Tricky thing being, we can only consume the incoming data stream once so we might have to do some trickery there
@freben yeah, that sounds right, give me some time to look through this code again to get it fresh in my mind and I might be able to come up with a better idea. The easy way to determine if it's the Cloud version of Azure DevOps is the URL would have dev.azure.com
in it where as the on-premise version of Azure DevOps could be anything
@awanlin are you saying that Azure DevOps Server 2020 and Azure DevOps Cloud provide different ZIP formats despite having the same API version (6.0)?
@nadworny yes, that was my experience when I logged this issue. I'll try to find some time to test this again to confirm, I have a personal account using Azure DevOps Clouds for side projects so I can test this there
@awanlin I'm testing with the Azure DevOps Clouds (dev.azure.com) and the ZIP format is exactly as the one you included in this issue.
@nadworny that's awesome then! Either I was wrong when I initially did my testing, which is very possible I remember being under a time crunch, or something changed with the Cloud version
@awanlin I assume the latter 😄
Yeah I'm sure we added this for good reason because it was needed at that point in time. I'm also gonna assume that the cloud version actually changed the format some time recently. So unless there is reason to believe that there can be variation across different customers or something for some strange reason ... maybe it's safe enough to consider changing our behaviour correspondingly and listen for community feedback whether it breaks for somebody.
If y'all have the ability to bump up that confidence by trying out downloads on different variations, it'd be great.
I for sure can test it using Azure DevOps Server 2020. As for the Cloud version I don't have a personal Backstage instance running so that may take longer. @nadworny perhaps you can handle that side?
I did multiple tests on Azure Cloud last week (6.0, 6.1-preview) and I can confirm that the ZIP format was always as the one included in this issue.
hi @awanlin, any update from your side?
@freben assuming that @awanlin's test is positive, we can proceed with removing of the stripping functionality?
Sorry for the slow follow up @nadworny I got pulled into a rabbit hole due to the fact we set a different docs_dir
and that is considered a security issue for the Basic TechDocs setup.
Anyway, testing clearly worked or I would not have hit this issue. I also tried the API URL directly just now and the results were exactly as we expected them matching the attached sample in this issue
So what is the decision guys @awanlin @freben? 😄 Do we remove stripping functionality?
That's my vote. @benjdlambert ?
Or rather - keep the functionality but set it to false?
Sorry just coming back round to this - yeah, let's set it to false but keep the functionality so that we have the option still.
Just came back from holidays :)
@awanlin shall I try to implement it or have you started something already?
This is a pretty high value contribution. If you have the time to work on it @nadworny it'd be great!
@nadworny I would love to but I'm in a bit of limbo with my work and contributing to open source projects. This is why I've tried to go above and beyond with the details in my issues and to help where I can on Discord (I go by Ahhhndre there) in my spare time.
@awanlin alright I'll try to pick it up next week
Hi @freben , I've just created a PR for it - my first ever for OS project. Do you need to approve it first so that it goes through the ci/cd pipeline?
Expected Behavior
TechDocs should still generate using the URL Reader method with Azure DevOps Server 2020
Current Behavior
Once Issue #4357 has been fixed you'll now get the following error(s) when opening a Docs site using the URL Reader method:
In the UI:
Error: Failed to generate docs from C:\Users\UserName\AppData\Local\Temp\backstage-OCDxtp into C:\Users\UserName\AppData\Local\Temp\techdocs-tmp-6bZilb with error Docker container returned a non-zero exit code (1) at TechdocsGenerator.run (webpack-internal:///../techdocs-common/src/stages/generate/techdocs.ts:120:13) at runMicrotasks () at processTicksAndRejections (internal/process/task_queues.js:93:5) at async DocsBuilder.build (webpack-internal:///../../plugins/techdocs-backend/src/DocsBuilder/builder.ts:94:5) at async eval (webpack-internal:///../../plugins/techdocs-backend/src/service/router.ts:151:13)
Then in the backend logs there is also the following messages:
techdocs debug Failed to generate docs from C:\Users\UserName\AppData\Local\Temp\backstage-OCDxtp into C:\Users\UserName\AppData\Local\Temp\techdocs-tmp-6bZilb type=plugin techdocs debug Build failed with error: {"status":"Pulling from spotify/techdocs","id":"latest"},{"status":"Digest: sha256:915bbe59e2602882674d8308ff1211c0c7ab8bdb1a48121df22c9c1abc6a55b1"} {"status":"Status: Image is up to date for spotify/techdocs:latest"},ERROR - Config value: 'docs_dir'. Error: The path /content/docs isn't an existing directory.
Possible Solution
Upon walking through the code it looks like the Zip Archive downloaded for the Azure DevOps Server 2020 REST API isn't in the expected format. Removing the
stripTopDirectory
function and just using theentry.path
as the arg for 'getInnerPath` from this line fixes the issue:https://github.com/backstage/backstage/blob/d941f6187b6490878cfea67a46ce573130659765/packages/backend-common/src/reading/tree/ZipArchiveResponse.ts#L156
Here is a sample archive of a repo downloaded using the Azure DevOps Server 2020 REST API: SampleRepo.zip
Steps to Reproduce
These steps assume that you are using Azure DevOps Server 2020 and that Issue #4357 has been fixed.
catalog-info.yaml
file and add it to your repobackstage.io/techdocs-ref
annotation using the URL Reader format for it's value:url:https://ados-servername/organization/project/_git/repository
\docs
folder with at least anindex.md
mkdocs.yml
file in the root of you repo with this as the body of the file:Notice that after some time you'll get the following error in the UI:
Error: Failed to generate docs from C:\Users\UserName\AppData\Local\Temp\backstage-OCDxtp into C:\Users\UserName\AppData\Local\Temp\techdocs-tmp-6bZilb with error Docker container returned a non-zero exit code (1) at TechdocsGenerator.run (webpack-internal:///../techdocs-common/src/stages/generate/techdocs.ts:120:13) at runMicrotasks () at processTicksAndRejections (internal/process/task_queues.js:93:5) at async DocsBuilder.build (webpack-internal:///../../plugins/techdocs-backend/src/DocsBuilder/builder.ts:94:5) at async eval (webpack-internal:///../../plugins/techdocs-backend/src/service/router.ts:151:13)
Then in the backend logs there is also the following messages:
techdocs debug Failed to generate docs from C:\Users\UserName\AppData\Local\Temp\backstage-OCDxtp into C:\Users\UserName\AppData\Local\Temp\techdocs-tmp-6bZilb type=plugin techdocs debug Build failed with error: {"status":"Pulling from spotify/techdocs","id":"latest"},{"status":"Digest: sha256:915bbe59e2602882674d8308ff1211c0c7ab8bdb1a48121df22c9c1abc6a55b1"} {"status":"Status: Image is up to date for spotify/techdocs:latest"},ERROR - Config value: 'docs_dir'. Error: The path /content/docs isn't an existing directory.
Context
We are currently running Backstage using a fork, we'd really like to move away from that as it is a large amount of overhead to manage the merges. Getting this working would get us to that point.
Your Environment
The key element to this is using Azure DevOps Server 2020