backstage / backstage

Backstage is an open framework for building developer portals
https://backstage.io/
Apache License 2.0

🐛 Bug Report: Entity in wildcard subfolder not showing in catalog when fetched from GitLab #25402

Open arvid-digg opened 3 months ago

arvid-digg commented 3 months ago

📜 Description

When I point the catalog location towards a local file with a wildcard subfolder, it fetches both Domain and System. However, if I point the catalog location towards a GitLab URL, it can't fetch System.

Folder structure

backstage-metadata
|-- import-all.yaml
`-- metadata
    |-- domain
    |   |-- domain1.yaml
    |   `-- domain2.yaml
    `-- system
        |-- domain1
        |   |-- system1.yaml
        |   `-- system2.yaml
        `-- domain2
            `-- system3.yaml

Content of import-all.yaml

apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: import-all-from-metadatarepo
  description: A collection of all templates and metadata from backstage-metadata
spec:
  targets:
    - ./metadata/domain/*.yaml
    - ./metadata/system/*/*.yaml # Note the wildcard subfolder

app-config.local.yaml

catalog:
  providers:
    gitlab:
      my_gitlab:
        host: <redacted>
        orgEnabled: true
        fallbackBranch: main
        skipForkedRepos: false
        entityFilename: catalog-info.yaml
        schedule:
          frequency: { minutes: 3 }
          timeout: { minutes: 3 }

  import:
    entityFilename: catalog-info.yaml
    pullRequestBranchName: backstage-integration
  rules:
    - allow: [Component, System, API, Resource, Location, Domain, User]
  locations:
    # This location doesn't work with wildcard subfolders
    - type: url
      target: https://<redacted>/backstage-metadata/-/blob/master/import-all.yaml
      rules:
        - allow: [Domain, Location, System]

    # This rule works with subfolders
    # - type: file
    #   target: ../../../../config/backstage-metadata/import-all.yaml
    #   rules:
    #     - allow: [Domain, Location, System]

Is it possible to fetch wildcard subfolders when using a GitLab URL? We have a workaround, but it requires listing each subfolder of metadata/system explicitly.

👍 Expected behavior

Would love if the GitLab URL could fetch wildcard subfolders.

👎 Actual Behavior with Screenshots

GitLab cannot serve an archive for a path that contains a wildcard. It returns this error: SendArchive: copy 'git archive' output: rpc error: code = FailedPrecondition desc = path doesn't exist. Uri: /api/v4/projects/<redacted>/repository/archive?sha=master\u0026path=metadata%2Fsystem%2F*

👟 Reproduction steps

Curl GitLab with a wildcard expression:

curl --header "PRIVATE-TOKEN: <redacted>" \
  --url "https://<redacted>/api/v4/projects/<redacted>/repository/archive?sha=master&path=metadata%2Fsystem%2F*"
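For reference, here is a minimal sketch of how the query string in that request ends up with a literal `*` in it. The helper name `archiveQuery` is hypothetical and is not Backstage code; it only shows that standard URI encoding escapes `/` but leaves `*` untouched, so the wildcard reaches GitLab as part of the directory path.

```typescript
// Hypothetical helper for illustration only (not a Backstage or GitLab API).
// Builds the query string for GitLab's repository archive endpoint.
function archiveQuery(sha: string, path: string): string {
  // encodeURIComponent escapes "/" as %2F but does not escape "*"
  return `sha=${encodeURIComponent(sha)}&path=${encodeURIComponent(path)}`;
}

const query = archiveQuery('master', 'metadata/system/*');
console.log(query);
```

Since GitLab treats `path` as a literal directory, `metadata/system/*` is looked up verbatim, which matches the "path doesn't exist" error above.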

📃 Provide the context for the Bug.

No response

🖥️ Your Environment

No response

👀 Have you spent some time to check if this bug has been raised before?

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

No, but I'm happy to collaborate on a PR with someone else

freben commented 3 months ago

Interesting. If you perform the same curl command but without %2F* at the end, does that solve the problem and give you a reasonable archive back?

If that's the case, then maybe this code needs to be smarter about how much it trims from the URL before trying to read the archive.
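A rough sketch of what "smarter trimming" could look like, assuming the goal is to request only the longest wildcard-free directory and keep the rest as a pattern to apply afterwards. `splitGlobPath` is a hypothetical name and this is not the existing Backstage implementation.

```typescript
// Hypothetical helper, not part of Backstage: splits a target path into the
// longest wildcard-free directory prefix and the remaining glob pattern.
function splitGlobPath(path: string): { base: string; pattern: string } {
  // Drop empty segments and a leading "./"
  const segments = path.split('/').filter(s => s !== '' && s !== '.');
  // Index of the first segment that contains a glob character
  const firstGlob = segments.findIndex(s => /[*?[\]{}]/.test(s));
  if (firstGlob === -1) {
    // No wildcard at all: the whole path is static and there is no pattern
    return { base: segments.join('/'), pattern: '' };
  }
  return {
    base: segments.slice(0, firstGlob).join('/'),
    pattern: segments.slice(firstGlob).join('/'),
  };
}

console.log(splitGlobPath('./metadata/system/*/*.yaml'));
console.log(splitGlobPath('./metadata/domain/*.yaml'));
```

With the targets from import-all.yaml, the archive request would then use `metadata/system` (or `metadata/domain`) as the path, and the remaining pattern would be matched against the archive contents.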

arvid-digg commented 2 months ago

@freben If I remove %2F* at the end, I get an archive containing the system folder:

archive.tar.gz
|-- domain1
|   |-- system1.yaml
|   `-- system2.yaml
`-- domain2
    `-- system3.yaml

So yes, it seems that piece of code needs to be smarter.
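Given an archive like the one above, the remaining pattern (`*/*.yaml`) would still have to be applied to the entries. The following is a simplified sketch under stated assumptions: `globToRegExp` and `filterEntries` are hypothetical helpers, only `*` and `?` are supported, and a real implementation would more likely use an existing glob library. The last two entries in the example list are made up to show what gets excluded.

```typescript
// Hypothetical helpers for illustration; supports only "*" and "?".
function globToRegExp(pattern: string): RegExp {
  const source = pattern
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '[^/]*') // "*" matches within a single path segment
    .replace(/\?/g, '[^/]'); // "?" matches one non-slash character
  return new RegExp(`^${source}$`);
}

// Keeps only the archive entries whose relative path matches the pattern.
function filterEntries(entries: string[], pattern: string): string[] {
  const re = globToRegExp(pattern);
  return entries.filter(entry => re.test(entry));
}

const entries = [
  'domain1/system1.yaml',
  'domain1/system2.yaml',
  'domain2/system3.yaml',
  'README.md', // made-up entry, not in the reported archive
  'domain1/nested/deep.yaml', // made-up entry, not in the reported archive
];
console.log(filterEntries(entries, '*/*.yaml'));
```

Because `*` is restricted to a single segment here, files nested deeper than one subfolder are not matched, which mirrors how `./metadata/system/*/*.yaml` behaves for the local file location.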

benjdlambert commented 2 months ago

I think that this is pretty related: https://github.com/backstage/backstage/issues/22504

We'd ideally want to move some of that logic up to the URLReader implementations, I think 🙏

enryson commented 2 months ago

I have the same issue on Azure: https://github.com/backstage/backstage/issues/25820

github-actions[bot] commented 4 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.