Open mshonichev opened 9 months ago
@mshonichev thank you again for such a detailed report, I really appreciate it.
For my own reference, here's where the 2400 character limit is described:
I'm glad the second upstream in ansible.cfg
has worked so far.
I'll think on how to address this issue, perhaps with short and long-term options.
Probably the first thing I'm going to change is separating out parts of the collection info into separate properties; at least, the pieces that glactory itself needs. It's not a full solution necessarily in that a single property could still end up > 2400 chars, but that would be an even more extreme edge case. This is a breaking change, but it has an advantage of making native Artifactory searches more useful by adding more individual properties that could be used in searches.
A safe but messy approach is splitting collection info into multiple properties by length, essentially paginating it within the artifactory properties. This would be backward compatible with existing uploaded collections, though it wouldn't fix existing broken collections, which should be ok since they can be deleted and re-uploaded or re-proxied. Feels ugly though...
I did not realise that Galactory uses Artifactory as a search engine aside to storage backend. That's actually pretty cool. But yeah, if all we need is a lookup by name / version, storing whole blob in props is a bit awkward.
IMO, splitting props to pages is cringe. It adds a whole lot of cases when meta changes - e.g. authors added/removed.
Here, suppose we need to find a collection by its author. Let's break down backend storage in two layers, first is meta storage layer, second is actual collection tar.gz storage layer.
A first layer might be smth like
_props/
by_author/
j.doe.json
b.nigh.json
....
each individual <author.json>
has props set to author name and email, so we can lookup by jf rt s
query.
And contents of json is a list of collection references, e.g.
[
{
'theforeman.foreman': {'3.15.0': '/actual/path/to/collection/3.15.tar.gz'}
}
]
So, to get the actual archive out of search results you'll need a second query.
It would too hit hard on corner cases to support such a structure, but at least it could be introduced without b/w compatibility break.
Remember, that 'broken' collection with bloated meta was never actually uploaded, so 'broken' props need no fix.
We just start up a whole new life aside, adding a few new search capabilities, and all previously uploaded files need no rehaul.
Hi there again!
We've found another fancy issue in our setup of Galactory as a proxy / inner source for Ansible collections.
Fast forward to root cause: storing collection metadata in the Artifactory file properties was a nice idea, however for some collections, specifically
theforeman
.foreman
, metadata is such huge in size that it hits the Artifactory property value limit and thus is failed to properly deserialize.Reproducer:
ansible.cfg
is configured to use galactory as default galaxy server.Starting galaxy collection install process Process install dependency map Starting collection install process Downloading http://galaxy.local/download/cisco-nxos-5.2.1.tar.gz to /Users/m.shonichev/.ansible/tmp/ansible-local-54036f2c1i_ng/tmpi7sor7ba/cisco-nxos-5.2.1-vjgrss63 Installing 'cisco.nxos:5.2.1' to '/home/m.shonichev/.collections/ansible_collections/cisco/nxos' cisco.nxos:5.2.1 was installed successfully
container logs reveal stack trace:
The upstream collection was successfully downloaded from https://galaxy.ansible.com and stored in the Artifactory repository.
However, following are the properties of file (actually a tail of):
Apparently, The Foreman team decides to write down all the seven tribes of relatives (cats included!) as a collection authors list and while Galactory tries to put
collection_info
into file properties, an 2400 chars limit was hit and JSON was corrupted.Collection meta: https://github.com/theforeman/foreman-ansible-modules/blob/a69d83fb8681ca80878d8143d27594eb72ee295d/galaxy.yml#L4
From the Artifactory docs:
I'm not sure that limit can be raised up easily, at least documentation is obscure about. Also I'm not sure if removing metadata from properties would not break the Galactory codebase to pieces.
So, currently we had to add upstream galaxy as a backup server as a workaround: