simon-jouet opened this issue 5 years ago
Okay, I think I figured out the issue. When fetching the packages the authorization token is sent, but in my case the token is very large (about 7k), which I believe exceeds the nginx buffers, so the request is not parsed properly. This is what shows up in the nginx logs:
62.30.156.32 - [62.30.156.32] - - [27/May/2019:09:27:28 +0000] "-" 000 0 "https://verdaccio.mydomain/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36" 5258 0.000 [] - - - - f6fbab185e38c4ac2ca692bdf83a3603
I've decoded the authorization token, and it contains groups and real_groups, which are huge. real_groups contains all the groups as well as the repos, which in my case is about 60 entries. The more problematic one is groups, which starts with the same entries as real_groups, then contains the scopes (below), and then repeats the real_groups entries again, adding up to about 120 entries:
"$all",
"$authenticated",
"@all",
"@authenticated",
"all",
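To quantify the duplication described above, here is a small sketch that counts the entries in a decoded payload. It assumes the payload segment has already been base64url-decoded and parsed with JSON.parse; `TokenPayload` and `summarizeGroups` are hypothetical names that only mirror the fields visible in the token.

```typescript
// Hypothetical shape of the decoded payload, limited to the fields seen in the token.
interface TokenPayload {
  name: string;
  groups: string[];
  real_groups: string[];
}

interface GroupSummary {
  realGroups: number;          // number of entries in real_groups
  groups: number;              // number of entries in groups
  uniqueGroups: number;        // number of distinct entries in groups
  duplicatedEntries: string[]; // entries appearing more than once in groups
}

function summarizeGroups(payload: TokenPayload): GroupSummary {
  // Count how often each entry occurs in groups (Map keeps first-seen order).
  const counts = new Map<string, number>();
  for (const g of payload.groups) {
    counts.set(g, (counts.get(g) ?? 0) + 1);
  }
  const duplicatedEntries = Array.from(counts.entries())
    .filter(([, n]) => n > 1)
    .map(([g]) => g);
  return {
    realGroups: payload.real_groups.length,
    groups: payload.groups.length,
    uniqueGroups: counts.size,
    duplicatedEntries,
  };
}
```

For the anonymised token posted further down, this would report 56 real_groups entries and 117 groups entries, of which only 61 are distinct.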
I'm doing this with the verdaccio-gitlab master Dockerfile; I simply changed it to use the tagged 4 release, which came out yesterday.
Finally got it working!
By default ingress-nginx has HTTP/2 enabled, and the default configuration cannot cope with headers this large. I changed my ingress-nginx config to http2-max-field-size: 8k and it's working. The problem was that with a smaller size nginx fails and closes the connection, and because it's HTTP/2 it doesn't send or log any error (I was expecting a 414).
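A rough way to check a token against a limit like http2-max-field-size before it reaches the ingress is to estimate the compact JWT length from its payload. This sketch assumes an HS256 token with the header {"alg":"HS256","typ":"JWT"} (the header reported later in this thread) and unpadded base64url segments; the function names are hypothetical. Treat the result as an approximation: HTTP/2 clients may Huffman-encode header values (HPACK), so the size nginx checks is usually somewhat smaller.

```typescript
// UTF-8 byte length of a string, computed by hand to stay dependency-free.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      bytes += 4; // surrogate pair = one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Unpadded base64url length of n bytes (JWT segments carry no "=" padding).
function base64UrlLength(n: number): number {
  return Math.ceil((4 * n) / 3);
}

// Estimated length of "header.payload.signature" for an HS256 JWT.
function estimateHs256JwtLength(payload: object): number {
  const header = JSON.stringify({ alg: "HS256", typ: "JWT" });
  const signatureBytes = 32; // HMAC-SHA256 output size
  return (
    base64UrlLength(utf8Length(header)) + 1 +
    base64UrlLength(utf8Length(JSON.stringify(payload))) + 1 +
    base64UrlLength(signatureBytes)
  );
}

// Uncompressed size of the "authorization: Bearer <token>" header field.
function authorizationFieldSize(tokenLength: number): number {
  return "authorization".length + "Bearer ".length + tokenLength;
}
```

Comparing `authorizationFieldSize(estimateHs256JwtLength(payload))` against 8 * 1024 gives a quick go/no-go for the 8k setting used here; for reference, nginx documents a default of 4k for http2_max_field_size.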
I think the JWT token should be stripped down to contain less information. Is there any reason for the duplicates in real_groups? What is the purpose of having both groups and real_groups? Storing this kind of information in the JWT token will always lead to this issue eventually; it just depends on the number of groups/repos in GitLab.
@simon-jouet Thanks for the update!
GitLab differentiates between group name and group path, see also https://docs.gitlab.com/ce/api/groups.html
@bufferoverflow thanks for that, makes sense. I'll have a deeper look into the groups API once I get this up and running :).
Regarding the duplicates in real_groups, do you think it's an issue with the current code, or is it the expected behaviour? Maybe a simple improvement for the time being would be to filter out duplicates? (Unfortunately I can't post the decoded base64 token here because it contains sensitive info.)
In the longer term, fixing the nginx config worked for me, but I can easily imagine someone with significantly more repos ending up with a header far too large to be sensible. Would it make sense to just give the user a token and store the groups in memory (or possibly in Redis)?
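A minimal sketch of the "store the groups server-side" idea, assuming a single verdaccio instance: the client would carry only a short opaque session id, and the group list lives in a process-local Map with a TTL. `InMemoryGroupStore` and its methods are hypothetical names, not a verdaccio or verdaccio-gitlab API, and generating the session id securely is left out.

```typescript
interface StoredGroups {
  groups: string[];
  expiresAt: number; // epoch milliseconds
}

// Hypothetical server-side store keyed by an opaque session id.
class InMemoryGroupStore {
  private readonly entries = new Map<string, StoredGroups>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // sessionId must come from a cryptographically secure random source (not shown).
  put(sessionId: string, groups: string[], now: number = Date.now()): void {
    // Store each group once, keeping the order of first appearance.
    this.entries.set(sessionId, {
      groups: Array.from(new Set(groups)),
      expiresAt: now + this.ttlMs,
    });
  }

  // Returns the groups for a live session, or undefined if unknown or expired.
  get(sessionId: string, now: number = Date.now()): string[] | undefined {
    const entry = this.entries.get(sessionId);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(sessionId); // lazily evict expired sessions
      return undefined;
    }
    return entry.groups;
  }
}
```

The TTL doubles as the "re-check against GitLab" interval. With several replicas, as in a Kubernetes deployment, each pod would have its own Map, so the same interface would need a shared backend such as Redis.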
We can close this issue if you want; I just detailed the symptoms and resolution in case anybody else runs into a similar problem.
Could you paste a small sample of the duplicate entries in real_groups? You don't need to paste the whole decoded token, just enough to get an idea of what might be wrong.
IIRC the entries in the token depend on what we return from the authenticate call in the plugin; I guess verdaccio uses that to fill out the token contents.
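If the token contents are driven by what the plugin passes back from authenticate, one plugin-side mitigation is to de-duplicate before calling back. This is a sketch only: the callback type is declared locally in the error-first style of verdaccio auth plugins, and `makeAuthenticate` and `GroupLookup` are hypothetical names standing in for the real GitLab lookup.

```typescript
// Local stand-in for the auth callback: error first, then the user's groups.
type AuthCallback = (error: Error | null, groups?: string[]) => void;

// Hypothetical lookup; a real plugin would query GitLab here.
type GroupLookup = (user: string, password: string) => Promise<string[]>;

// Wraps a lookup so the groups handed back are de-duplicated
// (first occurrence wins, order preserved).
function makeAuthenticate(lookup: GroupLookup) {
  return (user: string, password: string, cb: AuthCallback): void => {
    lookup(user, password).then(
      (groups) => cb(null, Array.from(new Set(groups))),
      (err: unknown) => cb(err instanceof Error ? err : new Error(String(err))),
    );
  };
}
```

This only controls what the plugin returns; anything added later when verdaccio signs the token is outside its reach.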
Also, are you sure it's JWT we're talking about? I thought that verdaccio 4.x required a new jwt config entry in the verdaccio.yml file to activate it, and otherwise defaulted to the legacy behaviour. I don't see it activated in your configuration, but maybe I'm mixing things up.
Thanks @dlouzan,
Here is the anonymised content of the token; I just renamed the projects and repos but kept them consistent:
{
"real_groups": [
"project1",
"project2",
"project3",
"project3/repo1",
"project3/repo2",
"project3/repo3",
"project3/repo4",
"project2/repo5",
"project2/repo6",
"project2/repo7",
"project2/repo8",
"project2/repo9",
"project4/repo10",
"project2/repo11",
"project2/repo12",
"project2/repo13",
"project1/repo14",
"project2/repo15",
"project2/repo16",
"project2/repo17",
"project2/repo18",
"project2/repo19",
"project2/repo20",
"project2/repo21",
"project2/repo22",
"project2/repo23",
"project2/repo24",
"project3/repo25",
"project2/repo26",
"project2/repo27",
"project2/repo28",
"project2/repo29",
"project2/repo30",
"project2/repo31",
"project2/repo32",
"project2/repo33",
"project3/repo34",
"project2/repo35",
"project2/repo36",
"project2/repo37",
"project4/repo38",
"project2/repo39",
"project2/repo40",
"project1/repo41",
"project1/repo42",
"project1/repo43",
"project1/repo44",
"project2/repo45",
"project1/repo46",
"project3/repo47",
"project1/repo48",
"project5/repo49",
"project1/repo50",
"project1/repo51",
"project1/repo52",
"project6/repo53"
],
"name": "simon-jouet",
"groups": [
"project1",
"project2",
"project3",
"project3/repo1",
"project3/repo2",
"project3/repo3",
"project3/repo4",
"project2/repo5",
"project2/repo6",
"project2/repo7",
"project2/repo8",
"project2/repo9",
"project4/repo10",
"project2/repo11",
"project2/repo12",
"project2/repo13",
"project1/repo14",
"project2/repo15",
"project2/repo16",
"project2/repo17",
"project2/repo18",
"project2/repo19",
"project2/repo20",
"project2/repo21",
"project2/repo22",
"project2/repo23",
"project2/repo24",
"project3/repo25",
"project2/repo26",
"project2/repo27",
"project2/repo28",
"project2/repo29",
"project2/repo30",
"project2/repo31",
"project2/repo32",
"project2/repo33",
"project3/repo34",
"project2/repo35",
"project2/repo36",
"project2/repo37",
"project4/repo38",
"project2/repo39",
"project2/repo40",
"project1/repo41",
"project1/repo42",
"project1/repo43",
"project1/repo44",
"project2/repo45",
"project1/repo46",
"project3/repo47",
"project1/repo48",
"project5/repo49",
"project1/repo50",
"project1/repo51",
"project1/repo52",
"project6/repo53",
"$all",
"$authenticated",
"@all",
"@authenticated",
"all",
"project1",
"project2",
"project3",
"project3/repo1",
"project3/repo2",
"project3/repo3",
"project3/repo4",
"project2/repo5",
"project2/repo6",
"project2/repo7",
"project2/repo8",
"project2/repo9",
"project4/repo10",
"project2/repo11",
"project2/repo12",
"project2/repo13",
"project1/repo14",
"project2/repo15",
"project2/repo16",
"project2/repo17",
"project2/repo18",
"project2/repo19",
"project2/repo20",
"project2/repo21",
"project2/repo22",
"project2/repo23",
"project2/repo24",
"project3/repo25",
"project2/repo26",
"project2/repo27",
"project2/repo28",
"project2/repo29",
"project2/repo30",
"project2/repo31",
"project2/repo32",
"project2/repo33",
"project3/repo34",
"project2/repo35",
"project2/repo36",
"project2/repo37",
"project4/repo38",
"project2/repo39",
"project2/repo40",
"project1/repo41",
"project1/repo42",
"project1/repo43",
"project1/repo44",
"project2/repo45",
"project1/repo46",
"project3/repo47",
"project1/repo48",
"project5/repo49",
"project1/repo50",
"project1/repo51",
"project1/repo52",
"project6/repo53"
],
"iat": 1558948513,
"nbf": 1558948513,
"exp": 1559553313
}
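The time claims in this token can be checked by hand: exp - iat = 1559553313 - 1558948513 = 604800 s = 7 days (not the 60 days discussed later in the thread), and nbf equals iat, so the token is usable immediately. The sketch below does the same arithmetic plus the RFC 7519 validity window (nbf <= now < exp, ignoring clock-skew leeway); the names are hypothetical.

```typescript
// JWT time claims, in seconds since the Unix epoch.
interface TimeClaims {
  iat: number;
  nbf: number;
  exp: number;
}

const SECONDS_PER_DAY = 86400;

// Token lifetime in days, measured from issue time to expiry.
function lifetimeDays(c: TimeClaims): number {
  return (c.exp - c.iat) / SECONDS_PER_DAY;
}

// Valid from nbf (inclusive) until exp (exclusive).
function isActive(c: TimeClaims, nowSeconds: number): boolean {
  return nowSeconds >= c.nbf && nowSeconds < c.exp;
}
```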
Also, are you sure it's jwt what we're talking about? I thought that verdaccio 4.x required a new jwt config entry in the verdaccio.yml file to activate them, otherwise it defaulted to the legacy behaviour. I don't see it activated in your configuration but maybe I'm mixing things.
I don't have JWT explicitly enabled, but I'm talking about the bearer token passed when querying the packages, which looks very much like a JWT ({"alg":"HS256","typ":"JWT"}).
The config I'm using is still the one I posted previously.
EDIT: apologies for saying duplicate real_groups before; the duplicates are in groups, not real_groups.
The duplication in groups looks suspicious and might be a bug, but I'm still puzzled about the JWT authentication.
@juanpicado Does this ring a bell? Any idea why we're seeing JWT tokens in this configuration?
OK, maybe the documentation is a bit misleading. According to the PR that introduced JWT, it's enabled by default on API calls, but not on web requests: https://github.com/verdaccio/verdaccio/pull/896
The following post also documents the expected behaviour that groups are added as the payload of the token: https://medium.com/verdaccio/diving-into-jwt-support-for-verdaccio-4-88df2cf23ddc
JWT also contains an immutable payload, meaning that, once the token is being signed, we store the list of assigned user groups within the payload. Thus, for each request the API does not verify credentials against the authentication provider, it just verifies whether the token is valid and provides access to the resource.
So apart from the duplicated entries, I'm not sure we'll be able to solve the size problem directly. Additionally, since we expect to contact GitLab for authentication, we might need to document a recommended verdaccio.yml configuration that re-checks authentication more often than the default of 60 days (group privileges could have changed).
We're having this issue as well: even with these http2 settings, the UI errors out after logging in (we're on verdaccio 4.0.0 due to #81):
http2-max-field-size: 32k
http2-max-header-size: 64k
~Just to add to this, we're using the Docker image with tag latest, but somehow end up with verdaccio@4.0.0-alpha.3 (it's in the bottom right of the UI)? Is that intentional?~
Submitted PR #81 to fix this.
@dlouzan sorry, late to the party.
@simon-jouet actually, here's a snippet of the logic behind real_groups:
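```js
// Excerpt of a method from verdaccio's auth class (not a standalone file).
// `_` is lodash; RemoteUser, JWTSignOptions and signPayload come from verdaccio internals.
async jwtEncrypt(user: RemoteUser, signOptions: JWTSignOptions): Promise<string> {
  const { real_groups, name, groups } = user;
  // real_groups: groups as returned by the auth plugin (empty list if missing)
  const realGroupsValidated = _.isNil(real_groups) ? [] : real_groups;
  // groups is extended with real_groups, so any plugin group already
  // present in groups ends up in the payload twice
  const groupedGroups = _.isNil(groups) ? real_groups : groups.concat(realGroupsValidated);
  const payload: RemoteUser = {
    real_groups: realGroupsValidated,
    name,
    groups: groupedGroups,
  };
  // Sign with the server secret; the group lists travel inside the token
  const token: string = await signPayload(payload, this.secret, signOptions);
  return token;
}
```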
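The token above is consistent with this code if, as its layout suggests, `groups` already holds the plugin groups followed by the five built-in scopes when jwtEncrypt runs: 56 plugin groups + 5 scopes + 56 appended real_groups = 117 groups entries. The sketch reproduces that concatenation on plain arrays and shows one possible fix, an order-preserving de-duplication with a Set; the function names are hypothetical and this is not verdaccio's code.

```typescript
// Built-in scopes as they appear in the decoded token.
const BUILT_IN_SCOPES = ["$all", "$authenticated", "@all", "@authenticated", "all"];

// Same shape as `groups.concat(realGroupsValidated)` in jwtEncrypt.
function groupedGroupsAsSigned(realGroups: string[], groups: string[]): string[] {
  return groups.concat(realGroups);
}

// Possible fix: keep only the first occurrence of each entry, preserving order.
function groupedGroupsDeduplicated(realGroups: string[], groups: string[]): string[] {
  return Array.from(new Set(groups.concat(realGroups)));
}
```

With the counts from the token above, de-duplication would shrink groups from 117 to 61 entries; it removes the repetition but not the underlying growth with the number of GitLab groups and projects.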
I think the JWT token should be stripped to contain less information, is there any reasons for the duplicates in real_groups? What is the purpose of both groups and real_groups?
Why does real_groups exist? I have NO CLUE 😆. Many things were added to Sinopia with zero backup, zero context and zero code review in the post-Sinopia, pre-Verdaccio era. For backward compatibility we kept them all this time, and I personally was more concerned about other topics. Perhaps they should be removed; I'll consider this for Verdaccio 5.
The groups duplication looks suspicious and might be a bug, but I'm still puzzled about the jwt token authentication.
Does this ring a bell? any idea why we're seeing jwt tokens in this configuration?
The logic above runs every time a token is signed, so @dlouzan (I removed my other comment; it was nonsense, I was wrong) I share the same thoughts about the duplication: as I mentioned above, it's legacy logic, and it might be a bug or something intended. I'm not sure who was behind this logic.
@juanpicado I'll try to reserve some time tomorrow to take a detailed look at this; it might well be a bug on our end.
I updated my comment 👍 I was wrong.
const groupedGroups = _.isNil(groups) ? real_groups : groups.concat(realGroupsValidated);
@juanpicado Is my assumption above right?
Ok, maybe the documentation is a bit misleading. According to the PR that introduced JWT, it's enabled by default on API calls, but not on web requests: verdaccio/verdaccio#896
That would mean that by default JWT is the approach taken for all API calls; is that so?
Nope. JWT has been enabled by default on the web since that part was developed, so there is no legacy mode for the web API; the CLI API uses the legacy auth token by default.
Please bear in mind that while reducing the JWT size might resolve the issue for some or most users, it won't be a permanent fix. In large(r) GitLab instances a user just needs access to (or maintainer rights on) enough groups to trigger this issue again.
Could there be any other solution besides dropping everything into the JWT? E.g. configuring a separate storage (defaulting to JWT) for this information.
I encountered this problem when using a cluster with the NGINX Ingress Controller. When I configured it on a K8s cluster that uses Traefik, the listing worked OK at first, but now I'm getting 400 (Bad Request) on every request.
Other coworkers with access to almost the same repositories are not having the same issue.
My header is over 18k with this plugin; our company has a group with a huge list of repositories (in subgroups as well). I might be able to submit a PR that saves the group details another way, e.g. in a database.
Seems related https://github.com/verdaccio/verdaccio/discussions/2068
Hi,
(this isn't a follow-up to #74 but quite a different question/issue)
So I've been trying to deploy verdaccio-gitlab in a Kubernetes cluster that I'm setting up for our next dev environment (migrating from Docker Swarm). I got most things working, but I'm stumbling a bit on verdaccio-gitlab and was hoping to get some insights, and if I'm lucky some input from people who have a Kubernetes deployment working properly.
I can get verdaccio to load properly and I can get the login to work, but once I'm logged in I'm unable to fetch the packages: I get an ERR_CONNECTION_CLOSED/ERR_CONNECTION_RESET when fetching them.
I first tried to use the Helm chart for verdaccio and simply change the image to verdaccio-gitlab; this works, but I can't fetch the packages after login.
Expecting this to possibly be an issue with the Helm chart, I made my own deployment and also fetched the latest copy from the repo to build a new image on top of verdaccio 4.0.0 beta10 instead of the beta3 version currently used for the latest tagged Docker image. The symptoms are the same.
Finally, I tried to simply deploy verdaccio beta10 with just htpasswd and without the GitLab plugin, and it seems to work as it should. I might be missing something obvious, but unfortunately I can't see any errors in the nginx ingress, GitLab or verdaccio.