communitybridge / easycla

The Contributor License Agreement (CLA) service of the Linux Foundation lets project contributors read, sign, and submit contributor license agreements easily.
https://easycla.lfx.linuxfoundation.org
MIT License

Compare LDAP users in CLA groups with actual contributions #4451

Closed mlehotskylf closed 1 day ago

mlehotskylf commented 1 month ago

We have some users in EasyCLA groups in LDAP who don't seem to be authorized by EasyCLA itself. That may be an issue, because users in LDAP should ideally match the authorized users in EasyCLA. However, there is a chance that these users were added to LDAP CLA groups through different channels, or that they were not properly removed by an older version of EasyCLA.

What needs to be done: We need to review the LDAP users in EasyCLA groups who are not authorized by EasyCLA itself and check whether they ever contributed or are still contributing now.

List of LDAP users in EasyCLA groups who are not authorized by EasyCLA:

We need to check if these users contributed (or are contributing) to Gerrit servers:

We need to know:

See #4348 for more details.

This is related to #4394.

lukaszgryglicki commented 1 month ago

On it as requested by @mlehotskylf

lukaszgryglicki commented 1 month ago

Unique users from those 4 files (to check):

('AlexandruAvadanii', 'BFrazer', 'ChristopherPrice', 'GaoSong', 'HWillson', 'HuabingZhao', 'InoueReo', 'Isaac.manuelraj', 'Itohan', 'IvanADAM', 'Jallolo', 'Katel34', 'KennyPaul', 'LucProvoost', 'MagnusB', 'Manamohan', 'MattDavis', 'MehreenKaleem', 'Nagendra90287', 'PANTHEON.tech', 'Pavithra', 'Pooja03', 'PremkumarAarna', 'Ray_NTUST', 'RehanRaza', 'SandeepAarna', 'SantoshB', 'SindhuXirasagar', 'SnehaD', 'SunilB', 'ThamlurRaju', 'TianL', 'VincentDanno', 'YCJict', 'YatianXU', 'YoonsoonJahng', 'a.czajkowski', 'adetalhouet', 'akanshaDua', 'akapadia', 'aleemraja', 'aleksandrtaranov', 'alexeyaleynikov', 'allison4nordix', 'alokbhatt', 'amitagh', 'andreasgeissler', 'anil1', 'ankitbhatt', 'aribeiro', 'arjunmgupta', 'atassi', 'babejmat', 'bdavis', 'bdfreeman1421', 'bhagyalakshmi', 'bhedstrom', 'brilldav', 'brindasanthm', 'chaitanyakadiyala', 'chsailakshmi', 'codechinatelecom', 'cramstad', 'cryptomaster', 'cschowdam', 'dacher', 'dafuse', 'daniesilamdocs', 'debbiemedina', 'demx8as6', 'deswali07', 'djhunt', 'ediaz101', 'enyinna1234', 'ezhil', 'fpaquett', 'francistoth', 'frank123', 'fujihiro16', 'fzhang', 'ggarudapw', 'gmittal', 'gordonkoocommscope', 'gregory.hayes', 'gseiler', 'guyjacobson', 'halil.cakal', 'hujie', 'hwcm', 'ilanap', 'jamesgu', 'jingjincs', 'jkbecker', 'jsulliva', 'kaihlavi', 'kamezawa', 'karbon', 'kavi2021', 'kbanka', 'kevin.brown1viavi', 'ktimoney', 'kwiatrox', 'liuwenyu', 'mabelgaumkar1', 'manoj1', 'marian.vaclavik', 'mbrunner', 'mdolan', 'melliott', 'mharper', 'michal', 'mit3301', 'mizunoami123', 'nandkumar', 'neelesh.durgapal', 'niharika.sharma', 'o-ran-sc-release', 'ojasdubey', 'onapbot', 'pau2882', 'pceicicd', 'pleigh', 'prabhjot', 'preethams', 'projitaarna', 'rajeevme', 'rajiv.v', 'ramagp', 'rannyh', 'ranvijays', 'ravi.setti', 'ravikanth.p', 'ray.ntust', 'rgadiyar1', 'rp5811', 'rsrinivas', 'rsriraman', 'sainiashok', 'sanchitap', 'sanjaymekhale', 'sblimkie', 'sdevaraj665', 'seshukm', 'shalomb', 'shangyuxiang', 'shaoqiu', 'sharathprakash', 
'shormancorigine', 'shrek2000', 'singh.sunil', 'singhrishipratap', 'sitedata', 'sridharkn', 'ssteve', 'subhash_singh', 'sudhakar.ndc', 'sumitc29', 'sunqiong.bri', 'surajchalapathy', 'swaminathans', 'swapnalipode', 't.seshu', 'talig', 'talio', 'thakurveerendra', 'tperala', 'tragait', 'vamshi.nemalikonda', 'vharish', 'vikaskumar', 'vikram.barate.gslab', 'vivemuthu', 'vmuthukrishnan', 'vvarvate', 'wangy122', 'wanyama', 'ychacon', 'yingyingwang', 'yogendrapal', 'z00245565')
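
A minimal sketch of how several username lists could be merged into one de-duplicated, sorted list like the one above. The split of names across the four placeholder lists is made up for illustration, and `unique_users` is a hypothetical helper, not the script that was actually used.

```python
from typing import Iterable


def unique_users(*lists: Iterable[str]) -> list[str]:
    """Merge several username lists into one sorted, de-duplicated list."""
    merged: set[str] = set()
    for names in lists:
        # Strip surrounding whitespace and skip blank entries.
        # Comparison stays case-sensitive, so 'Ray_NTUST' and 'ray.ntust' are distinct.
        merged.update(n.strip() for n in names if n.strip())
    return sorted(merged)


# Placeholder data standing in for the four source files (assignment is invented).
file_a = ["onapbot", "mdolan"]
file_b = ["mdolan", "KennyPaul "]
file_c = ["", "onapbot"]
file_d = ["talig"]

users = unique_users(file_a, file_b, file_c, file_d)
```

Python's default string sort is by code point, so names starting with an uppercase letter come before lowercase ones.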
lukaszgryglicki commented 1 month ago

Now the guessing starts: is this "displayName", "username", or what?

lukaszgryglicki commented 1 month ago

I will assume this is the LFID, because it is the only identifier that is supposed to be unique; the other types are tied to a specific platform, so I cannot guess which platform it is...

lukaszgryglicki commented 1 month ago

There are 180 distinct names across those 4 links, but when I search for them in fivetran_ingest.crowd_prod_public.memberidentities by LFID I get 95 hits, and when I check by gerrit platform username I get 125 hits:

Screenshot 2024-10-20 at 07 28 04

When checking by the member's displayname I get 81 hits, and when checking by every possible username value regardless of platform I get 155 hits. So no single lookup gives me all 180 users.

Screenshot 2024-10-20 at 07 30 20

I will use the lookup that gives the largest number of hits and then generate results for those users (I will list which member ID it is and which username/platform produced the hit).
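
The comparison of lookup strategies described here can be sketched roughly as follows. This is only an illustration: the row layout, field names, and sample values are assumptions, not the real memberidentities schema, and the real lookups ran as queries rather than in Python.

```python
# Hypothetical, simplified identity rows; the real table has a different schema.
identities = [
    {"member_id": "m1", "platform": "gerrit", "username": "alice",
     "lfid": "alice", "display_name": "Alice A"},
    {"member_id": "m2", "platform": "github", "username": "bob",
     "lfid": None, "display_name": "bob"},
    {"member_id": "m3", "platform": "gerrit", "username": "carol",
     "lfid": "carol-lf", "display_name": "Carol"},
]

# Names we are trying to resolve (stand-ins for the 180 names in the thread).
names = {"alice", "bob", "carol", "dave"}

# Each strategy extracts the value to compare against the name list.
strategies = {
    "lfid": lambda r: r["lfid"],
    "gerrit_username": lambda r: r["username"] if r["platform"] == "gerrit" else None,
    "display_name": lambda r: r["display_name"],
    "any_username": lambda r: r["username"],
}


def hits(wanted, rows, key):
    """Return the subset of wanted names matched by at least one row."""
    return {key(r) for r in rows} & wanted


# Count distinct matched names per strategy and pick the best one.
hit_counts = {label: len(hits(names, identities, key))
              for label, key in strategies.items()}
best = max(hit_counts, key=hit_counts.get)

# Names that even the best strategy cannot resolve.
unmatched = names - hits(names, identities, strategies[best])
```

Keeping the `unmatched` set around makes it easy to list the names with no hits separately at the end of a report.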

lukaszgryglicki commented 1 month ago

I will provide a report for all hits on any of those 180 usernames, for all platforms; you can later discard anything that is not needed (the report will contain both username and platform and will be ordered by platform).

lukaszgryglicki commented 1 month ago

Report generated

Also attaching CSV here: easycla-gerrit.csv

Please read the "Info:" section in the generated Google spreadsheet; I'm also pasting it here:

I've found each of those members by looking for "username" in memberidentities for any platform.
The exact criterion that found a given member is in the "Found via ..." column.
Then we consider the member ("Member Id" column) that any such identity belongs to.
For that member we generate a report for all of its identities/platforms that have any activity on the requested gerrit servers; those are the "Platform" and "User Name" columns.
For each such identity we also group by "Activity type" (as requested) and display the first/last activity date, the number of activities of the given type, and which gerrit servers the identity contributed to.
We order this by platform, username, activity type and member Id.
At the bottom there are rows where only "User Name" is set; those are the usernames for which I've found no hits in member identities, meaning those users didn't contribute anything (not only to the given gerrit servers, but overall).
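
The grouping described in the Info text can be sketched as below. This is a rough illustration under stated assumptions: the tuple layout, the activity type names, and the server hostnames are invented placeholders, `build_report` is a hypothetical helper, and the real report was produced with SQL against the activities data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity rows:
# (member_id, platform, username, activity_type, day, server)
activities = [
    ("m1", "gerrit", "alice", "changeset-created", date(2023, 1, 5), "gerrit.example.org"),
    ("m1", "gerrit", "alice", "changeset-created", date(2024, 3, 9), "gerrit.example.net"),
    ("m1", "git", "alice", "commit", date(2022, 7, 1), "gerrit.example.org"),
]


def build_report(rows, names_without_hits=()):
    """Group activities per identity and activity type, then summarize each group."""
    groups = defaultdict(list)
    for member_id, platform, username, activity_type, day, server in rows:
        # Key order doubles as the sort order: platform, username, type, member.
        groups[(platform, username, activity_type, member_id)].append((day, server))

    report = []
    for key in sorted(groups):
        platform, username, activity_type, member_id = key
        days = [d for d, _ in groups[key]]
        report.append({
            "platform": platform,
            "user_name": username,
            "activity_type": activity_type,
            "member_id": member_id,
            "first": min(days),
            "last": max(days),
            "count": len(days),
            "servers": sorted({s for _, s in groups[key]}),
        })

    # Usernames with no identity hits go at the bottom with only the name set.
    report.extend({"user_name": n} for n in sorted(names_without_hits))
    return report


report = build_report(activities, ["zed"])
```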

cc @mlehotskylf @dealako

nickmango commented 1 month ago

@lukaszgryglicki I have a question regarding the platforms. I see some platforms are tagged as git rather than gerrit. Would this make a difference?

lukaszgryglicki commented 1 month ago

The platform value comes from the activities table in DBT/Snowflake; I'm not touching it. Commits come from git, so the platform for them is git, not gerrit (gerrit gives patch-set and change-set related activities). It is the same with GitHub: issues/PRs/reviews come from the github platform, while commits (as always) come from the git platform.

Not sure how to answer the question:

Would this make a difference?

Difference to what? cc @mlehotskylf

nickmango commented 1 month ago

Noted. I asked because of the context of gerrit specifically.

lukaszgryglicki commented 1 month ago

Working on a version that only searches by gerrit username, as requested by @mlehotskylf. I'll add this new version as a separate tab in the already existing Google sheet.

lukaszgryglicki commented 1 month ago

Report updated; added sheets: Report by gerrit username only and SQL - by gerrit username only.

cc @mlehotskylf

lukaszgryglicki commented 1 month ago

Also uploading CSV here: easycla-gerrit-v2.csv

mlehotskylf commented 1 day ago

This is done and the report looks good. Closing.