PQCA / TAC

https://pqca.org
Apache License 2.0

Onboard PQCA projects to LFX Insights #31

Open planetf1 opened 1 month ago

planetf1 commented 1 month ago

I'd like to propose we onboard our PQCA projects to Linux Foundation's Insights project - https://lfx.linuxfoundation.org/tools/insights/

This can gather metrics around activity, contribution diversity, and capture trends over time.

As we start capturing data we can discuss findings within the sub-projects and/or at the TAC and see if there are actions we wish to take.

ryjones commented 1 month ago

This is in progress

maximilien commented 1 month ago

Is this done yet? Maybe as part of the landscape work? Or different?

ryjones commented 1 month ago

here it is

baentsch commented 1 month ago

Cute. Thanks for setting it up! A bit disappointing that the tool only makes org&location display decisions based on email addresses, not GH (location/address) info. Of course, if it did, US and corp contributions would be shown as low as they really are -- and LF surely wouldn't want that to happen :-) Hints: UWaterloo is not in the US; neither am I.

Also sub-optimal that one can only change the timescales with an LF account, say to dissect the "before and after LF take-over" effect -- but that's probably also a thing LF wouldn't really want to make publicly visible via its own tooling.

Then, @SWilson4: Time to ask for a pay rise looking at these graphs :-) Related, a comment to @hartm: Looking at these charts I now completely understand what you meant by saying people "game" this stuff...

ryjones commented 1 month ago

The email versus GitHub thing is a long standing pain point. If you look here, you'll see denyeart is the top person, and the fourth. The logo next to them shows the data source.

[Screenshot 2024-07-31 at 10 35 18]

ryjones commented 1 month ago

It looks like peak commits were in 2019:

[Screenshot 2024-07-31 at 11 36 00] [Screenshot 2024-07-31 at 11 36 32] [Screenshot 2024-07-31 at 11 36 41]

hartm commented 1 month ago

Of course, if it did, US and corp contributions would be shown as low as they really are -- and LF surely wouldn't want that to happen :-)

We at the LF (and the corporations) actually do want this information! At the LF, it enables us to rigorously track analytics. For companies, it enables them to track their contributions relative to other comparable companies, and even do this in a cross-project and cross-foundation way. Corporate execs also like to say "we contributed XYZ lines of code to open source" which they can do with these numbers.

If you're reading this, and comfortable providing your employer information and email, sign up for an LF ID and link your email and github! This also lets us contact maintainers more easily when we need to do so.

Also sub-optimal that one can only change the timescales with an LF account, say to dissect the "before and after LF take-over" effect -- but that's probably also a thing LF wouldn't really want to make publicly visible via its own tooling.

We believe in radical transparency. While we probably can't get the LFX Insights team to literally draw a line on all of our graphs, this data is available for anyone who cares to look. Getting an LF account is free and very easy. We also obviously respect GDPR so you can request to be "forgotten" at any time.

baentsch commented 1 month ago

sign up for an LF ID and link your email and github! [...

@hartm I said before and keep repeating that LinuxFoundation's t's and c's for the LFID IMO are not exactly supportive of free speech and openness:

Linux Foundation reserves the right to take down anything you post on this site, for these or other reasons. If you don’t agree to these terms, don’t use this website.

I have no problem agreeing that all reasons explicitly stated ("these") warrant removal of content (and myself would surely not post "such stuff" nor want to be exposed to it). But what is not acceptable to me is the wording "other reasons": With this completely unconditional statement, you guys reserve the right to remove any content you simply don't like. And that is not free speech. Hence, I follow your own advice ("don’t use this").

Technically, allow me to ask what's stopping you from scouring GH for the information you want to collect separately via LFID? By way of example for myself: "Employer", location, GH org affiliation: all accessible -- and I presume not just by screen scraping (?)

[image]

Or does Microsoft (GH) legally or technically prohibit you from using GH APIs to retrieve this information and address what seems to be a "long standing pain point" in @ryjones words?
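For reference, GitHub's public REST API does expose these profile fields: a `GET` on `https://api.github.com/users/{login}` returns a JSON object that includes `company` and `location`, either of which may be null when the user left it blank. A minimal Python sketch follows, assuming unauthenticated access and its rate limits are acceptable. The helper names `fetch_profile` and `extract_affiliation` are illustrative only and are not part of any LFX tooling.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com/users/"


def fetch_profile(login: str) -> dict:
    """Fetch the public profile JSON for one GitHub login (needs network access)."""
    req = urllib.request.Request(
        API_ROOT + login,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "affiliation-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def extract_affiliation(profile: dict) -> dict:
    """Keep only the self-declared fields; blank or missing values become None."""

    def clean(value):
        if not isinstance(value, str):
            return None
        # Company is free text and is often written as "@org"; drop the "@".
        value = value.strip().lstrip("@")
        return value or None

    return {
        "login": profile.get("login"),
        "company": clean(profile.get("company")),
        "location": clean(profile.get("location")),
    }
```

Both fields are free text typed by the user, so any tool using them would still need its own normalization step (for example mapping different spellings of the same employer to one name).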

We also obviously respect GDPR so you can request to be "forgotten" at any time.

Good to hear you're following the law :). But that alone makes me wonder: What's LF doing if someone wants to be forgotten? Also delete all the comments of that person? All cross-references to that person (and her comments)? Yuck. Better not collect the information in the first place :-)

baentsch commented 1 month ago

Oh, and one more question @hartm: You write

Corporate execs also like to say "we contributed XYZ lines of code to open source" which they can do with these numbers.

Can I also do this (for myself)? Or does this need an LFID? So far I only found (very high level and easily gamed) "contribution" counts, but no concrete line numbers (both contributed in PRs as well as "still active", or "owned" as per GH blame): Can this be displayed?

baentsch commented 1 month ago

Sorry, @hartm, this statement of yours IMO also warrants a second glance:

We believe in radical transparency. While we probably can't get the LFX Insights team to literally draw a line on all of our graphs, this data is available for anyone who cares to look.

I now looked .. and only superficially. Already at that level things don't all add up: You seem to imply all data is reported, just not all lines are drawn. Is this really so? Isn't it so that you(r team) definitely exclude(s) data, namely contributions by organizations you don't recognize -- and thus making the stats misleading as you(r team) then normalize(s) all data on that omission?

Here's an example: [image]

One guy doing 122 contributions and one organization doing 32 "thus" being credited with 97% of all contributions sounds a bit statistically dubious, no?

Using this and your statement

Corporate execs also like to say "we contributed XYZ lines of code to open source" which they can do with these numbers.

IBM execs can currently claim they contributed "3%" to this project (silly as it is as contribution count has only a very rough relation to effective line number count); and UWaterloo is credited with 97%. Hmm.

Isn't it so that the complete "organizational view" is based on 3 contributors whose affiliations LF recognizes? The contributions of 31 other people whose affiliations you(r team) didn't glean are totally disregarded. LF(X Insights) thus reports (and draws lines) in this case on less than 10% of the contributors. Is it right to call this "radical transparency"? Might the term "biased" (towards LF organizations) not be more appropriate?

Don't get me wrong: Those (3) guys' (companies) pay your bills as LF members so I understand LF(X Insights) represents them. But it's 34 people in this example providing the code in the project. And that's what FOSS is primarily about, no?

So here's a concrete proposal should LF be interested to represent more than the (in this example 9%) LF OSS contributors in LFX insights: Please consider documenting at the very least "Unknown origin organization" (I'd even label it "LF-independent contributors") contributions.

This would improve the perspective and get a bit closer to reality, recognize non-LF contributors and give your execs a more accurate perspective on the effective relevance of their contributions (if they want it...) and health of the project.
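To illustrate why the normalization matters, here is a small Python sketch. The numbers are made up for illustration and are not taken from LFX Insights, and `org_shares` is a hypothetical helper. It computes organization shares twice: once dropping contributions with no recognized affiliation, and once keeping them under an "Independent contributors" label.

```python
from collections import Counter


def org_shares(contributions, unknown_label=None):
    """Return {org: percent of total}.

    contributions: iterable of (org_or_None, count) pairs.
    unknown_label: if None, unaffiliated contributions are dropped before
    normalizing; otherwise they are kept under this label.
    """
    totals = Counter()
    for org, count in contributions:
        if org is None:
            if unknown_label is None:
                continue  # silently excluded from the denominator
            org = unknown_label
        totals[org] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {org: round(100 * n / grand_total, 1) for org, n in totals.items()}


# Hypothetical data: two recognized organizations, the rest unaffiliated.
sample = [("Org A", 30), ("Org B", 10), (None, 160)]

print(org_shares(sample))
# {'Org A': 75.0, 'Org B': 25.0}
print(org_shares(sample, unknown_label="Independent contributors"))
# {'Org A': 15.0, 'Org B': 5.0, 'Independent contributors': 80.0}
```

The same raw counts give "Org A" 75% in the first view and 15% in the second; the only difference is whether the unaffiliated contributions stay in the denominator.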

Even better would be to try to (best, automatically) glean organizational affiliation from GH. It seems possible.

All of this would help make visible the work of people voluntarily contributing and not as part of a specific LF member's employment obligation -- and maybe make those (91% of all contributors in this case) folks' voices more relevant to LF decisions.

But then again, this tool is called "LFX insights" and not "OSS insights", so no worries if you(r team) have more important things to do for your customers. In such case, though, I'd advise caution in using this tool to make externally (to LF) visible "lifecycle" decisions about OSS projects -- which is what triggered this issue in the first place.

ryjones commented 1 month ago

@baentsch I collect the raw data for my own use from a public source. I don't do any affiliation, of course. I'm interested in any insights you can glean. Another project I work on does something similar. I have a copy of gharchive, which is:

% du -sh raw uncompressed 
3.5T    raw
6.6T    uncompressed
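
As a rough sketch of how raw data of this kind could be summarized without any affiliation step: GH Archive publishes hourly gzip files with one JSON event per line, and events from 2015 onward carry `actor.login` and `repo.name`. The Python below counts events per actor for repositories under a given owner prefix. The function names, the file path and the prefix are illustrative assumptions, not part of the setup described above.

```python
import gzip
import json
from collections import Counter


def count_events_by_actor(lines, repo_prefix):
    """Count events per actor login for repos whose name starts with repo_prefix."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        repo = (event.get("repo") or {}).get("name", "")
        if not repo.startswith(repo_prefix):
            continue
        actor = (event.get("actor") or {}).get("login")
        if actor:
            counts[actor] += 1
    return counts


def count_in_file(path, repo_prefix):
    """Same as above, reading one hourly archive file such as 2024-07-31-10.json.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return count_events_by_actor(fh, repo_prefix)
```

A call such as `count_in_file("raw/2024-07-31-10.json.gz", "example-org/")` (hypothetical path and owner) would return a `Counter` keyed by login. Event counts are only a coarse activity signal and say nothing about lines of code.
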
hartm commented 1 month ago

@baentsch There is no conspiracy here against unaffiliated contributors. You are correct that the affiliation data being shown is buggy, so I've reported that to the LFX Insights team. I wouldn't expect a response until Monday though.

baentsch commented 1 month ago

Thanks @hartm for accepting and forwarding this as a bug report. Please post the corresponding GH issue link here so I can track and possibly help the people working on this --at least confirming a fix as any bug reporter should do.

Your comment regarding "conspiracy" feels unnecessary and creates some uneasiness on my side: I sincerely hope you don't want to imply I'm crazy? Just so you can check, one more data point: I think Elvis is dead :-)

More seriously: What I did (try to do) above is explain the consequences of this error and possible reasons why testing this facility for baseline correct operation apparently never had priority (business priority recognizing LF over non-LF contributions).

If your "no conspiracy" statement in turn is meant to say "LFX Insights never has been designed or tested to deliver organizational decision support data with regard to contributors not affiliated with LF/an organization supporting/funding LF", then we're in agreement.

baentsch commented 1 month ago

Thanks for sharing your project links, @ryjones . Unfortunately I'm afraid I'm out of my depth working with this data (at least I didn't immediately understand what EventFlow does), so most likely have to disappoint you wrt

I'm interested in any insights you can glean.

baentsch commented 1 month ago

There is no conspiracy here against unaffiliated contributors.

@hartm (and any one else reading) would it be conceivable you/LF label us "independent contributors" rather than as "unaffiliated"? The latter sounds like we're incomplete people without affiliation that one can keep disregarding, e.g., when asking for help -- even though we're apparently 91% of all contributors in the project we use as the example above.

The term "independent" in turn stresses that we bring at least diversity and self-motivation to the projects that LF controls -- without costing you a dime.

I know it's just a word but it might entice LF(-affiliated) contributors to think of independent contributors a bit more highly and maybe help us a bit more, e.g., finding ways to navigate the limitations that LF contributors accepted as part of their job&salary but that are new, challenging and costly (in terms of our volunteered time) to us. Thanks.

hartm commented 1 month ago

Thanks @hartm for accepting and forwarding this as a bug report. Please post the corresponding GH issue link here so I can track and possibly help the people working on this --at least confirming a fix as any bug reporter should do.

Someone from the LFX team is going to comment in this thread on Monday. They need to check and confirm the data.

Your comment regarding "conspiracy" feels unnecessary and creates some uneasiness on my side: I sincerely hope you don't want to imply I'm crazy? Just so you can check, one more data point: I think Elvis is dead :-)

No, I'm not implying anything about you with that statement.

If your "no conspiracy" statement in turn is meant to say "LFX Insights never has been designed or tested to deliver organizational decision support data with regard to contributors not affiliated with LF/an organization supporting/funding LF", then we're in agreement.

It has! But, in my understanding, PQCA data hasn't been onboarded to the point where this works yet, so hopefully we will be getting it working soon. The issue is certainly in this area.

@hartm (and any one else reading) would it be conceivable you/LF label us "independent contributors" rather than as "unaffiliated"? The latter sounds like we're incomplete people without affiliation that one can keep disregarding, e.g., when asking for help -- even though we're apparently 91% of all contributors in the project we use as the example above.

It probably would be! When someone from LFX posts here tomorrow, I'd encourage you to ask them directly!

planetf1 commented 1 month ago

@ryjones I notice you assigned this to me (I am interested in it). Is there any action you expect from me at the moment?

ryjones commented 1 month ago

@planetf1 no - you opened it, so I was thinking you would close it (eventually)

baentsch commented 2 weeks ago

would it be conceivable you/LF label us "independent contributors"

It probably would be! When someone from LFX posts here tomorrow, I'd encourage you to ask them directly!

Will do as/if/when "tomorrow" comes. Until then, it'd be nice if you would already consider us in that way, @hartm

mcderk commented 2 weeks ago

Hi @baentsch - thank you for your thoughts. I am the LFX Head of Product & Design. Happy to answer questions or receive feedback on the product. I will take a look personally at the affiliations for this project and see if we can take another pass to improve the data. I'm open to getting on a meeting or doing this asynchronously - whatever is best for you. Appreciate you taking the time to review the data and share your perspectives.

baentsch commented 2 weeks ago

Hi @mcderk - thanks for the initial feedback. Please let me know if anything in the above is unclear or you have questions. Also, please note that not just the affiliation(s) but also all geo information is completely "off". If you have a GH project developing the code for LFX, please feel free to provide a pointer and reference me in issues there to move this discussion to where it arguably belongs.

Meetings are generally suboptimal in my eyes as a) we're in different time zones; b) my command of the spoken language is most likely no match to yours; c) lots of detail is getting lost when only discussed in meetings; and d) there are typically many more perspectives/solution proposals coming forward when discussing problems openly with the community in suitable issues.

planetf1 commented 2 weeks ago

A question on affiliation. I notice the Organization Leaderboard contains 'Egeria Project'

This is probably me -- When I worked in LF AI & Data on Egeria (up until June 2023) we created a LinkedIn org - in part to make it easier to write LinkedIn blogs. In any case, on LinkedIn that org membership ended in June.

So I'm intrigued as to why it shows up here? My LF profile only shows IBM (long form)? Is there some other profile information that needs correcting?

I'd also agree on the geographical distribution point -- I think this is particularly interesting, but I don't see any Europe except for Turkey. Downloading the CSV doesn't give any more insight - and given the table is labelled 'top 5' it implies these are the only 3 countries represented at all, which seems very wrong (as pointed out above in relation to Canada which has a massive contribution)

planetf1 commented 2 weeks ago

Another feature that would be useful is an aggregate view across the 3 (will be more in future perhaps) orgs

planetf1 commented 2 weeks ago

GitHub also maintains a mapping of email->github id. I noticed that in the pq-code-package project:

Screenshot 2024-08-21 at 10 51 15

These are all me. Is there a way I can link them? Both of the emails (vs github ids) are recorded in my LFX dashboard, so lfx should 'know' they are me.
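
One way such entries could be folded together, sketched in Python: GitHub's privacy addresses have the form `login@users.noreply.github.com` or `ID+login@users.noreply.github.com`, so the login can be read straight from the address, and any other email can be mapped through a table of addresses the contributor has declared. This is only an illustration under those assumptions; `login_from_noreply`, `merge_identities` and the sample addresses are hypothetical, and nothing here describes how LFX actually links identities.

```python
import re
from collections import defaultdict

# Matches both noreply forms; the numeric "ID+" prefix is optional.
NOREPLY = re.compile(
    r"^(?:\d+\+)?(?P<login>[A-Za-z0-9-]+)@users\.noreply\.github\.com$",
    re.IGNORECASE,
)


def login_from_noreply(email):
    """Return the GitHub login embedded in a noreply address, else None."""
    match = NOREPLY.match(email.strip())
    return match.group("login").lower() if match else None


def merge_identities(commit_counts, known_emails):
    """Sum per-email counts into per-identity counts.

    commit_counts: {email: count}
    known_emails:  {lowercased email: login}, e.g. self-declared addresses
    Emails that cannot be resolved stay as their own identity.
    """
    merged = defaultdict(int)
    for email, count in commit_counts.items():
        key = email.strip().lower()
        identity = known_emails.get(key) or login_from_noreply(key) or key
        merged[identity] += count
    return dict(merged)
```

For example, counts recorded under `12345+octo-dev@users.noreply.github.com`, `octo-dev@users.noreply.github.com` and a declared address `dev@example.com` would all be summed under `octo-dev`.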

baentsch commented 2 weeks ago

are recorded in my LFX dashboard, so lfx should 'know' they are me.

Again, (particularly @mcderk ), please consider using only GH information; the more you rely on LF-internal data structures/registrations, the less interesting (and relevant) this UI becomes for independent contributors; either way, this may be intentional for your product and you might also want to weigh in whether my assumption stated above is right after all:

"LFX Insights never has been designed or tested to deliver organizational decision support data with regard to contributors not affiliated with LF/an organization supporting/funding LF"

Anyway, in the case of @planetf1, the GH org of https://github.com/planetf1 is clearly "IBM", nothing else. So where does the UI get any other affiliation information from?

baentsch commented 1 week ago

Thanks @mcderk ('s team :) for some improvements to the UI. But I'd like to urge you to not attribute me personally as "Freelance Consultant" but either use my GH org ID -- or better, bundle me and all other independent contributors under exactly that heading ("Independent Contributors" -- see @hartm 's comment above) -- to make it visible to which degree voluntary work (used to) drive this project.

mcderk commented 1 week ago

@baentsch I will look to change yours to 'Independent Contributor' individually and then create a ticket for our team to investigate how to best show voluntary work. Thank you for the feedback

baentsch commented 1 week ago

@baentsch I will look to change yours to 'Independent Contributor' individually and then create a ticket for our team to investigate how to best show voluntary work. Thank you for the feedback

Thanks @mcderk . This further confirms my impression that LF is not really considerate of voluntary work(ers). Please correct me if I'm wrong (ideally pointing to some Independent Contributors community for me to "learn the ropes" on this topic before I hang myself in the invisible ones...).

mcderk commented 2 days ago

Hi @baentsch, we do consider voluntary work and are working to make it easier for our users to self-manage it.

The main issue with identifying individual contributors is that there is no standard, common way that this information is shared, so it's really difficult to figure out who is contributing for themselves individually, on behalf of a company, or sometimes both depending on the project, and how those affiliations shift, especially when there is significant overlap.

I can confirm we are doing our best up to this point and have features (self-service affiliation & attribution within the LF profile) targeted for delivery in the next quarter (Q4 2024) or two (Q1 2025) - however it is a complex and large problem to solve. We definitely have room for improvement and are continuing to prioritize individual attribution against other items at LF / LFX.

If you have thoughts on how to do this better I'd be open to them as I have seen great ideas come from all sorts of places and people - usually around an issue they are passionate about for themselves. Thank you for your thoughts & feedback - I appreciate it.

baentsch commented 1 day ago

If you have thoughts on how to do this better I'd be open to them as I have seen great ideas come from all sorts of places and people - usually around an issue they are passionate about for themselves.

That's precisely the benefit of FOSS. I honestly think you're doing yourself a big disservice by not using this thought more generally: Why for example are you not opening up the work on this to others by placing the code for it on GH and discussing things like this openly in a community? That way everyone could understand (and/or help solve) things you deem "really difficult". A pointer to this community was all I asked for above -- by not providing it, you caused the discussion on a completely different topic (onboarding a specific project) to get hijacked (apologies to all except @mcderk ).

It's pretty weird that the organization that claims to support OSS does not seem to be using it itself. This seems to further bolster my criticism towards your organization: My impression is that LF is not really about FOSS, but about closing down/treating as "your own" all software LF manages.

Back to the issue: What's wrong (or difficult to implement) with treating every GH handle without any or without an LF-registered organizational affiliation (as publicly posted on GH; see above) as belonging to the "Individual Contributor" tribe?

Same thing as with country affiliations: You seem to look at published email address only but not at the country information publicly posted in GH. Why?

I do see that your approach is simpler to implement, but it is completely misleading -- surely in the case of this project (where the vast majority of work is done in Switzerland and Canada, but only "US" .com addresses --including anonymous GH ones!-- get noticed). This conceptual error most likely is the case for all LF projects, so you guys this way conceptually belittle/disenfranchise/discriminate (choose the most fitting term) all work done in other geos. Again, this may be desired strategically by the US-based LF to bolster its self-esteem but it does this by alienating the (rest of the) world's contributors.