epam / OSCI

Open Source Contributor Index
https://opensourceindex.io/
GNU General Public License v3.0

Indexing Subsidiaries and their email domains #79

Open · devashish-gaikwad opened this issue 3 years ago

devashish-gaikwad commented 3 years ago

Hello,

What is the policy for adding subsidiaries to the OSCI index? Is it up to the parent company or to EPAM/OSCI to decide how the email domains are listed?

Should all subsidiaries of a company fall under the same umbrella (and index calculation) as the parent company, or should each subsidiary with a different email domain be added as a new company (with its own index calculation) under the subsidiary's name?

For example, suppose company "X" has two subsidiaries, "Y" and "Z":

Option A:

- company: X
  domains:
    - X.com
    - Y.com
    - Z.com

Option B:

- company: X
  domains:
    - X.com

- company: Y
  domains:
    - Y.com

- company: Z
  domains:
    - Z.com

Which approach would be acceptable?

abitrolly commented 3 years ago

A more fine-grained approach would be interesting for exploring how the open source policies of subsidiaries differ from those of their parent companies.

devashish-gaikwad commented 3 years ago

Yes, I agree. My issue may sound like a feature request, but it is actually a question.

cm-howard commented 3 years ago

Thanks for the conversation so far - this is certainly something we've been discussing as a team too so it's great to hear it being raised from the community.

We don't really want to prescribe how organisations should define themselves within the rankings. We hope that those responsible for such decisions will take a mature approach that is in the spirit of what we're attempting to measure and that doesn't simply allow large groups of companies to dominate the rankings.

Including smaller subsidiaries in the parent grouping (Option A) makes sense when those companies still work closely together on their open source engagements. At the same time, we'd hope that an organisation such as Google wouldn't list YouTube and Nest as child domains, but would instead split them out as separate entities to reflect their different approaches to open source contribution.

What do you think? I'm keen to hear more conversation on this...

abitrolly commented 3 years ago

YouTube and Google Nest are subsidiaries of Alphabet Inc. I don't think that this holding company commits to open source or cares about it.

It is harder to split aggregates than to aggregate pieces. In the end, "make it optional" always works.

devashish-gaikwad commented 3 years ago

@cm-howard I agree with your observation that Option A makes more sense. Organisations and their subsidiaries whose IT operations are closely tied together should be represented as one.

devashish-gaikwad commented 3 years ago

Should I add a summary of this discussion to README.md? Something along the lines of "Currently, the decision of whether to list an organisation's subsidiaries under a single entry or multiple entries is left to the organisation, in good faith" under this heading.

I am just getting started with OSS contributions, so this might be a good kick-start for me. You are welcome to assign me any beginner-friendly issues or requests 😄

cm-howard commented 3 years ago

Thanks for your comments @devashish-gaikwad and for the suggestion of updating the README too.

We're going to have a look at this as a team and see how we can reflect the discussion appropriately. In the meantime we'll be sure to suggest any potential issues that would be a good starting point. It's great to see your interest in the solution.

abitrolly commented 3 years ago

I would be interested in a dataset of companies and their subsidiaries as of a specific date. I'm not sure whether that dataset should be collected in this repo, but it might make sense if no such source exists.

cm-howard commented 3 years ago

That's great to hear @abitrolly - we're actually working on a whole range of improvements to the index at the moment, one of which is flexible date range selection.

I also like the idea of making the index aware of each organization's subsidiaries, so we can think about how we might include this as a potential feature. @vlad-isayko @Uliana2019

abitrolly commented 3 years ago

To take the idea a bit further, it would also be interesting to see when open source commits made by one company were actually contributed under a paid contract with another company. In that case, the real contributor could be the company that paid.

patrickstephens2 commented 3 years ago

To the original question: when we originally defined the list of companies for OSCI and the domain-to-company mapping, we looked at the domains occurring in the data and mapped them to companies to the best of our abilities. We did lots of googling to discover where domains belonged to subsidiaries of other companies (e.g. egencia.com, a subsidiary of Expedia Group). We excluded freemail addresses, which is easy for well-known US providers but more work for providers around the world. And so on.
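The mapping process described above (domain-to-company lookup, subsidiary domains rolled up under a parent, freemail excluded) can be sketched in Python. This is a minimal illustration under stated assumptions, not OSCI's actual implementation: the names `COMPANY_DOMAINS`, `FREEMAIL_DOMAINS`, `company_for_email` and `count_commits_by_company`, the short freemail list, and the subdomain-matching rule are all hypothetical. It uses the Option A layout, where Expedia Group lists egencia.com alongside its own domain.

```python
"""Hypothetical sketch of mapping commit-author email domains to companies.

All names and data here are illustrative assumptions, not OSCI's real code or data.
"""
from collections import Counter
from typing import Iterable, Optional

# Option A style (illustrative): a company lists its own domains plus the
# domains of subsidiaries it has chosen to roll up under the parent.
COMPANY_DOMAINS = {
    "Expedia Group": ["expedia.com", "egencia.com"],
}

# A tiny, deliberately incomplete sample of freemail providers to exclude.
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Invert the config into a domain -> company lookup table.
DOMAIN_TO_COMPANY = {
    domain.lower(): company
    for company, domains in COMPANY_DOMAINS.items()
    for domain in domains
}


def company_for_email(email: str) -> Optional[str]:
    """Return the company for an email address, or None if freemail or unmapped."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain or domain in FREEMAIL_DOMAINS:
        return None
    # Assumed rule: also match subdomains, e.g. "dev.egencia.com" -> "egencia.com".
    # The loop never tries the bare top-level domain (e.g. "com").
    parts = domain.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DOMAIN_TO_COMPANY:
            return DOMAIN_TO_COMPANY[candidate]
    return None


def count_commits_by_company(author_emails: Iterable[str]) -> Counter:
    """Count commits per company, skipping freemail and unmapped authors."""
    counts: Counter = Counter()
    for email in author_emails:
        company = company_for_email(email)
        if company is not None:
            counts[company] += 1
    return counts
```

Under Option B, egencia.com would instead get its own entry in `COMPANY_DOMAINS`, so its commits would be counted separately rather than added to the parent's total.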

This mapping was subsequently reviewed and extended more than once.

We used some rules of thumb along the way. Example 1: if we became aware that a large company had acquired a small company, we would typically roll the domains of the small company under the big one. Example 2: if a company acquired a company that was a major, well-known player in the open source world, and it looked like the acquired company would continue to operate as a relatively independent entity, we would tend to keep them as separate companies.

At the end of the day, this was "best effort" and may have errors and omissions. As an open source project, we wanted to encourage companies to submit pull requests with their own additions and corrections. I've seen some of these, which is great. IMO this is the way forward, rather than any hard rule, but guidelines would be useful.