chaoss / wg-data-science

CHAOSS Data Science Working Group: collaborate and improve open source project health using data science-based approaches
MIT License

[Project]: License Changes and Forking #47

Open geekygirldawn opened 2 months ago

geekygirldawn commented 2 months ago

Project Name (1 - 3 words)

License Changes and Forking

Description

License Change: Can we predict the likelihood of a license change for an open source project from an open source license to a non-open source or more restrictive license?

Forking: What is the impact of a fork on the health of the original project, and how does this compare with the health of the fork?

By "license change" we mean an unexpected change to a non-open-source license, or to a more restrictive open source license, that limits how the project can be used in a way that excludes how third parties have been using it in the recent past.

I think we also need to build and maintain a better dataset of license changes, unless someone can find one that is already maintained elsewhere. For now, this seems to be the most comprehensive list: https://en.wikipedia.org/wiki/List_of_formerly_open-source_or_free_software

I also think that we need to consider that this is a rare event when we start to model this: https://en.wikipedia.org/wiki/Rare_events
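To illustrate why the rare-event framing matters, here is a minimal sketch in plain Python. The labels and the 1-in-100 change rate are made up for illustration and are not taken from any dataset. It shows that a "model" that never predicts a license change can look very accurate while finding none of the changes, which is why accuracy alone is a poor yardstick here.

```python
# Hypothetical labels: 1 = project changed its license, 0 = it did not.
# The 1-in-100 rate is an assumption chosen only for illustration.
labels = [1] * 10 + [0] * 990

# A trivial baseline that always predicts "no license change".
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual license changes that were predicted."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


baseline_accuracy = accuracy(labels, predictions)
baseline_recall = recall(labels, predictions)
print(f"accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f}")
```

Any real model would need to beat this baseline on recall or precision rather than on accuracy.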

This is often related to Elephant Factor, which refers to too much control by a single company. These are negative events in a project’s life that are strongly associated with too much single-company control, which makes them different from what occurs in a project with a diverse community of contributors and maintainers.
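As a rough sketch of how Elephant Factor could be computed, assuming it is read as the smallest number of organizations whose contributors together account for a given share of commits (50% is used here as an assumed default threshold). The organization names and commit counts are hypothetical.

```python
def elephant_factor(commits_by_org, threshold=0.5):
    """Smallest number of organizations that together account for
    at least `threshold` of all commits (assumed definition)."""
    total = sum(commits_by_org.values())
    if total == 0:
        return 0
    covered = 0
    # Walk organizations from largest to smallest contributor.
    ranked = sorted(commits_by_org.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered >= threshold * total:
            return rank
    return len(commits_by_org)


# Hypothetical commit counts per organization.
single_vendor = {"vendor-a": 800, "org-b": 120, "org-c": 80}
diverse = {"org-a": 300, "org-b": 280, "org-c": 220, "org-d": 200}

print(elephant_factor(single_vendor))
print(elephant_factor(diverse))
```

A value of 1 means a single organization makes at least half of the commits, which is the situation described above.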

Note: We've combined these 2 ideas into one project because license changes are a common reason for project forking.

Related Links

No response

Note that we also have a Project Scope Template doc that you can use to think about the project details if you find it useful (not required).

How would you like to be involved in this project?

I am interested in this project, but do not plan to work on it myself

Additional Notes.

No response

geekygirldawn commented 2 months ago

I've created a dataset to get us started: https://github.com/chaoss/wg-data-science/tree/main/dataset/license-changes
It still needs help with cleanup and with making it more complete.

geekygirldawn commented 2 months ago

@gkunz suggested in Slack:

it could be very interesting (at least for me) to also include available open source forks in your data set and then take a look at how they are evolving

The forks would definitely be something we should take a look at. I'm trying to keep the initial dataset as small as possible, with the idea that people can build additional datasets depending on what questions they are trying to answer, but this feels like something that might be common enough to add, especially now that a lot more license changes are resulting in viable forks. However, a csv file doesn't work well when there are multiple forks, like there were with Redis, so we might need a JSON file for that. Maybe building a separate but related dataset would be better?

I'm creating this issue to avoid losing this idea and get feedback from others before we decide.
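To make the trade-off concrete, here is a minimal sketch of the two options: a nested JSON record with several forks per project, and the same data flattened into a separate "long" CSV with one row per fork. Every field name and value below is a placeholder, not the schema of the actual dataset.

```python
import csv
import io
import json

# Hypothetical record: all names and values are placeholders.
record_json = """
{
  "project": "example-project",
  "license_change_year": 2024,
  "forks": [
    {"name": "fork-one", "license": "BSD-3-Clause"},
    {"name": "fork-two", "license": "MIT"}
  ]
}
"""

record = json.loads(record_json)

# Long format: one CSV row per (project, fork) pair, so a project
# with several forks simply contributes several rows.
rows = [
    {
        "project": record["project"],
        "fork": fork["name"],
        "fork_license": fork["license"],
    }
    for fork in record["forks"]
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["project", "fork", "fork_license"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The separate fork table can be joined back to the main license-change file on the project column, which keeps the main file small.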

gkunz commented 2 months ago

Thank you! I agree that csv is too limited for this purpose.

geekygirldawn commented 2 months ago

What additional context can we add? How many people were impacted? What type of org “owns” the project / business? Collecting this would be pretty manual, but some of these variables will be important. How do we collect some of this contextual data and get people to contribute it in a way that lets us code it into a hierarchy or set of categories? Primary sources? Can we use an LLM? Maybe dependencies in deps.dev would be helpful (Sophia knows more about how to do this).

Look for scientific publications that have looked at this.

geekygirldawn commented 1 week ago

I thought about it a bit more and started work on a csv with data about forks. It's not even close to done, but I wanted to drop the link here in case people have feedback: https://github.com/chaoss/wg-data-science/blob/main/dataset/license-changes/forks.csv