TSELab / guac-alytics

A series of tools and resources to better understand the risk profile of open source software ecosystems
Apache License 2.0

Website for the project #13

Closed SahithiKasim closed 1 year ago

SahithiKasim commented 1 year ago

Create a Hugo Markdown page for the project and add it to the RCODI site.

SahithiKasim commented 1 year ago

Professor @sbrunswi and @SantiagoTorres, can you check the description below and give some input?

[Image: deepmind-LIlsk-UFVxk-unsplash]

Introduction

Open source software is software whose source code is available to anyone to view, use, modify, and distribute. It is created and maintained by a community of developers and users collaborating to improve the software. Open-source packages are pre-built pieces of software that can be used to build other applications. They can be freely downloaded from repositories such as GitHub or npm, making them an essential part of software development.

However, like any other software, open-source packages can contain vulnerabilities that attackers can exploit. Vulnerabilities can be introduced in the code during development or added by third-party dependencies. These vulnerabilities can pose severe risks to software systems, and their impact can range from data breaches to system failures. Therefore, it is essential to manage and monitor open-source software and packages for vulnerabilities and take necessary measures to mitigate potential risks.

Motivation

The OSS supply chain has become an essential part of the software development process, enabling organizations to reduce development time and costs by reusing and integrating packages from different OSS products. However, this approach has also increased the risk of security vulnerabilities and software attacks compromising the confidentiality, integrity, and availability of sensitive data and system operations.

Our main theoretical contribution is evaluating the risks associated with OSS supply chain interdependencies. The risks associated with OSS components, such as the possibility of security vulnerabilities and non-compliance with open-source licenses, can be identified, assessed, and managed using software risk management approaches. These approaches give organizations a structured way to assess the risks related to OSS components and make informed decisions about their use.

Approach

Applying network science methods to software supply chain security is a new approach that can change how organizations detect and prevent software supply chain vulnerabilities. By leveraging data analytics and visualization, we aim to provide organizations with a new toolset to enhance their security posture and protect their systems from sophisticated cyber threats. To achieve this goal, we are collecting and analyzing several representative supply chain datasets, which include package popularity, build provenance, and maintainers of open-source ecosystems. This will provide insights into the structural properties of these supply chains.

Using the acquired data, we construct various models to visualize and assess the risks associated with software supply chain susceptibilities. These models help identify potential weaknesses and interdependencies in the supply chain, enabling organizations to mitigate the risks proactively. In addition, using network science methods, we are trying to identify patterns and anomalies in the data that are difficult to detect using traditional security techniques.

Preliminary Results

Based on our initial findings, we have identified that kernel builds are the most interconnected in our dataset. Furthermore, we have observed that specific versions of these kernel builds are used more frequently than others. We were surprised to see older versions with higher connectivity and different processor architectures (e.g., MIPS) more connected than others, such as x86_64 or ARM.

Our analysis also suggests that based on the out-degree metric, these highly connected kernel builds pose a higher risk. Therefore, it will be vital for us to focus our attention on understanding these kernel builds in more detail and assessing any potential vulnerabilities or security risks associated with them. These preliminary results highlight the importance of conducting thorough analyses of interconnected systems to identify potential risks and inform effective risk management strategies.
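To make the out-degree metric concrete, here is a minimal sketch of how nodes in a build graph could be ranked by it. The edge list, the node names, and the assumption that an edge points from a build to a package it uses are all hypothetical and are not taken from our dataset.

```python
from collections import defaultdict


def out_degrees(edges):
    """Count outgoing edges per node for a directed graph given as (source, target) pairs."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 0  # register nodes that have no outgoing edges
    return dict(degree)


def rank_by_out_degree(edges, top=3):
    """Return the `top` nodes with the highest out-degree, ties broken by name."""
    degree = out_degrees(edges)
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical edges: (build, package used by that build). Illustrative only.
EXAMPLE_EDGES = [
    ("kernel-build-a", "pkg-1"),
    ("kernel-build-a", "pkg-2"),
    ("kernel-build-a", "pkg-3"),
    ("kernel-build-b", "pkg-1"),
    ("pkg-1", "pkg-2"),
]

if __name__ == "__main__":
    for node, degree in rank_by_out_degree(EXAMPLE_EDGES):
        print(node, degree)
```

In this toy graph, `kernel-build-a` ranks first because it has the most outgoing edges. Out-degree on its own is only a first-order indicator of risk, so a ranking like this is a starting point for closer inspection rather than a verdict.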

Intellectual Merit and Broader Impact

The intellectual merit lies in our contribution to understanding and predicting the structure and risks within open-source software (OSS) supply chain ecosystems. Our work builds on research describing the social and technical interdependencies between complex systems and extends it to the OSS supply chain context. Our research proposes a socio-technical network perspective to represent and analyze the complexity and inherent risks of the OSS supply chain. We leverage empirical data from OSS communities to construct temporal networks of OSS supply chains and use efficient algorithms for network mining to analyze them. We also propose to model and value the structural risks of a package within the OSS supply chain ecosystem to identify critical packages that create high risks for the repository and the ecosystem as a whole.

The broader impact of this research is mainly on software engineering and cybersecurity. Our research can help software engineers identify critical packages that create high risks for the repository and the ecosystem. It can also help develop more effective security standards and best practices. Finally, it can help identify and mitigate the risks associated with using open-source software.

SahithiKasim commented 1 year ago

[Image: deepmind-LIlsk-UFVxk-unsplash]

Introduction

Open source software is software whose source code is available to anyone to view, use, modify, and distribute. It is created and maintained by a community of developers and users collaborating to improve the software. Open-source packages are pre-built pieces of software that can be used to build other applications. They can be freely downloaded from repositories such as GitHub or npm, making them an essential part of software development.

However, like any other software, open-source packages can contain vulnerabilities that attackers can exploit. Vulnerabilities are weaknesses or flaws in software or hardware systems that attackers can exploit to gain unauthorized access or control over the system. Vulnerabilities can be introduced during the development process or added by third-party dependencies. These vulnerabilities can pose severe risks to software systems, ranging from data breaches to system failures. Therefore, it is essential to manage and monitor open-source software and packages for vulnerabilities and take necessary measures to mitigate potential risks.

Motivation

The Open Source Software (OSS) supply chain has become an essential part of the software development process, enabling organizations to reduce development time and costs by reusing and integrating packages from different OSS products. However, this approach has also increased the risk of security vulnerabilities and software attacks compromising the confidentiality, integrity, and availability of sensitive data and system operations.

Our main theoretical contribution is evaluating the risks associated with OSS supply chain interdependencies. The risks associated with OSS components, such as the possibility of security vulnerabilities and non-compliance with open-source licenses, can be identified, assessed, and managed using software risk management approaches. These approaches give organizations a structured way to assess the risks related to OSS components and support them in making decisions.

Approach

Applying network science methods to software supply chain security is a new approach that can change how organizations detect and prevent software supply chain vulnerabilities. By leveraging data analytics and visualization, we aim to provide organizations with a new toolset to enhance their security posture and protect their systems from sophisticated cyber threats. To achieve this goal, we are collecting and analyzing several representative supply chain datasets, which include package popularity, build provenance, and maintainers of open-source ecosystems. This will provide insights into the structural properties of these supply chains.

Using the acquired data, we construct various models to visualize and assess the risks associated with software supply chain susceptibilities. These models help identify potential weaknesses and interdependencies in the supply chain, enabling organizations to mitigate the risks proactively. In addition, using network science methods, we are trying to identify patterns and anomalies in the data that are difficult to detect using traditional security techniques.

Preliminary Results

Based on our initial findings, we have identified that kernel builds are the most interconnected in our dataset. Furthermore, we have observed that specific versions of these kernel builds are used more frequently than others. We were surprised to see older versions with higher connectivity and different processor architectures (e.g., MIPS) more connected than others, such as x86_64 or ARM.

Our analysis also suggests that based on the out-degree metric, these highly connected kernel builds pose a higher risk. Therefore, it will be vital for us to focus our attention on understanding these kernel builds in more detail and assessing any potential vulnerabilities or security risks associated with them. These preliminary results highlight the importance of conducting thorough analyses of interconnected systems to identify potential risks and inform effective risk management strategies.

Intellectual Merit and Broader Impact

The intellectual merit lies in our contribution to understanding and predicting the structure and risks within open-source software (OSS) supply chain ecosystems. Our work builds on research describing the social and technical interdependencies between complex systems and extends it to the OSS supply chain context. Our research proposes a socio-technical network perspective to represent and analyze the complexity and inherent risks of the OSS supply chain. We leverage empirical data from OSS communities to construct temporal networks of OSS supply chains and use efficient algorithms for network mining to analyze them. We also propose to model and value the structural risks of a package within the OSS supply chain ecosystem to identify critical packages that create high risks for the repository and the ecosystem as a whole.

The broader impact of this research is mainly on software engineering and cybersecurity. Our research can help software engineers identify critical packages that create high risks for the repository and the ecosystem. It can also help develop more effective security standards and best practices. Finally, it can help identify and mitigate the risks associated with using open-source software.

sbrunswi commented 1 year ago

This looks better, but the naming on the left side confuses the normal user - can we make this more readable for non-Linux experts? Also - why don't you link to the actual markdown here so that I can edit the markdown file directly?

sbrunswi commented 1 year ago

Also - what is the scientific merit that we are driving? Currently you are just talking about the benefits for organizations. This page should also have citations and references. Can you add them?

SahithiKasim commented 1 year ago

Link to markdown file - https://github.com/RCODI/rcodi-blog/blob/master/content/project/guac-alytics/index.md

Motivation

The Open Source Software (OSS) supply chain has become an essential part of the software development process, enabling organizations to reduce development time and costs by reusing and integrating packages from different OSS products. However, this approach has also increased the risk of security vulnerabilities and software attacks compromising the confidentiality, integrity, and availability of sensitive data and system operations. In addition, because OSS is often developed and maintained by a large community of contributors, it is challenging to identify and patch vulnerabilities promptly and effectively. Moreover, OSS often relies on third-party libraries and components, which can introduce additional vulnerabilities. Therefore, it is critical to understand the interdependencies and potential vulnerabilities introduced by such dependencies to ensure the overall security of the software.

Our main theoretical contribution is evaluating the risks associated with OSS supply chain interdependencies. The risks associated with OSS components, such as the possibility of security vulnerabilities and non-compliance with open-source licenses, can be identified, assessed, and managed using software risk management approaches. These approaches give organizations a structured way to assess the risks related to OSS components and support them in making decisions. They also help track vulnerabilities through all the related dependencies so that developers can identify and patch them quickly across different platforms.
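As an illustration of tracking a vulnerability through related dependencies, the sketch below walks a dependency map in reverse to find every package that depends, directly or transitively, on a vulnerable one. The package names and the `dependencies` mapping are hypothetical, and this is one simple way to do the traversal, not the method used in the project.

```python
from collections import defaultdict, deque


def affected_packages(dependencies, vulnerable):
    """Return all packages that depend, directly or transitively, on `vulnerable`.

    `dependencies` maps each package to the list of packages it depends on.
    """
    # Invert the map so we can walk from a package to its dependents.
    dependents = defaultdict(set)
    for package, requires in dependencies.items():
        for requirement in requires:
            dependents[requirement].add(package)

    # Breadth-first search over the reversed edges.
    seen = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen and dependent != vulnerable:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical dependency map, for illustration only.
EXAMPLE_DEPENDENCIES = {
    "app": ["web-framework"],
    "web-framework": ["http-lib"],
    "http-lib": ["tls-lib"],
    "cli-tool": ["tls-lib"],
    "tls-lib": [],
}

if __name__ == "__main__":
    print(sorted(affected_packages(EXAMPLE_DEPENDENCIES, "tls-lib")))
```

Here a flaw in the hypothetical `tls-lib` reaches `app` even though `app` never declares it directly, which is the kind of indirect exposure that is easy to miss without following the whole chain.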

Preliminary Results

Based on our initial findings, we have identified that kernel builds are the most interconnected in our dataset. Furthermore, we have observed that specific versions of these kernel builds are used more frequently than others. We were surprised to see older versions with higher connectivity and different processor architectures (e.g., MIPS, a processor architecture common in routers and set-top boxes) more connected than others, such as x86_64 (the architecture of Intel's 64-bit CPUs) or ARM (the architecture of Apple's chips).

Our analysis also suggests that based on the out-degree metric, these highly connected kernel builds pose a higher risk. Therefore, it will be vital for us to focus our attention on understanding these kernel builds in more detail and assessing any potential vulnerabilities or security risks associated with them. These preliminary results highlight the importance of conducting thorough analyses of interconnected systems to identify potential risks and inform effective risk management strategies.

References and Citations

SahithiKasim commented 1 year ago

@sbrunswi and @SantiagoTorres can you check this and add comments to it?

Future Work

Given that supply chain attacks have been on the rise, it is crucial to develop better tools and techniques to mitigate security risks in the OSS supply chain. Our proposed method and existing solutions have made significant progress, but the management of dependencies remains a major issue in the OSS supply chain. The complexity of dependencies in a typical OSS project makes it challenging to track and manage them.

To overcome this challenge, our future work will focus on developing new approaches to manage dependencies more effectively. One potential solution is to use core-periphery techniques, such as rossa (Della Rossa et al., 2013), rombach (Rombach et al., 2014), and minres (Boyd et al., 2010), and to concentrate more on the hidden core structures (Baldwin et al., 2014) of the dependency graphs, which provide a more comprehensive view of the dependencies. By visualizing the dependencies in a more meaningful way, these graphs can help identify potential vulnerabilities in OSS components more quickly and accurately.
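To give a rough idea of what separating a core from a periphery looks like, the sketch below uses k-core decomposition. This is a simpler stand-in and is not the rossa, rombach, or minres method cited above. The graph is treated as undirected, and the node names are hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return the k-core number of every node in an undirected graph.

    Repeatedly removes the node with the smallest remaining degree; the
    largest degree seen at removal time so far is that node's core number.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)

    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
    remaining = set(degree)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for other in neighbors[node]:
            if other in remaining:
                degree[other] -= 1
    return core


def innermost_core(edges):
    """Return the set of nodes that have the highest core number."""
    core = core_numbers(edges)
    highest = max(core.values())
    return {node for node, value in core.items() if value == highest}


# Hypothetical dependency links, for illustration only.
EXAMPLE_LINKS = [
    ("lib-a", "lib-b"),
    ("lib-b", "lib-c"),
    ("lib-a", "lib-c"),
    ("lib-a", "plugin-d"),
]

if __name__ == "__main__":
    print(core_numbers(EXAMPLE_LINKS))
    print(innermost_core(EXAMPLE_LINKS))
```

In this toy graph the three mutually linked libraries form the core and `plugin-d` sits in the periphery. The cited core-periphery methods use different definitions of "core" and may produce different groupings on real dependency graphs.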

By developing these new approaches, we hope to create a more secure OSS supply chain. This will enable developers to detect and mitigate security risks more efficiently and effectively, leading to safer and more trustworthy OSS projects. Ultimately, our goal is to help build a more resilient OSS community that can withstand the growing threat of supply chain attacks.

SahithiKasim commented 1 year ago

We have the final version here - https://rcodi.org/project/guac-alytics/