Demonstrator data-mining backend for an open source development status dashboard
Targeted at hosters of version control platforms (such as Wikifactory, GitLab, or GitHub), this Python backend program mines open source hardware repositories for metadata and calculates metrics based on it. This backend exposes a representational state transfer (REST) application programming interface (API) where requests for those metrics can be made.
This software is not for general consumers to just "double click" on and install on their devices.
Please see the Install and Usage sections to get up and running with this tool.
Today’s industrial product creation is expensive, risky and unsustainable. At the same time, the process is highly inaccessible to consumers who have very little input in the design and distribution of the finished product. Presently, SMEs and maker communities across Europe are coming together to fundamentally change the way we create, produce, and distribute products.
OPENNEXT is a collaboration between 19 industry and academic partners across Europe. Funded by the European Union's Horizon 2020 programme, this project seeks to enable small and medium enterprises (SMEs) to work with consumers, makers, and other communities in rethinking how products are designed and produced. Open source hardware is a key enabler of this goal: the design of a physical product is released with the freedoms for anyone to study, modify, share, and redistribute copies. These essential freedoms are based on those of open source software, which is itself derived from free software where the word free refers to freedom, not free-of-charge. When put in practice, these freedoms could not only reduce proprietary vendor lock-in, planned obsolescence, or waste but also stimulate novel, even disruptive, business models. The SME partners in OPENNEXT are experimenting with producing open source hardware and even opening up the development process to wider community participation. They produce diverse products ranging from desks and cargo bike modules to a digital scientific instrument platform (and more).
Work package 2 (WP2) of OPENNEXT is gathering theoretical and practical insights on best practices for company-community collaboration when developing open source hardware. This includes running Delphi studies to develop a maturity model to describe the collaboration and developing a precise definition for what the "source" is in open source hardware. In particular, task 2.2 in this work package is developing a demonstration project status dashboard with "health" indicators showing the evolution of a project within the maturity model; design activities; or progress towards success based on project goals. Details of the dashboard's technical architecture are described in the deliverable 2.5 (D2.5) report.
This repository contains the backend code for D2.5 and to be clear, this deliverable is: Designed to be deployed on a server operated by version control platforms such as Wikifactory or GitHub.
This deliverable is not: For general end-users to install on consumer devices and "double click" to open.
In addition, this repository aims to follow international standards and good practices in open source development such as, but not limited to:
- LICENSES directory
- main instead of master
- following modern best practices

This section assumes knowledge of Python, Git, and using a GNU/Linux-based server, including installing software from package managers and running a terminal session.
Note: This software is designed to be deployed on a server by system administrators or developers, not on generic consumer devices.
This project requires Python version 3.10 or later on your server. Running it in a Python virtual environment is optional but recommended. Detailed external library dependencies are listed in the standard-conformant requirements.txt file and also here:
In addition to Python and the dependencies listed above, the following programs must be installed and accessible from the command line:
A GitHub personal access token must be available as an environment variable, because the Python scripts use it for GitHub API queries. This token is an alphanumeric string in the form of "ghp_2D5TYFikFsQ4U9KPfzHyvigMycePCPqkPgWc".
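As a minimal sketch of how a script might pick up this token, the following reads it from the environment and fails early if it is missing. The variable name GITHUB_TOKEN is the one used in the container instructions of this README; the function name get_github_token is hypothetical and is not taken from this repository's code.

```python
import os


def get_github_token(env=None):
    """Return the GitHub token from the environment, or raise if it is absent.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env
    # Strip whitespace that can sneak in when the value is pasted into a shell.
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; export it before starting the server."
        )
    return token
```

Failing at startup, rather than on the first GitHub API query, tends to make a misconfigured deployment easier to diagnose.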
The code can be run from source and has been tested on updated versions of GNU/Linux server operating systems including Red Hat Enterprise Linux 8.7. While effort has been made to keep the Python scripts platform-agnostic, they have not been tested under other operating systems such as BSD-derivatives, Apple macOS or Microsoft Windows as they - especially the latter two - are rarely used for hosting code such as this.
On your server, with the tools git and pip installed, run the following commands in a terminal session to retrieve the latest version of this repository and prepare it for development and running locally (usually for testing):
git clone https://github.com/OPEN-NEXT/wp2.2_dev.git
cd wp2.2_dev
pip install --user -r requirements.txt
The git command will download the files in this repository onto your server into a directory named wp2.2_dev, and pip installs the Python dependencies listed in requirements.txt.
In a terminal window at the root directory of this repository, start the uvicorn Asynchronous Server Gateway Interface (ASGI) server by running this command:
uvicorn oshminer.main:app --reload
There will be some command-line output which ends with something like the following line:
INFO: Application startup complete.
This means the server API is up and running, and should be accessible on your local machine on port 8000 at 127.0.0.1.
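One way to confirm that something is listening on that address is a plain TCP connection attempt. The following is a sketch using only the Python standard library; the helper name is_listening is hypothetical, and it only checks that the port accepts connections, not that the API itself responds correctly.

```python
import socket


def is_listening(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Server reachable:", is_listening())
```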
There is a Dockerfile in this repository that defines a container within which this code can run.
To build and use the container, you need to have programs like Podman or Docker installed.
With the repository cloned by git onto your system, navigate to it and build the container with this command:
podman build -t wp22dev ./ --format=docker
Replace the command podman with docker depending on which one is available (this project has been tested with Podman 4.0.2). The name wp22dev can be replaced with any other name. The --format=docker option is needed to explicitly build this as a Docker-formatted container that will be accepted by cloud services like Heroku.
Then, run the container on port 8000 at 127.0.0.1 with this command:
podman run --env PORT=8000 --env GITHUB_TOKEN=[token] -p 127.0.0.1:8000:8000 -d wp22dev
Here, [token] is the 40-character alphanumeric string of your GitHub API personal access token. It is in the form of "ghp_2D5TYFikFsQ4U9KPfzHyvigMycePCPqkPgWc".
The image built this way can be pushed to cloud hosting providers such as Heroku. With Heroku as an example:
Set up an empty app from your Heroku dashboard.
In the Settings page for your Heroku app, set a Config Var with Key "GITHUB_TOKEN" and Value being your GitHub API personal access token.
With the Heroku command-line interface installed, first log in from your terminal:
heroku container:login
podman push wp22dev registry.heroku.com/[your app name]/web
heroku container:release web --app=[your app name]
Similar to Heroku, the container image created above can be deployed to an app on Fly.io. Assuming a Fly.io account has already been created:
flyctl auth login
Create an app named [your app name], replacing it with whatever name you'd like:
flyctl launch
flyctl auth docker
podman push wp22dev registry.fly.io/[your app name]
flyctl deploy --image registry.fly.io/[your app name]
flyctl secrets set GITHUB_TOKEN=[token]
Here, [token] is the 40-character alphanumeric string of your GitHub API personal access token. It is in the form of "ghp_2D5TYFikFsQ4U9KPfzHyvigMycePCPqkPgWc".
A demo of this is hosted on Fly.io with this API endpoint:
https://wp22dev.fly.dev/data
This demo instance will go into a sleep state after a period of inactivity (approximately 30 minutes at time of writing). If your API calls to this endpoint are taking more than a few seconds, the demo might be waking from that state.
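A client can tolerate this wake-up delay by using a generous timeout and retrying a few times. The sketch below uses only the Python standard library; the function name fetch_with_retry and the default values for attempts, timeout, and delay are illustrative assumptions, not settings recommended by this project.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(request, attempts=3, timeout=60.0, delay=5.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Send `request`, retrying on network errors or timeouts.

    `opener` and `sleep` are injectable so the retry logic can be tested
    without real network access or real waiting.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, TimeoutError) as error:
            # Note: HTTP error responses (HTTPError) are also URLError
            # subclasses, so they are retried too in this simple sketch.
            last_error = error
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

Here, request can be a URL string or a urllib.request.Request object.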
The backend server listens to requests for information about a list of open source hardware (and software) repositories hosted on Wikifactory or GitHub.
GET requests to the API are formed as JSON payloads to the /data endpoint.
There are two components to each request:
- repo_urls: An array of strings of repository URLs, such as https://wikifactory.com/+elektricworks/pikon-telescope. Currently, metadata retrieval is implemented for Wikifactory project and GitHub repository URLs. Each Wikifactory URL is composed of the Wikifactory domain (wikifactory.com), space (e.g. +elektricworks), and project (e.g. pikon-telescope).
- requested_data: An array of strings representing the types of repository metrics desired for each repository. Currently, the following are implemented for Wikifactory projects:
  - files_info: The numbers and proportions of mechanical and electronic computer-assisted design (CAD), image, data, document, and other file types in the repository.
  - files_editability: Basic information about how "editable" the CAD files are in this repository.
  - license: The license for the repository.
  - tags: Aggregated tags for the repository and any associated with the maintainers of that repository.
  - commits_level: The hash identifier (contribution id for Wikifactory projects) and timestamp of each commit to the repository. This can be used to graph the commit activity level in a frontend visualisation. Note: This is based on commits from the first three detected branches in the repository, including the default branch, because requesting commits across many branches takes a long time and APIs might time out. Also note that branches are not implemented by Wikifactory, so it will behave as if there is only one branch.
  - issues_level: Similar to commits_level, but for all issues in the repository.

The following is an example request that could be sent to the API for three Wikifactory projects:
{
  "repo_urls": [
    "https://wikifactory.com/+dronecoria/dronecoria-frame",
    "https://wikifactory.com/@luzleanne/community-composter",
    "https://wikifactory.com/+elektricworks/pikon-telescope"
  ],
  "requested_data": [
    "files_info",
    "files_editability",
    "license",
    "tags",
    "commits_level",
    "issues_level"
  ]
}
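A request like the one above could be assembled in Python roughly as follows. This is a sketch using only the standard library: the function name build_request and the client-side check of metric names are illustrative assumptions, and whether the server requires the Content-Type header is not stated in this README.

```python
import json
import urllib.request

# Metric names listed in this README for the requested_data array.
SUPPORTED_METRICS = {
    "files_info", "files_editability", "license",
    "tags", "commits_level", "issues_level",
}


def build_request(api_url, repo_urls, requested_data):
    """Build a GET request carrying the JSON payload for the /data endpoint."""
    unknown = set(requested_data) - SUPPORTED_METRICS
    if unknown:
        raise ValueError(f"Unsupported metrics: {sorted(unknown)}")
    body = json.dumps({
        "repo_urls": list(repo_urls),
        "requested_data": list(requested_data),
    }).encode("utf-8")
    # method="GET" is set explicitly because urllib would otherwise
    # switch to POST when a request body is supplied.
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


if __name__ == "__main__":
    request = build_request(
        "http://127.0.0.1:8000/data",
        ["https://wikifactory.com/+elektricworks/pikon-telescope"],
        ["license", "tags"],
    )
    print(request.get_method(), request.full_url)
```

The resulting object can be passed to urllib.request.urlopen to send it to a running instance.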
The API will respond with a JSON array containing the requested_data for each repository in repo_urls.
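As an illustration of consuming such a response, the sketch below counts commits per calendar month for each repository, which is one possible input for an activity graph. It relies on the repository, requested_data, commits_level, and committed fields documented in this README; the function name commits_per_month is hypothetical, and it assumes timestamps carry an explicit UTC offset such as +00:00.

```python
import json
from collections import Counter
from datetime import datetime


def commits_per_month(response_text):
    """Map each repository URL to a {"YYYY-MM": commit count} dictionary."""
    results = {}
    for entry in json.loads(response_text):
        # commits_level is only present if it was requested, so default to [].
        commits = entry.get("requested_data", {}).get("commits_level", [])
        months = Counter(
            datetime.fromisoformat(commit["committed"]).strftime("%Y-%m")
            for commit in commits
        )
        # Sort by month so the result is ready for plotting in order.
        results[entry["repository"]] = dict(sorted(months.items()))
    return results
```

Grouping by month is an arbitrary choice; a frontend could equally bucket by week or day.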
Specifically, for each repository, the response will include:
- repository: String containing the repository URL.
- platform: String, only Wikifactory for now.
- requested_data: Object containing the following:
  - files_editability: Object containing the following:
    - files_count: Integer number of (presumed to be) CAD files that are not text documents or data files (like CSV).
    - files_openness: Object containing the following:
      - open: Integer number of files using open formats.
      - closed: Integer number of files using closed/proprietary formats.
      - other: Integer number of files not categorised in either of the above.
    - files_encoding: Object containing the following:
      - binary: Integer number of files using binary formats.
      - text: Integer number of files using text-based formats.
      - other: Integer number of files not categorised in either of the above.
  - files_info: Object containing the following:
    - total_files: Integer of total number of files in the repository.
    - ecad_files: Integer number of electronic CAD files.
    - mcad_files: Integer number of mechanical CAD files.
    - image_files: Integer number of image files.
    - data_files: Integer number of data files.
    - document_files: Integer number of documentation files.
    - other_files: Integer number of other types of files.
    - ecad_proportion: Floating point proportion of electronic CAD files.
    - mcad_proportion: Floating point proportion of mechanical CAD files.
    - image_proportion: Floating point proportion of image files.
    - data_proportion: Floating point proportion of data files.
    - document_proportion: Floating point proportion of documentation files.
    - other_proportion: Floating point proportion of other types of files.
  - license: Object containing license information:
    - key: String of license identifier. Currently the same as spdx_id.
    - name: Full name of license.
    - spdx_id: String of the SPDX license identifier.
    - url: URL to license text.
    - node_id: For some licenses, this will be an identifier in GitHub's license list.
    - html_url: URL to license information.
    - permissions: Array of strings containing the permissions given by the license, which could include:
      - commercial-use: This work and derivatives may be used for commercial purposes.
      - modifications: This work may be modified.
      - distribution: This work may be distributed.
      - private-use: This work may be used and modified in private.
      - patent-use: This license provides an express grant of patent rights from contributors.
    - conditions: Array of strings expressing the conditions under which the work could be used, which could include a combination of:
      - include-copyright: A copy of the license and copyright notice must be included with the work.
      - include-copyright--source: A copy of the license and copyright notice must be included with the work when distributed in source form.
      - document-changes: Changes made to the source/documentation must be documented.
      - disclose-source: Source code/documentation must be made available when the work is distributed.
      - network-use-disclose: Users who interact with software via network are given the right to receive a copy of the source code.
      - same-license: Modifications must be released under the same license when distributing the work. In some cases a similar or related license may be used.
      - same-license--file: Modifications of existing files must be released under the same license when distributing the work. In some cases a similar or related license may be used.
      - same-license--library: Modifications must be released under the same license when distributing software. In some cases a similar or related license may be used, or this condition may not apply to works that use the software as a library.
    - limitations: Limitations of the license, which could include a combination of:
      - trademark-use: This license explicitly states that it does NOT grant trademark rights, even though licenses without such a statement probably do not grant any implicit trademark rights.
      - liability: This license includes a limitation of liability.
      - patent-use: This license explicitly states that it does NOT grant any rights in the patents of contributors.
      - warranty: The license explicitly states that it does NOT provide any warranty.
  - tags: Aggregated array of strings representing the tags associated with the repository, and tags associated with users who are maintainers/owners of the repository. The implementation of this might change as Wikifactory implements their skill-based matchmaking features. For example: open-source, raspberry-pi, space, 3d-printing
  - commits_level: Array of objects representing commits (contributions in Wikifactory), where each one would contain:
    - hash: A string which, for Git-based repositories, is the unique hash identifier for the commit. For Wikifactory, this is the id field of the contribution.
    - committed: String containing the timestamp for the commit in ISO 8601 format, e.g. 2018-04-25T20:35:59.614973+00:00.
  - issues_level: Array of objects representing issues, where each one would contain:
    - id: String containing the URL to the issue.
    - published: String containing the creation date of the issue in ISO 8601 format, e.g. 2018-04-25T20:35:59.614973+00:00.
    - isResolved: Boolean (true or false) of whether the issue has been marked as closed or resolved.
    - resolved: String containing an ISO 8601 formatted timestamp representing the last time there was activity in the issue (such as comments) or, if the issue isResolved, the time it was resolved.

Notes:
- For files_editability above, file types are identified by file extensions. The categories and mapping are documented in oshminer/filetypes.py, and can be traced to the osh-file-types list by Open Source Ecology Germany.
- For files_info above, file types are identified by file extensions. The categories and mapping are located in oshminer/filetypes.py.
- The license information and formatting are largely based on those from the GitHub-managed choosealicense.com repository, with the exception of some open source hardware licenses which were manually added.

By default, this tool will:
- Identify Wikifactory project URLs by the domain wikifactory.com
- Query the Wikifactory GraphQL API at https://wikifactory.com/api/graphql
Both can be customised with the following environment variables during deployment:
- WIF_BASE_URL (default: wikifactory.com): The base domain used for pattern-matching and identifying Wikifactory project URLs in the JSON request body, in the form of example.com. If this is customised, then the requested Wikifactory project URLs passed to this tool should also use that domain instead of wikifactory.com. Otherwise, a "Repository URL domain not supported" error will be returned.
- WIF_API_URL (default: https://wikifactory.com/api/graphql): The full URL of the GraphQL API endpoint to make queries regarding Wikifactory projects, in the form of https://example.com[:port]/foo/bar.

Dr Pen-Yuan Hsing (@penyuan) is the current maintainer.
Dr Jérémy Bonvoisin (@jbon) was a previous maintainer who contributed greatly to this repository during the first year of the OPENNEXT project and is now an external advisor.
Thank you in advance for your contribution. Please open an issue or submit a GitHub pull request. For more details, please look at CONTRIBUTING.md.
This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by the Contributor Covenant Code of Conduct 2.0.
The maintainer would like to gratefully acknowledge:
The work in this repository is supported by a European Union Horizon 2020 programme grant (agreement ID 869984).
The Python code in this repository is licensed under the GNU AGPLv3 or any later version © 2022 Pen-Yuan Hsing
This README is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0) © 2022 Pen-Yuan Hsing
Details on other files are in the REUSE specification dep5 file.