awesome-global-contributions / awesome-global-contributions

Awesome Open Source projects dedicated to tackling global challenges
https://awesome-global-contributions.github.io/
Creative Commons Zero v1.0 Universal

Define YAML Entry Format #3

Closed AdrienLemaire closed 5 years ago

AdrienLemaire commented 5 years ago

Related: #2

Interesting examples:

AdrienLemaire commented 5 years ago

Ideas:

OscarAblinger commented 5 years ago

Like I wrote in my comment in #2, I think that sorting by SDGs makes the most sense, since we probably won't be able to pin down a primary programming language for every project.

Similarly, I would sort the entries alphabetically, because the number of contributions is a bit wacky. What is used as a metric for it? Number of commits is the most common, but different projects have different average commit sizes. Additionally, that sorting would need regular updates, which means a few things:

OscarAblinger commented 5 years ago

The entry format is a more complicated topic. It heavily depends on how much information we want to provide for every project. Most awesome-lists simply have the project name, a link and the short (1 sentence) description of the repository. I would welcome a bigger description text, but I'm skeptical whether it should be shown by default or hidden underneath a <details> element.

I think the following entries are uncontroversial: programming language, license, repo URL, website URL. Number of stars, contributors and maintainers might also be useful. I'm not sure what you mean by "recent activity". Could you elaborate on that?

OscarAblinger commented 5 years ago

Ideas:

  • Sensitize the readers of the list to the SDGs (assume they don't know the details), e.g. write notes for each category.

I do think we should include some explanation of what SDGs are – either written as a document in this repository or as an external link. However, I do think that the main focus of the list should be the projects and that any explanations should be at most 4 lines of text. The rest should be linked.

  • Create and maintain a weekly digest mailing list (entries to the awesome list and marking events in the world related to SDGs)

I'm not sure how successful that would be, for two reasons: 1) Simply watching this repository would achieve a similar effect built into GitHub. 2) We probably won't have enough new additions every week to warrant a digest. Including events etc. might fill this, but I feel like that is a bit far away from what this list should do.

  • Have a webpage for the list with more sorting/grouping options

Definitely support that and I've also already offered creating (and maintaining) it.

AdrienLemaire commented 5 years ago

I think that sorting by SDGs makes the most sense, since we probably won't be able to pin down a primary programming language for every project.

The counter-argument is that some projects can be categorized in several SDGs. Managing duplicates would be a tedious chore, but we already thought of a solution by using JSON/YAML. Let's start with a default SDGs categorization, and if it becomes an issue in the future, it should be easy to change it.
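A minimal sketch of how structured data can sidestep the duplicate problem: each project is stored once, and a renderer fans it out to every SDG it lists. The `sdgs` field name follows the draft template later in this thread; `group_by_sdg` and the sample projects are hypothetical.

```python
from collections import defaultdict


def group_by_sdg(projects):
    """Map each SDG number to the alphabetically sorted names of its projects."""
    groups = defaultdict(list)
    for project in projects:
        # A project listed under several SDGs appears in each group,
        # but is still stored only once in the source data.
        for sdg in project.get("sdgs", []):
            groups[sdg].append(project["name"])
    return {sdg: sorted(names, key=str.lower) for sdg, names in sorted(groups.items())}


# Hypothetical entries, each stored exactly once.
projects = [
    {"name": "WaterMap", "sdgs": [6]},
    {"name": "FoodShare", "sdgs": [1, 2]},
    {"name": "agriData", "sdgs": [2]},
]
grouped = group_by_sdg(projects)
```

Switching the default categorization later would then only mean changing the grouping function, not the stored entries.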

Similarly, I would sort the entries alphabetically, because the number of contributions is a bit wacky. What is used as a metric for it?

OK for alphabetical sorting by default. My initial thought on a sorting metric is the size of the project by its number of contributors. We can safely assume that the more maintainers a project has, the bigger the project and the larger its potential impact. In https://github.com/Fandekasp/awesome-oss-sdgs/issues/5#issuecomment-518056421, I also suggested maintaining a highly-opinionated Index score. This would be a lot of work, but I'm sure you can see the value in it :) We could eventually become influencers and help the most impactful projects get the human resources they need to move forward.
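As a rough sketch of the two orderings discussed here, assuming entries are already parsed into dictionaries: alphabetical by default, with contributor count as an alternative. `numberContributors` is taken from the draft template later in this thread; `sort_projects` and the sample entries are hypothetical.

```python
def sort_projects(projects, by="name"):
    """Return a new list sorted alphabetically or by contributor count."""
    if by == "name":
        return sorted(projects, key=lambda p: p["name"].lower())
    if by == "contributors":
        # Largest first; ties and missing counts fall back to alphabetical order.
        return sorted(
            projects,
            key=lambda p: (-p.get("numberContributors", 0), p["name"].lower()),
        )
    raise ValueError(f"unknown sort key: {by!r}")


# Hypothetical entries for illustration only.
projects = [
    {"name": "openAid", "numberContributors": 12},
    {"name": "Beacon", "numberContributors": 340},
    {"name": "CleanGrid"},
]
by_name = [p["name"] for p in sort_projects(projects)]
by_size = [p["name"] for p in sort_projects(projects, by="contributors")]
```

An opinionated index score could be plugged in the same way, as one more sort key.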

Simply watching this repository would achieve a similar effect built into GitHub

Watching the repo is indeed good enough for now. Let's just keep the idea in mind in case another maintainer joins the team who wants to regularly keep up with SDG news and share the important bits with everybody (not necessarily related to the projects listed here).

OscarAblinger commented 5 years ago

The counter-argument is that some projects can be categorized in several SDGs.

That is true, but I reckon that it's less so with SDGs. In the worst case, we could merge some of them? Then I would probably see problems in cases like this: SDG 1 is similar to SDG 2 or SDG 4, but I can see a lot of projects for either combination that do not fit in the other one.

Let's start with a default SDGs categorization, and if it becomes an issue in the future, it should be easy to change it.

That's probably a good idea.

My initial thought on a sorting metric is the size of the project by its number of contributors.

I can see the more-people > more-impact argument. It is still only a very rough estimate, but you definitely have more experience with judging projects than I do. However, I'd rather be a fan of

I also suggested maintaining a highly-opinionated Index score.

Depending on the score that might also be a valid sorting metric.

Let's just keep the idea in mind in case another maintainer joins the team who wants to regularly keep up with SDG news and share the important bits with everybody (not necessarily related to the projects listed here).

Definitely. I could even see us promoting a third-party email list?

AdrienLemaire commented 5 years ago

As per https://github.com/Fandekasp/awesome-oss-sdgs/issues/2#issuecomment-518191371, let's only agree on a YAML template and start collecting data. The markdown/html rendering will be scripted, so we can leave the discussion of their format for later.

We can also have a list of YAML files that can then be concatenated together

If that's more maintainable. But what would those files be? One for every goal?

How about one YAML file per project? That way, we can decide on the YAML template in this issue, and every time we find an interesting project, it's a matter of copying the template and filling in the blanks. Should be easy enough for other contributors to participate as well.

For the syntax, since we're into JavaScript stuff, camelCase makes sense. Let's also ask contributors to use yamllint, to ensure all files have valid syntax.

I suggest we also leave optional fields for the stuff that might be useful at some point, or that we want to display but isn't always available, or to overwrite information that we can otherwise automatically parse (like github stars)

First draft:

# Project template
---
# The description should explain why this project is awesome.
# and how it impacts a global issue
description: >
  YAML offers several options to write multiline strings,
  Use whatever you want:
  https://yaml-multiline.info/
globalIssues:
  - issue1  # list of issues TBD
# the license should be OSI approved:
# https://opensource.org/licenses/alphabetical
license: CC0
name: "Project's name"
programmingLanguage:
  - python:
    main: true
  - java
  - swift
repoUrl: https://github.com/org/project/
websiteUrl: https://project.org

# OPTIONAL FIELDS

contributionGuidelinesUrl: "https://project.org/contributors/"
licenseUrl: "https://project.org/license/"
logoUrl: ""
numberContributors: 1000  # https://github.com/org/project/graphs/contributors ?
sdgs: [1, 3]  # list of integers, from 1 to 17
# stars can be auto-collected for github projects
starsUrl: "https://img.shields.io/github/stars/org/project.svg\
          ?style=social&label=Star&maxAge=2592000"

OscarAblinger commented 5 years ago

How about one YAML file per project?

Sounds good

Let's also ask contributors to use yamllint, to ensure all files have valid syntax

Yeah, we could also integrate that with GitHub's pull requests. It would also be interesting to check for required elements, invalid links, etc. It might be useful to add that, too? Kwalify looks like a solid choice for it, since it not only offers optional types, but also regex pattern matching. I haven't found any pre-made GitHub integration for it, though, and I doubt that a custom hook is gonna be worth it for now. It also lacks further checks, like whether a link produces a valid response or not. I guess we could provide a script to check for that rather easily, though (we already need to parse all of those things to auto-generate the rest of the files anyway).
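A minimal sketch of what such a check script could look like, assuming each entry has already been parsed into a dictionary. It does not use Kwalify and does not fetch any links; it only checks required fields and URL shape. The `REQUIRED` list, the regex and `validate_entry` are assumptions for illustration, not an agreed schema.

```python
import re

# Assumption: which fields are mandatory has not been settled in this thread.
REQUIRED = ("name", "description", "license", "repoUrl")
URL_FIELDS = ("repoUrl", "websiteUrl", "contributionGuidelinesUrl", "licenseUrl")
# Deliberately simple pattern: scheme, two slashes, a host, an optional path.
URL_PATTERN = re.compile(r"^https?://[^\s/]+(/\S*)?$")


def validate_entry(entry):
    """Return a list of human-readable problems found in one project entry."""
    errors = []
    for field in REQUIRED:
        if not entry.get(field):
            errors.append(f"missing required field: {field}")
    for field in URL_FIELDS:
        value = entry.get(field)
        # Empty or absent URL fields are skipped; only malformed values are reported.
        if value and not URL_PATTERN.match(value):
            errors.append(f"malformed URL in {field}: {value}")
    return errors


# Hypothetical entry with a single-slash typo in the URL.
problems = validate_entry({"name": "X", "repoUrl": "https:/project.org/"})
```

Run over every project file in a pull request check, an empty list for each entry would mean the entry passes.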

I suggest we also leave optional fields for the stuff that might be useful at some point, or that we want to display but isn't always available, or to overwrite information that we can otherwise automatically parse (like github stars)

Sure, I'd suggest differentiating between optionals and overwrites, though. "Optionals" would be information like contributionGuidelinesUrl and logoUrl that does not necessarily have to be provided. "Overwrites" would be information that we try to get automatically (like starsUrl or numberContributors), but that can be overwritten if e.g. the given domain does not support it. They should generally also be allowed to stay unset if they can't be auto-generated and have not been overwritten.
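The overwrite rule described here could be sketched roughly as follows, assuming both the hand-written entry and the auto-generated values are available as dictionaries. `OVERWRITABLE` and `resolve_generated_fields` are hypothetical names; which fields count as overwrites is still open.

```python
# Assumption: only these two fields are treated as overwrites in this sketch.
OVERWRITABLE = ("starsUrl", "numberContributors")


def resolve_generated_fields(entry, generated):
    """Combine a hand-written entry with auto-generated values.

    A manual value always wins. If there is neither a manual nor a
    generated value, the field is simply left out of the result.
    """
    resolved = dict(entry)  # copy, so the original entry is not modified
    for field in OVERWRITABLE:
        manual = entry.get(field)
        if manual not in (None, ""):
            continue  # contributor supplied an overwrite
        if field in generated:
            resolved[field] = generated[field]
        else:
            resolved.pop(field, None)
    return resolved


# Hypothetical values for illustration only.
entry = {"name": "X", "starsUrl": "", "numberContributors": 1000}
generated = {"starsUrl": "https://example.org/stars.svg", "numberContributors": 3}
resolved = resolve_generated_fields(entry, generated)
```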

OscarAblinger commented 5 years ago

Concerning the first draft:

After these changes, draft 2 would be:

# Project template
---
# The description should explain why this project is awesome.
# and how it impacts a global issue
description: >
  YAML offers several options to write multiline strings,
  Use whatever you want:
  https://yaml-multiline.info/
globalIssues:
  - issue1  # list of issues TBD
# the license should be OSI approved:
# https://opensource.org/licenses/alphabetical
license: CC0
name: "Project's name"
programmingLanguages:  # (added a plural s here)
  - python
  - java
  - swift
repoUrl: https://github.com/org/project/
websiteUrl: https://project.org
contributionGuidelinesUrl: "https://project.org/contribute/"
# See guideline on how to rate a project
rating: 5

# OVERWRITE GENERATED FIELDS
# We will attempt to auto-generate these
# values, but aren't always able to do so.
# Please overwrite if needed. (TODO: Provide link to more information)

logoUrl: ""
numberContributorsUrl: https://api.github.com/repos/org/proj/stats/contributors
starsUrl: "https://img.shields.io/github/stars/org/project.svg\
          ?style=social&label=Star&maxAge=2592000"

# OPTIONAL FIELDS
# You can leave these fields empty,
# if they don't fit the project

licenseUrl: "https://project.org/license/"
sdgs: [1, 3]  # list of integers, from 1 to 17
naturalLanguages:
  - English
  - German

OscarAblinger commented 5 years ago

About the multiple YAML files: for the webpage we would need one SST (single source of truth). With multiple files, we could theoretically access the GitHub API or have another file which indexes all of them, and the website would request all of them one by one, but that's rather inefficient. GitHub Pages also doesn't allow server-side scripting that could cache the list.

So, if we want to have a separate file for every entry (which I really like), we'd either need:

1) One file that aggregates all of them and gets automatically created together with the markdown
2) One file that references all of the known project files
3) Access to the GitHub API to parse for the relevant files

While 1) is by far the nicest version for the website, it also adds a lot of clutter. I therefore lean towards 3), but I'd have to look further into the GitHub API to see how easy it will be.
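For comparison, option 1) could be sketched as a small build step that collects every per-project file into one JSON index the website can fetch in a single request. The parser is passed in so that a real YAML loader (for example PyYAML's `yaml.safe_load`) can be used; `build_index`, the `*.yaml` naming and the `sourceFile` key are assumptions.

```python
import json
from pathlib import Path


def build_index(project_dir, parse, index_path):
    """Aggregate all per-project files into a single JSON index file.

    `parse` turns the text of one file into a dict, e.g. a YAML loader.
    """
    entries = []
    # Sorted so the generated index is stable between runs.
    for path in sorted(Path(project_dir).glob("*.yaml")):
        entry = parse(path.read_text(encoding="utf-8"))
        entry["sourceFile"] = path.name  # hypothetical key, handy for debugging
        entries.append(entry)
    Path(index_path).write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

The index would be regenerated together with the markdown, so it never needs to be edited by hand.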

OscarAblinger commented 5 years ago

Since I'm currently writing the first draft of the Contribution Guideline, I noticed how weird the overwrite functionality is: we are capable of extrapolating some information. However, it would make sense to add it to the file nonetheless, because then we avoid having to do it every time (including on the website). Since the user already has to execute the markdown generator, we could also rewrite the original YAML file? This might be a bit irritating, but it would let us not require the user to fill out the auto-generatable information and still have it added to the file. The other option I could see would be to have the user add it manually and simply provide the links for the usual platforms in the documentation.

Related: #8

OscarAblinger commented 5 years ago

Update of the template from feature/contribution-guideline: (open for evaluation)

# Project template
---
name: Project's name
# The description should explain why this project is awesome.
# and how it impacts a global issue
description: >
  YAML offers several options to write multiline strings,
  Use whatever you want:
  https://yaml-multiline.info/
globalIssues:
  - issue1  # list of issues TBD
# the license should be OSI approved:
# https://opensource.org/licenses/alphabetical
license: CC0
programmingLanguages:
  - python
  - javascript
  - c#
repoUrl: "https://github.com/org/project/"
websiteUrl: "https://project.org"
# See guideline on how to rate a project
rating: 1

# OVERWRITE GENERATED FIELDS
# We will attempt to auto-generate these
# values, but aren't always able to do so.
# Please overwrite if needed. (TODO: Provide link to more information)

contributionGuidelinesUrl: "https://project.org/contribute/"
logoUrl: ""
starsUrl: "https://img.shields.io/github/stars/org/project.svg?style=social&label=Star&maxAge=2592000"
# Link to access the amount of contributors to the project
# The type defines what answer is expected
# See CONTRIBUTING.md for more information
numberContributors:
  url: "https://api.github.com/repos/org/proj/stats/contributors"
  format: json # or yaml or xml
  accessor: list # or number or property name prepended by ?

# OPTIONAL FIELDS
# These fields may be empty.
# If they fit the project, please still
# try to fill them out

licenseUrl: "https://project.org/license/"
sdgs: [1, 3]  # list of integers, from 1 to 17
naturalLanguages:
  - English
  - German
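
One possible reading of the `accessor` values in the `numberContributors` block above, sketched on an already-decoded response: `list` counts the items, `number` takes the payload as the count, and `?property` looks up a property. This interpretation and the name `count_contributors` are assumptions; CONTRIBUTING.md is the place that defines the actual behavior.

```python
def count_contributors(payload, accessor):
    """Extract a contributor count from a decoded API response."""
    if accessor == "list":
        # e.g. a JSON array with one element per contributor
        return len(payload)
    if accessor == "number":
        return int(payload)
    if accessor.startswith("?"):
        # "?total" reads payload["total"]
        return int(payload[accessor[1:]])
    raise ValueError(f"unknown accessor: {accessor!r}")


# Hypothetical decoded responses for illustration only.
from_list = count_contributors([{"id": 1}, {"id": 2}, {"id": 3}], "list")
from_number = count_contributors(42, "number")
from_property = count_contributors({"total": 7}, "?total")
```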

AdrienLemaire commented 5 years ago

Closing since the related PR has been merged