coreinfrastructure / best-practices-badge

🏆Open Source Security Foundation (OpenSSF) Best Practices Badge (formerly Core Infrastructure Initiative (CII) Best Practices Badge)
https://www.bestpractices.dev
MIT License

Criteria idea: Accessible to those who understand English #230

Closed david-a-wheeler closed 8 years ago

david-a-wheeler commented 8 years ago

I had a discussion with others today about the need to make sure that people can participate from a variety of places and cultures. A criterion that just said, "don't create place/culture barriers" isn't very actionable or measurable.

However, it would be possible to include, "The project MUST be accessible to those who understand the English language."

It's difficult for people to work together without at least one common natural language, and in FLOSS projects today that common language is English. Indeed, in technology circles English is the lingua franca. This doesn't require that anyone be a native speaker, just that if you know English you should be able to participate. For an example, witness all the effort that the LibreOffice people have put in to translate the comments from German into English. Supporting English enables more people worldwide to participate.

This potential criterion may be controversial. That's okay, my goal is to get ideas out there for discussion. Perhaps this is a bad idea, or one that should only be at higher levels. Please discuss!

altonius commented 8 years ago

Wow, this could be very controversial, especially if you are developing a project that is specific to a particular country or ethnic group. The first theoretical problem that comes to mind is a project that aims to help with indigenous languages having this requirement forced on it - ouch!

Here's my take:

On a language-related note, I learned earlier about https://www.transifex.com, which can be used to help translate projects into other languages. Maybe it's something that could be considered to internationalise our project in the future - FLOSS projects don't have to pay.

Alton(ius)

david-a-wheeler commented 8 years ago

The goal would be to absolutely not prevent participation or support for a particular language group - it would be to enable others to participate. But it's a fair point that some projects only apply to a particular language group, and thus, imposing such a criterion might inhibit appropriate contributions. You noted that you're "not keen on defining language...", got it.

I did note that this "may be controversial". However, I'd rather have that controversy out in the open. I'm a big believer in brainstorming - put out ideas, even if there are problems. Maybe the problems can be solved, and maybe the problems make the idea unworkable. But we're more likely to get a better list if we can more openly discuss ideas.

You mentioned some alternatives, but I don't think they help. First, there's no need to ask organizations to "define a primary (spoken/written/non-programming) language for communicating about the project". One look at a project's website will tell you that :-).

I'm hesitant about trying to make some sort of general statement about how a project creates "an inclusive culture". There are projects and organizations that formally document how they do this, but I don't think there's consensus about it, what it means, or even that it's necessarily a good thing (whatever "it" is). In general the "best practices" criteria try to identify generally-accepted criteria, and I'm skeptical we can get truly generally-accepted general criteria that would be worth it. Pretty much no one today has a CULTURE file, so it's hard to argue that this is a generally-accepted practice today. Many projects simply focus on the code and documentation, and how to contribute to them, and in many respects that kind of focus solves a lot. If you focus on the problem, then it's not hard to be grateful for help (when it's actually help). But it's not clear that can be put into a useful criterion.

I'd really like to focus on specific criteria we can measure relatively unambiguously and that are currently generally accepted as good things. "Presence of English" is relatively easy to measure unambiguously. Of course, if it's not generally accepted as a good thing, then let's drop it. There are lots of ideas that get dropped after discussion :-).

I guess I'm currently leaning against this criterion now, but I can be swayed...!

dankohn commented 8 years ago

I will register in support of the criterion. We could modify it to say, "Whatever the primary language of the core developers may be, the project should include documentation and be able to accept bug reports and code comments in English, since English is currently the lingua franca of technology." I would also make lingua franca a link: https://en.wikipedia.org/wiki/Lingua_franca

Dan Kohn mailto:dankohn@linux.com Senior Advisor, Core Infrastructure Initiative tel:+1-415-233-1000


david-a-wheeler commented 8 years ago

Ok, dropping it down to SHOULD makes sense. The MUST and SHOULD criteria are normally worded as "The project MUST|SHOULD...", so let's use that consistent pattern unless there's a reason we shouldn't. Also, we should put details and rationale in later sentences, so that people can hide them. Finally, instead of "code comments" how about "comments about code"; while I think it's wisest to have comments embedded in code use English, I think the bigger point is that comments about code should be accepted if they're in English.

How about this?:

The project SHOULD include documentation in English and be able to accept bug reports and comments about code in English. English is currently the lingua franca of computer technology; supporting English increases the number of different potential developers and reviewers. A project can meet this criterion even if its core developers' primary language is not English.

dankohn commented 8 years ago

I like this but I think we should move it back to MUST. I'm not aware of projects that would even be close to qualifying for the badge but wouldn't because of this requirement.


altonius commented 8 years ago

I was agreeing with you that it could be controversial and was playing devil's advocate. Like you, I'm a fan of having the discussion openly (though we possibly have some self-selection bias with this criterion, as anyone who can't communicate in English is already excluded from this discussion :-) )

Back to reality now (instead of my own overly lofty hypotheticals) - I like the proposed criterion and the rationale behind it, and I'm voting for it being a SHOULD criterion. That gives current and future projects that we may not be aware of the opportunity to provide their rationale and still meet best-practices.

I'd much rather a project fail to achieve a badge due to bad coding or security practices instead of the language(s) it uses for communicating.

Alton(ius)


david-a-wheeler commented 8 years ago

Let's add this as a SHOULD to start. We can change the category later, and putting it in the main text will get more visibility. In general I've tried to emphasize best practices that an individual developer could do with some non-Herculean effort. Learning an entire natural language is a big step higher. We could say for the latter (or at least English with a restricted vocabulary). See Special English or the xkcd stuff with the top ten hundred words.

david-a-wheeler commented 8 years ago

Here's a tweaked version, currently SHOULD, but mentioning use of Simple English and adding the word "worldwide" (which hopefully helps readers realize the issue is simply that English is known around the world).

The project SHOULD include documentation in English and be able to accept bug reports and comments about code in English (at least some form of Simple English). English is currently the lingua franca of computer technology; supporting English increases the number of different potential developers and reviewers worldwide. A project can meet this criterion even if its core developers' primary language is not English.

dankohn commented 8 years ago

I would remove the Simple English parenthetical, since it doesn't lead to a clear Wikipedia page.


david-a-wheeler commented 8 years ago

Ok, will remove parenthetical about Simple English.

david-a-wheeler commented 8 years ago

This is resolved by commit b7dd4f2f39ee8545ce002d5b2cd6584b33e3f484

nemobis commented 8 years ago

Requiring English may work in practice, but is discriminatory and leaves a bad taste. English is not the only vehicular language used in practice by online communities (for instance there are groups of speakers of Italian, Portuguese and Spanish who interact each using their own language) and English monolinguals should not be favoured over other monolinguals.

If non-native speakers of English are forced to make an effort to speak English, an equal effort should be required from native speakers, e.g. by mandating the existence of (open) processes to translate software and documentation.

david-a-wheeler commented 8 years ago

This is very much worth discussing, thanks for your comments. I do think the current criterion makes sense, but dialogue is always an excellent idea. Please allow me to expand further (beyond the text above).

English is no better nor worse than any other natural language. I'm all for people learning and using many natural languages. I studied French (and can still read technical French), when I was young I spoke German (I lived there for years), and I've also studied Greek, Portuguese, and American Sign Language. I think it's sad that so many languages around the world are disappearing.

However, if people across the world are going to work together, they need to have some way to communicate. It's not reasonable to require that all people learn all languages. If there's a group of developers that can only communicate in Spanish, they will exclude the larger community of people who cannot speak Spanish. Any language decision will exclude someone - so how can we minimize those who are excluded?

Today, it's typically a given that the language for international communication is English. For example, the International Civil Aviation Organisation (ICAO) has established English language proficiency requirements (LPRs) for all pilots operating on international routes, and all air traffic controllers who communicate with foreign pilots: "These standards require pilots and air traffic controllers to be able to communicate proficiently using both ICAO phraseology and plain English."

Wikipedia's list of languages by total number of speakers makes it clear why this is the case. The languages with the largest number of speakers (at either L1 or L2 levels) are:

| Language | Total |
|---|---|
| English | 1,500 million |
| Mandarin Chinese | 1,090 million |
| Spanish | 560 million |
| Hindustani (Hindi-Urdu) | 541 million |
| Arabic | 395 million |
| Russian | 260 million |
| Malay | 250 million |
| Portuguese | 250 million |
| French | 220 million |
| German | 210 million |

The EU officially supports 24 languages, but even with hundreds of millions of Euros spent on translation the EU can't keep up. In practice, most EU institutions use English as their working language. If a group only speaks Spanish, they can only reach roughly 1/3 as many people as a group using English (and in practice even fewer, since it's typically easy to find at least some English speakers in nearly every city in the world).
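As a rough check on the 1/3 figure, here is a minimal Python sketch that divides each language's total by the English total. The numbers are the ones listed above; the names `speakers_millions` and `reach_relative_to` are purely illustrative, and the sketch assumes that total speaker counts are a fair proxy for how many people a project could reach.

```python
# Total speakers (L1 + L2) in millions, copied from the list above.
speakers_millions = {
    "English": 1500,
    "Mandarin Chinese": 1090,
    "Spanish": 560,
    "Hindustani (Hindi-Urdu)": 541,
    "Arabic": 395,
    "Russian": 260,
    "Malay": 250,
    "Portuguese": 250,
    "French": 220,
    "German": 210,
}


def reach_relative_to(language, baseline="English"):
    """Return the speaker total of `language` as a fraction of `baseline`."""
    return speakers_millions[language] / speakers_millions[baseline]


# Spanish / English = 560 / 1500, a bit over one third.
for language in speakers_millions:
    print(f"{language}: {reach_relative_to(language):.2f} of English")
```

This ignores overlap between groups (many people speak more than one of these languages), so it is only an upper-level sanity check, not a measure of actual reach.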

OSS projects typically don't have the EU's translation budget, so English is an even more common feature in the OSS community. LibreOffice, for example, has been working hard to translate its comments from German to English: https://wiki.documentfoundation.org/Development/Easy_Hacks/Translation_Of_Comments https://bugs.documentfoundation.org/show_bug.cgi?id=39468 Linus Torvalds' native language is Swedish, and he also knows Finnish and English - but kernel development has always been in English.

an equal effort should be required from native speakers, e.g. by mandating the existence of (open) processes to translate software and documentation.

Do you have specific criteria text to propose? One challenge is that projects handle this in different ways; in some cases the documentation is actually a separate project, so that might be hard to mandate.

I certainly agree that software intended primarily for end-users should be internationalized and support localization. However, if the software's primary purpose is to be a library for other developers, many library developers would expect that the software developers will learn English. Then nobody has to translate billions of pages into thousands of languages. After all, anyone can learn English; it's not proprietary. Clearly that is not an outcome that would make monolinguals of other languages happy, but the economics of trying to translate everything for everyone is difficult to justify in many cases. Indeed, for a number of projects, development velocity is the most important factor to optimize for; anything that slows development (like requiring translations or copyright transfers) is dangerous to the project since it would make the project uncompetitive.

There is a potential radical change: Continued improvements in freely-available machine translation. I'm quite aware that current machine translations leave a lot to be desired. That said, if you limit yourself to simple grammatical structures, and avoid slang and idiomatic phrases, they are already good enough for very simple constructs. If machine translation continues to improve, to the point where people are happy to turn on "auto translate" (or whatever) in their browser and text editor, then there may be no need for a common language. The goal isn't 1 language, it's simply to enable widespread communication.

So, what do you suggest? Adding something? Dropping it?

Nikerabbit commented 8 years ago

If we were talking about a company, it makes total sense to agree on a common language to increase productivity, because that is what earns them money and that's what companies do.

But open source is not a company, and I strongly contest that producing code at the highest possible rate is what should be optimized.

English does not need any more help to stay as the lingua franca of technology. The other languages do need support so that software development does not become more of a monoculture. What I think we should be looking for is an inclusive culture. This partially overlaps with Code of Conduct and similar constructs that disallow bad behavior and discrimination.

In practice this would mean, for example, that if someone posts a bug report in Russian, the developers would try to understand it via machine translation before telling the reporter just to go away; or not ignoring the message of a person who writes in really bad English. Some larger projects can have volunteers that can help in translating, but of course smaller projects do not have this luxury.

Having code comments in English or at least in one language only would in my opinion fall in the category of having a coding style guideline and enforcing it.

I believe we are not actually disagreeing on the goal of widespread communication. But there seems to be an unstated assumption about whether the focus should be on making it easier for those already inside the ecosystem, or on accommodating the people whom we are currently inadvertently excluding.

The (possibility of) localisation of open source projects has been their strength, and can be considered as a best practice. It would be weird then if we had the opposite best practice for developmental activity. In conclusion, I think enforcing English is not the way we should take and I propose dropping this criterion until we find something better.

nemobis commented 8 years ago

Another way to look at what I said earlier, i.e. that being a native or near-native English speaker doesn't mean one speaks effective English. https://medium.com/@mollyclare/taming-the-steamroller-how-to-communicate-compassionately-with-non-native-english-speakers-d95d8d1845a0

A multilingual mindset, if not actual multilingualism, is required from anyone to communicate effectively in an online community. A monolingual English speaker can make online written conversation harder than someone with very bad English.

dandv commented 6 years ago

I LOVE the idea of requiring English for any OSS project that doesn't pertain to a specific language, and it boggles my mind how people continue to post in Russian or German or whatever the language of some of the contributors happens to be :-1:

The other languages do need support so that software development does not become more of a monoculture.

A culture comprises much more than language. Language, here, is just a means of communicating ideas. A standard like TCP or HTTP. Let's be specific. Having all sorts of weirdo networking standards (anyone remember IPX/SPX?) has not helped progress in sharing knowledge. Using more than one language doesn't help either:

if someone posts a bug report in Russian, the developers would try to understand it via machine translation before telling the reporter just to go away

How about suggesting to the Russian speaker to use machine translation in the first place? That way, they spend time translating once, instead of each person who reads the bug report translating it.

For more on this topic, please refer to this extensive essay I wrote on sticking to one language as a standard for communication. Jeff Atwood also wrote an excellent essay about English and programming specifically.


1. It is quite well known that the vast majority of Germans speak English very well, but many are afraid to do so for fear of committing some mistake. They thus favor German, which leads to the consequences I've mentioned above.

nemobis commented 6 years ago

Dan Dascalescu, 09/08/2018 08:31:

it boggles my mind how people continue to post in Russian or German or whatever the language of some of the contributors happens to be 👎

I don't understand, was this sarcastic?

dandv commented 6 years ago

I don't understand, was this sarcastic?

It was not. I'm genuinely puzzled by programmers who just seem to not notice what the prevalent language used in an online programming community (a repo, forum etc.) is.

That's like showing up to a meeting at someone's house, ignoring the fact that all attendees have left their shoes at the door, and walking in wearing your favorite shoes. Might be a form of dyslexia? I have no idea. Today I saw a Chinese user posting a question in an English repo, just like that. Not even with an apology that they don't speak English, or weren't able to use Google Translate (which is available in China). :confused:

simontseng commented 6 years ago

I do suspect maybe this guy entered the Chinese title while browsing with some kind of page translation program, thinking everything was in Chinese

I wouldn’t be surprised

nemobis commented 6 years ago

Dan Dascalescu, 12/08/2018 03:48:

It was not. I'm genuinely puzzled by programmers who just seem to not notice what the prevalent language used in an online programming community (a repo, forum etc.) is.

It's extremely discriminatory to think that everyone must talk English and that if they don't they must have problems grasping reality. Someone can be good at doing what they do even without having a good knowledge of English.

The most appropriate language in which to develop software is the one that works best for most of its developers and target users, which cannot be blindly assumed to be English.

dandv commented 6 years ago

@simontseng

I do suspect maybe this guy entered the Chinese title while browsing with some kind of page translation program, thinking everything was in Chinese. I wouldn’t be surprised

I really doubt that, for two reasons:

  1. Every web page translation software I've seen is pretty obvious, through a UI affordance (translation bar, pop-up prompt etc), and through the slightly broken translation it provides.

  2. The repo where I saw that user post a question in Chinese, is a private one. It requires registration involving sending a signed PDF form, in English. Everything about the repo, its site, wiki (which is required reading before posting an issue) is in English.

@nemobis

It's extremely discriminatory to think that everyone must talk English and that if they don't they must have problems grasping reality.

Please re-read my post, because that's not what I said. Let me further clarify what I'm saying:

If you want help from an online community that predominantly uses a particular language, then it is in your best interest to use that language. It will not only show respect, but will maximize the chances of getting help.

Note that this is widely followed in practice. Moreover, many of the most prolific contributors on GitHub are not native English speakers, yet they publish in English, I guess in order to get the widest exposure and PRs, but also to not discriminate by using their native tongue. Whether you like it or not, English is the lingua franca of software development.

Anyway, let's have a look at https://git.io/top:

I'm all for being inclusive. But when I'm on a mostly-English software forum or repo and I see threads in a language I don't understand, I feel excluded (if not discriminated against). It may be cute to post a question in the language of one of the contributors, but it's immature. I know I've been tempted to do the same, but I've limited myself to replacing "Hi" with the Romanian "Salut" as a head nod to a compadre, because posting the entire issue in Romanian wouldn't help anyone else. We're on GitHub to share code and knowledge.

I also see posting the question in the language of the repo as a sign of respect. FLOSS authors put considerable effort into providing free software. If you want help, the least you can do is to ask your question in the language of that community. @nemobis, would you go post an issue in Italian or Finnish in one of @IonicaBizau's repos? If not, why not?

Also, there's a finer note on this language/respect thing: in my 9+ years of using and contributing to OSS projects on GitHub, I have never, not once, seen a PR to an English repo authored in a non-English language. I have only seen questions asked in other languages. That says something.

We want to be inclusive. Upon whom should the burden of translating such questions fall? Do we wait for contributors who speak that language to answer the question? What if none of them speak it? Do we use machine translation and answer back in the original language? Why exactly shouldn't the asker spend that bit of effort and use the translation software? Is it too much for us providers of free software and free consultations to ask for one minute of the asker's time to make their question (and our answer) intelligible to all? We want to be inclusive after all. Shouldn't that "we" include the asker?

Keep in mind that time is money. Apples to apples, support for, say, a charting library can be valued at $175/hour, or can be provided for free, as is the case with Highcharts, whose Norway-based team has been doing so since 2009. If you think about the costs involved, demanding that a FLOSS developer answer your question in your mother tongue is the equivalent of both begging for money and hoarding, because your question isn't searchable by speakers of any other language, so the developer will have to provide the answer again when the same question gets asked.

I hope this makes my point clearer. And just to be extra clear, I'm not advocating for any particular language to be the lingua franca of all repos. Just as we use one prettifying standard or another, the point is to stick to it. If you want help with some Chinese-only widget, by all means, use Chinese. But pick one most appropriate language for the repo, and have that be as standard as the coding language(s) used. Would you accept a PR in Python to a C repo?

Someone can be good at doing what they do even without having a good knowledge of English.

That's true for many fields, but less often the case for programming. StackOverflow founder Jeff Atwood wrote an excellent essay about this topic.