OpenMined / .github

All our community health files
Apache License 2.0

Project naming standards #11

Closed: cereallarceny closed this issue 4 years ago

cereallarceny commented 4 years ago

"There are only two hard things in Computer Science: cache invalidation and naming things."

Phil Karlton

We have a variety of project names within OpenMined, many (but not all) of which follow a consistent standard and have organizational clarity. This is an effort to ensure that all OpenMined project, repository, and release names are clear and concise. Let's start with a few terms to get on the same page:

Terminology

Project name - This is known as the colloquial name of the project. It's mostly used in conversation and might be the same as the repository name.

Example: syft.js (pronounced "syft jay ess")

Repository name (also "repo name") - This is the literal name of the GitHub repository. As previously stated, this will likely be the same as the project name mentioned above.

Release name - This is the name of the project as it appears in various package managers.

Example: @openmined/syft.js

Organization scoping (also "org scoping") - This is a concept, supported by some package managers, that allows a package name to be prefixed with the name of the organization, in this case "openmined". Sometimes a mandatory "at sign" ("@") is prepended to the organization's name. Note: this only applies to the release name, not the project or repository names.

Example on NPM (package manager for JavaScript): @openmined/syft.js

Reasoning

Since there are many existing repos within the OpenMined community already (over 50 at present), it's best that we try to respect existing repository names and release names as much as possible. However, many rules introduced here will cause breaking changes. This is well understood, but unavoidable if we also want a coordinated naming strategy. The following are a few reasons we might want to do this:

  1. This makes our GitHub much easier to follow, which provides more contribution opportunities for beginners and newcomers to the community. There are many organizational best practices we're beginning to implement across our organization - this is one of many that make it easier for people to get engaged in our community and stay engaged. Enough said.
  2. This helps us stay more organized as we scale. Simply by giving something a name, you make it easier to converse about in a common manner. It also cuts down on silly discussions about what something should be called when it's more important to remain focused on what is being coded.
  3. We have a release management team whose job it is to make these types of changes easier for you as a core developer to follow. You stick to writing code and allow the release management team to automate these concerns away from your responsibility. Let's be clear, this won't be an overnight change, but we all know it will be easier in time.
  4. We can plan for any breaking changes that this causes months in advance. For instance, our flagship repository, PySyft, will be most affected by these changes. It's entirely possible that we could dual-release the library over the following months with a deprecation message provided to those that use the old release name in their code. This is a fairly standard practice in many coding communities and is reasonable given that we'd be providing ample notice to our users ahead of time.
  5. The rules proposed in this issue ARE up for debate - we're not taking your autonomy from you. That said, once we arrive at a reasonably agreed-upon standard for naming, we should stick with it. We're committing to a standard, let's be clear about that.

Proposed Rules

Every language is different. Every package manager is different. It's going to be difficult to prescribe rules on a general basis, but here's our best attempt at language-agnostic rules:

  1. Use acronyms carefully and only when the term in question is universal to the data science community. While "NLP" might be an industry-standard term for most data scientists, it's possible that it could be less familiar to beginners. On the other hand, "natural language processing" is one hell of a long name for a project. In this case, an acronym is justified.
  2. Avoid arbitrary names that don't explain what the project is trying to accomplish. Slightly controversial here, but certain names like "syft", "opus", "grid", and "syfer" don't tell me what the project is for (perhaps this last one does, because it's a play on "cipher" and "syft"... but I digress), so as a beginner it's quite difficult to suss out which repos are important to me. On the other hand, it's reasonable that many projects can and should be "grandfathered" in and not be subjected to this rule. Going forward, however, all recently created and new projects should be named according to something that explains the intention of the project.
  3. Avoid overly long project names. While this appears to contradict the first rule, it's instead intended to describe the delicate balance between too much information and too little information. When naming your project, just keep this in mind - there's not much more to say here.
  4. Respect the established standards of the language. It's common for languages to have differing standards on naming practices. We'll do our best to elaborate on those differences in the next section.
  5. Organization scoping is mandatory, even for package managers that don't have official support for it. This will require some creativity in a few languages, but it's not impossible in any of them. The string "openmined" (possibly with an @ sign before it, depending on the language) must be placed before the release name and separated from it by some character (depending on the language): most likely -, /, or _. Organization scoping is done for three primary reasons: consistency across all projects and all languages, discoverability of other projects in search results, and general OpenMined brand recognition.
  6. Projects that are multilingual should be named according to the project name and avoid language-specific standards. This will be quite common for projects that implement code in one low-level language (C++, for example) and then provide multiple libraries in other higher-level languages (Java, Python, JavaScript, etc.). However, if you maintain multiple separate repositories that each cover one language (the Syft family), each repository should be named according to the language standards defined below.

Language-specific Rules

Rather than go into the subtleties of best practices across multiple languages, it's perhaps best to consider some examples:

Multiple repositories (project name: Differential Privacy Project)

  Language   | Repository name | Release name
  C++        | dp.cpp          | @openmined/dp.cpp
  Python     | PyDP            | openmined.dp
  JavaScript | dp.js           | @openmined/dp.js
  Java       | JavaDP          | org.openmined.java-dp
  Kotlin     | KotlinDP        | org.openmined.kotlin-dp
  Clojure    | clojure-dp      | org.openmined.clojure-dp
  Swift      | SwiftDP         | OpenMinedDP

Single repository with multiple languages (project name: Private Set Intersection Project; repository name: PSI)

  Language   | Release name
  C++        | @openmined/psi.cpp
  Python     | openmined.psi
  JavaScript | @openmined/psi.js
  Java       | org.openmined.java-psi
  Kotlin     | org.openmined.kotlin-psi
  Clojure    | org.openmined.clojure-psi
  Swift      | OpenMinedPSI
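The pattern in these two examples is regular enough to write down as a lookup table. The sketch below is purely illustrative: `RELEASE_TEMPLATES` and `release_name` are hypothetical names, not part of any OpenMined tooling, and the short name is assumed to be passed in the casing used in the Swift names (e.g. "DP", "PSI").

```python
# A minimal sketch, not an official OpenMined tool: each template mirrors one
# row of the "Multiple repositories" / "Single repository" examples above.
RELEASE_TEMPLATES = {
    "cpp": "@openmined/{lower}.cpp",
    "python": "openmined.{lower}",
    "javascript": "@openmined/{lower}.js",
    "java": "org.openmined.java-{lower}",
    "kotlin": "org.openmined.kotlin-{lower}",
    "clojure": "org.openmined.clojure-{lower}",
    # Swift keeps the short name's casing behind an UpperCamelCase prefix.
    "swift": "OpenMined{name}",
}


def release_name(language: str, name: str) -> str:
    """Return the org-scoped release name for a short project name such as "DP"."""
    template = RELEASE_TEMPLATES.get(language)
    if template is None:
        raise ValueError(f"no naming template for {language!r}")
    # str.format ignores keyword arguments a template does not use.
    return template.format(name=name, lower=name.lower())


if __name__ == "__main__":
    for language in RELEASE_TEMPLATES:
        print(f"{language:>10}: {release_name(language, 'PSI')}")
```

This only reproduces the two example tables. Some entries under Real-world Changes differ (for example, SwiftDP is listed with the release name OpenMinedSwiftDP rather than OpenMinedDP), so those would need per-project overrides.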

Real-world Changes

The following are the proposed name changes to existing OpenMined projects:

~Aries-DID~ --> aries-did.js

Repository name: aries-did.js
Release name: @openmined/aries-did.js

~aries-fl~ --> PyAriesFL

Repository name: PyAriesFL
Release name: openmined.ariesfl

daa.js

Repository name: daa.js
Release name: @openmined/daa.js

~differential-privacy-clj~ --> clojure-dp

Repository name: clojure-dp
Release name: org.openmined.clojure-dp

~differentialprivacy-ts~ --> dp.js

Repository name: dp.js
Release name: @openmined/dp.js

~differentialPrivacyR~ --> diffPrivR

Repository name: diffPrivR
Release name: openminedDiffPrivR

grid-admin

Repository name: grid-admin (There is no .js here because it's not strictly a JavaScript library.)
Release name: There is no "release". There are deployments, but deployments don't have names or versions.

~GridNetwork~ --> PyGridNetwork

Repository name: PyGridNetwork
Release name: openmined.gridnetwork
I'm not sure this library even needs a "release", as it's used only within PyGrid. If there is a release, the above should be the name.

~GridNode~ --> PyGridNode

Repository name: PyGridNode
Release name: openmined.gridnode
I'm not sure this library even needs a "release", as it's used only within PyGrid. If there is a release, the above should be the name.

KotlinPSI

Repository name: KotlinPSI
Release name: org.openmined.kotlin-psi

KotlinSyft

Repository name: KotlinSyft
Release name: org.openmined.kotlin-syft

~openmined-ui~ --> omui.js

Repository name: omui.js
Release name: @openmined/omui.js

openmined-website

Repository name: openmined-website (There is no .js here because it's not strictly a JavaScript library.)
Release name: There is no "release". There are deployments, but deployments don't have names or versions.

opus

Repository name: opus
Release name: openmined.opus

~org.openmined.dp~ --> JavaDP

Repository name: JavaDP
Release name: org.openmined.java-dp

~paillier-pure~ --> paillier.js

Repository name: paillier.js
Release name: @openmined/paillier.js

PIR

Repository name: PIR
Release name: @openmined/pir.cpp

PSI

Repository name: PSI
C++ release name: @openmined/psi.cpp
Python release name: openmined.psi
JavaScript release name: @openmined/psi.js
Java release name: org.openmined.java-psi
Kotlin release name: org.openmined.kotlin-psi
Clojure release name: org.openmined.clojure-psi
Swift release name: OpenMinedPSI

PyDP

Repository name: PyDP
Release name: openmined.dp

PyGrid

Repository name: PyGrid
Release name: openmined.grid

PyPSI

Repository name: PyPSI
Release name: openmined.psi

PySyft

Repository name: PySyft
Release name: openmined.syft

SwiftDP

Repository name: SwiftDP
Release name: OpenMinedSwiftDP

SwiftPSI

Repository name: SwiftPSI
Release name: OpenMinedSwiftPSI

SwiftSyft

Repository name: SwiftSyft
Release name: OpenMinedSwiftSyft

SyferText

Repository name: SyferText
Release name: openmined.syfertext

~syft-proto~ --> SyftProtobuf

Repository name: SyftProtobuf
Python release name: openmined.syftprotobuf
JavaScript release name: @openmined/syft-protobuf.js
Kotlin release name: org.openmined.kotlin-syft-protobuf
Swift release name: OpenMinedSwiftSyftProtobuf

syft.js

Repository name: syft.js
Release name: @openmined/syft.js

TenSEAL

Repository name: TenSEAL
Release name: openmined.tenseal

~threepio~ --> Threepio

Repository name: Threepio
Python release name: openmined.threepio
JavaScript release name: @openmined/threepio.js

iamtrask commented 4 years ago

Love all of this!

The only aspect that gives me pause is the naming for the python packages.

https://www.python.org/dev/peps/pep-0423/#in-doubt-use-an-individual-organization-namespace

I did a bit of reading... and I think I like "." instead of "-" for organization naming. Also, prepending "py" is a bit redundant.

I.e., instead of openmined-PyNLP, it would be openmined.nlp

Thoughts?
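One practical note on the "." vs "-" question, as far as we understand PyPI's behaviour: PyPI and pip compare project names after PEP 503 normalization (runs of "-", "_" and "." collapse to a single "-", then the name is lowercased), so openmined.nlp and openmined-nlp would be treated as the same project. The separator therefore mostly affects how the name is displayed, not uniqueness. A minimal sketch of that normalization rule (the `normalize` helper name is ours):

```python
import re


def normalize(name: str) -> str:
    """PEP 503 project-name normalization, the comparison PyPI and pip use."""
    return re.sub(r"[-_.]+", "-", name).lower()


if __name__ == "__main__":
    for candidate in ["openmined.nlp", "openmined-nlp", "OpenMined_NLP", "openmined-PyNLP"]:
        print(f"{candidate:<16} -> {normalize(candidate)}")
```

The import name (`import openmined.nlp`) is a separate matter from this distribution name.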

cereallarceny commented 4 years ago

I like this more. Let me modify the issue to use this.

So I'll remove "Py" from the release names, but not from the project or repo names.

Nilanshrajput commented 4 years ago

Avoid arbitrary names that don't explain what the project is trying to accomplish.

I agree with this, but I don't think "Syft" or "SyferText" are arbitrary. "SyferText" inherently kind of represents private text, while PyNLP doesn't really convey the main point behind the library (privacy). I think it's alright to have a catchy name as long as it's aligned with the idea of the project.

cereallarceny commented 4 years ago

Syft is fairly arbitrary though, but I went ahead and designated it should be grandfathered in. SyferText, I can see your point, but it doesn't have any reference to Python in the name. PySyferText?

iamtrask commented 4 years ago

Awww - cool names are part of the fun!

iamtrask commented 4 years ago

Why is Torch called torch? Or Tensorflow called tensorflow?

Nilanshrajput commented 4 years ago

yup too many rules kills the fun :)

cereallarceny commented 4 years ago

Sure, but we're not Google or Facebook. I'm fine to change SyferText back. I hear your point on that, but the main goal here is for the discoverability of beginners to the community. :)

rav1kantsingh commented 4 years ago

Adding Py to TenSEAL might confuse contributors, as it is not a pure Python library. It's a Python + C++ library, and C++ makes up 50.1% of the code.

cereallarceny commented 4 years ago

Fair enough, I'll revert that @IamRavikantSingh - good find!

madhavajay commented 4 years ago

Sorry to be late to the discussion, I have a few questions.

1) Are release names the same as import package names?

For example:

SwiftDP
Repository name: SwiftDP
Release name: OpenMinedSwiftDP

import OpenMinedSwiftDP

or would it be:

import SwiftDP

2) How should we reconcile TypeScript as a superset of JavaScript?

In many ways JS should be thought of as two separate targets: client-side code in browsers, and server-side code in Node or Deno. Does it matter whether the language IS JavaScript or merely targets JavaScript? Would a TypeScript library be named with the .js naming convention, or a .ts naming convention? Should we ever write any JS when we can write safer, more forward-compatible TS instead?

For example we have a differential privacy TypeScript repo: https://github.com/OpenMined/differentialprivacy-ts

Previously I would have suggested naming server targets with a "node" naming convention, but with Deno looking likely to replace Node, especially for TypeScript, it's a bit 🤷‍♂️.

3) What are our naming conventions for Objective-C?

I will be renaming the SwiftDP repo to an Objective-C name, to reflect the fact that it currently contains no Swift and will likely only be an Objective-C wrapper of the Google C++ library.

Back in the day, most Objective-C projects used a 2-letter prefix to namespace themselves, so that might be OM (for OpenMined), giving OMDP? Or ObjCDP, or some other combination of 🐪🐫CamelCase. Sometimes these things are named using the iOS moniker, but in this case that would be both limiting and confusing, as this library can work on other platforms and will likely take a back seat to the native SwiftDP, which has already begun: https://gist.github.com/madhavajay/8f78995abd7b8578d4b4c5b283bd0b1e

While this library can be used in Swift due to the automatic bridging support between Objective-C and Swift, it is essentially just Objective-C and can also be used in Objective-C projects (also freely mixed with Swift) which is extremely common in large app code bases that heavily leverage old code or C++ code.

4) Go / Rust / R?

I recently had the opportunity to write some Go on a different project. Go uses (and now mandates) module paths that are fully qualified URLs of the source repos, which makes org naming easy.

Example for DP might be:

module github.com/openmined/go-dp

more here: https://rakyll.org/style-packages/

I am sure Rust will come up soon: Rust can expose a C-compatible interface, which makes it a secure and safe language for writing code that targets many existing language bindings and platforms. Although 🍢kebab-case is used for many Rust projects, it looks like the official docs now suggest using 🐍 snake_case: https://doc.rust-lang.org/1.0.0/style/style/naming/README.html

It also seems that org namespacing is not desired? https://internals.rust-lang.org/t/namespacing-on-crates-io/8571/7

https://doc.rust-lang.org/cargo/reference/manifest.html#the-name-field

"The name must use only alphanumeric characters or - or _, and cannot be empty. Note that cargo new and cargo init impose some additional restrictions on the package name, such as enforcing that it is a valid Rust identifier and not a keyword. crates.io imposes even more restrictions, such as enforcing only ASCII characters, not a reserved name, not a special Windows name such as "nul", is not too long, etc."

Despite all the above maybe we would do something like this?

extern crate openmined_dp_rust;

I haven't used Rust yet so maybe someone has a better suggestion?
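To make the constraint concrete: the Cargo rule quoted above rules out the dotted org.openmined.* style used for Java and Kotlin, but a prefix such as openmined- or openmined_ works. Cargo also derives the default library name by replacing "-" with "_", so a package called openmined-dp-rust would be referred to in code as openmined_dp_rust. The Python sketch below (helper names are hypothetical) checks candidates against the quoted rule only, restricted to ASCII as crates.io requires; it does not implement the extra cargo new or crates.io restrictions.

```python
import re

# ASCII version of the quoted Cargo rule: letters, digits, "-" or "_", non-empty.
CARGO_PACKAGE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_cargo_name(name: str) -> bool:
    return CARGO_PACKAGE_NAME.fullmatch(name) is not None


def default_lib_name(package: str) -> str:
    """Cargo's default library target name: the package name with "-" replaced by "_"."""
    return package.replace("-", "_")


if __name__ == "__main__":
    for package in ["openmined-dp", "openmined_dp_rust", "openmined.dp", "org.openmined/dp"]:
        if is_valid_cargo_name(package):
            print(f"{package}: valid; referred to in Rust code as {default_lib_name(package)}")
        else:
            print(f"{package}: not a valid Cargo package name")
```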

R? This came up recently over at DP where there is discussion of an R package. Looks like they mandate dot separators: http://r-pkgs.had.co.nz/package.html

"There are three formal requirements: the name can only consist of letters, numbers and periods, i.e., .; it must start with a letter; and it cannot end with a period. Unfortunately, this means you can't use either hyphens or underscores, i.e., - or _, in your package name. I recommend against using periods in package names because it has confusing connotations (i.e., file extension or S3 method)."

org.openmined.dp.r ??? or is a slash possible? org.openmined.r/dp ???

Perhaps someone with more R experience has an idea?
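The quoted requirements are easy to check mechanically, which also answers the slash question: "/" is not allowed, so org.openmined.r/dp cannot be an R package name, while the dotted form is at least legal (though the quoted guide recommends against periods). A small Python sketch; the helper name is hypothetical, and the two-character minimum comes from "Writing R Extensions" rather than the quote above:

```python
import re

# Quoted rules: only letters, digits and periods; starts with a letter;
# does not end with a period. Requiring a final letter or digit also
# enforces the two-character minimum from "Writing R Extensions".
R_PACKAGE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]")


def is_valid_r_package_name(name: str) -> bool:
    return R_PACKAGE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["org.openmined.dp.r", "org.openmined.r/dp", "openminedDiffPrivR", "openmined_dp"]:
        print(f"{candidate}: {'valid' if is_valid_r_package_name(candidate) else 'invalid'}")
```

For reference, the openminedDiffPrivR release name proposed under Real-world Changes passes this check.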

cereallarceny commented 4 years ago
  1. Are release names the same as import package names?

They don't have to be. That's up to how the library is developed.

  2. How should we reconcile TypeScript as a superset of JavaScript?

You shouldn't. TypeScript always compiles down to JavaScript, so every TypeScript library is a JavaScript library. Besides, it's far more common to use the .js moniker, even if the library's source code is written in TypeScript.

  3. What are our naming conventions for Objective-C?

We don't have them yet because we probably won't be doing many projects in Objective-C. Swift is Apple's intended future for application development, to the detriment of Objective-C. Also, since you can import Swift modules into Objective-C and can also import C/C++ code directly into Swift projects, there's very little reason to explicitly write anything in Objective-C.

  4. Go / Rust / R?

Open to thoughts here. We don't have many of these projects yet, so I'm open to whatever standard the community provides.
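On the first answer above (release names don't have to match import names): if a Python project does want `import openmined.dp` to mirror a release name such as openmined.dp, one common approach, assumed here rather than decided in this thread, is a PEP 420 implicit namespace package. Each distribution ships `openmined/<subpackage>/`, but none of them ships `openmined/__init__.py`. The sketch fakes two such distributions in a temporary directory; `make_portion` and the `NAME` attribute are illustrative only.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, subpackage: str) -> Path:
    """Create root/openmined/<subpackage>/__init__.py, deliberately without openmined/__init__.py."""
    package_dir = root / "openmined" / subpackage
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f"NAME = {subpackage!r}\n")
    return root


with tempfile.TemporaryDirectory() as tmp:
    # Two separate "distributions", e.g. the PyDP and PSI Python releases.
    dp_root = make_portion(Path(tmp) / "dp_dist", "dp")
    psi_root = make_portion(Path(tmp) / "psi_dist", "psi")
    sys.path[:0] = [str(dp_root), str(psi_root)]
    importlib.invalidate_caches()

    # Both portions merge under the single "openmined" namespace package.
    dp = importlib.import_module("openmined.dp")
    psi = importlib.import_module("openmined.psi")
    print(dp.__name__, dp.NAME)
    print(psi.__name__, psi.NAME)
```

The trade-off is that no single package may own `openmined/__init__.py`; a regular package at that path would shadow the others.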

madhavajay commented 4 years ago

On point 2, there is a growing host of solutions that use TypeScript natively, like Deno, so non-.js targets for TypeScript will probably grow in the coming years. In the end, I think as long as the target (server vs. browser) is clear, that's all that matters.

On point 3, unfortunately we can't use C++ directly in Swift (yet), which is why Objective-C is even on the discussion table, but yes, I agree that this one case is a very big exception and it's unlikely there will be more new Objective-C code. Nevertheless, we currently have a repo which needs renaming or deleting, so I am still wondering what I should name the Objective-C wrapper library for Google's DP. Currently I am thinking OMDP (OpenMined DP) in the NS 2 letter prefix style. Would that be okay?

On point 4, R is something we are actively exploring, so knowing the naming conventions would be good. Unfortunately I have zero R experience, so it's best if we ask some experienced R devs, but it would be nice to iron out any naming inconsistencies now rather than have them conflict later when we go to name them, given that each package manager has its own, sometimes conflicting, requirements.

For Go, it's a super popular server-side systems language, and I believe Google has their own DP library ported to Go. Also, the Android SafetyNet and Apple DeviceCheck integration currently being used over at https://github.com/google/exposure-notifications-server is written in Go. I can't think of another server-side language this could be implemented in better, given that the source exists and Python is probably not optimal for secure and hardened API endpoints. I would fully expect us to implement something like this in part of our stack to provide a secure working implementation of "trusted mobile device" sourced data. However, naming in Go seems fairly flexible and already has domain name / org scoping, so it's probably not going to be an issue.

For Rust... again I have no experience here, but just as TypeScript supersedes JS and Go supersedes C, Rust may well supplant C++, so for highly secure and performant numeric programming it's not inconceivable that we would look to leverage Rust rather than C++. The naming conventions seem a little specific, so it might be worth discussing them now rather than later.

BTW, sorry, I am not trying to be difficult. This isn't meant to be an attempt to enumerate a bunch of hipster languages. 😬 I can conceivably see some of these being used in some part of a larger DP / OM stack from device to cloud. I don't think we need to decide on any of these right now, given we don't have any lines of code for them yet, but from my quick research into their naming conventions, there are some potential conflicts with the above naming scheme which are worth considering.

cereallarceny commented 4 years ago

On point 2, there is a growing host of solutions that use TypeScript natively, like Deno, so non-.js targets for TypeScript will probably grow in the coming years. In the end, I think as long as the target (server vs. browser) is clear, that's all that matters.

That just means that TypeScript compilation is hidden from you in Deno, not that it's a separate language. TypeScript is not a language, it's a superset. It's always compiled into JavaScript one way or another.

unfortunately we can't use C++ directly in Swift (yet)

I don't think that's true. We're doing it in SwiftSyft without problem. Here's the first StackOverflow post I found on the issue: https://stackoverflow.com/a/35230973/591776. Seems to be perfectly possible without having Objective-C in between. Of course, if you want to put Objective-C in the middle, that's your choice, but I don't believe it's strictly necessary.

I am still wondering what I should name the Objective-C wrapper library for Google's DP. Currently I am thinking OMDP (OpenMined DP) in the NS 2 letter prefix style. Would that be okay?

No, I don't think that acronym is appropriate. Follow the naming guide above. Using that logic, the package should be:

ObjCDP

Repository name: ObjCDP Release name: OpenMinedObjCDP

Either way, I'm more in favor of not having any projects in Objective-C since they aren't really in the intended future of app development by Apple. Swift is the future, we should go with the future. While I'm not aware of your specific reason for wanting an Objective-C library, I'm not aware of any reason you can't wrap C++ code from Swift directly.


Re: your comments about Rust and Go: I'm fine for us to support those languages, but I'd like to close this issue out and start renaming these libraries personally. We don't have any libraries in Rust or Go yet, but when they inevitably become available, we can decide on the appropriate naming standard for them. It's not really necessary to have that conversation here or now. Cool?

cereallarceny commented 4 years ago

@simcof this issue has been open for debate for over 2 weeks. I think we're fairly solid on feedback, since people have had more than enough time to add their thoughts. Can we call this settled and start renaming the projects that need it? I know that for my team, and the PSI team, this is a blocker for releasing initial versions of libraries, so I'd really like to close this out. All clear?

madhavajay commented 4 years ago

On point 2, there is a growing host of solutions that use TypeScript natively, like Deno, so non-.js targets for TypeScript will probably grow in the coming years. In the end, I think as long as the target (server vs. browser) is clear, that's all that matters.

That just means that TypeScript compilation is hidden from you in Deno, not that it's a separate language. TypeScript is not a language, it's a superset. It's always compiled into JavaScript one way or another.

Ah I see, Deno is still just wrapping V8. That's a pity; I thought they were doing more than just transpiling to JS again. I guess it's hard to get around the monumental amount of work put into making V8 so good.

unfortunately we can't use C++ directly in Swift (yet)

I don't think that's true. We're doing it in SwiftSyft without problem. Here's the first StackOverflow post I found on the issue: https://stackoverflow.com/a/35230973/591776. Seems to be perfectly possible without having Objective-C in between. Of course, if you want to put Objective-C in the middle, that's your choice, but I don't believe it's strictly necessary.

Of course, practically every language supports 'extern "C"', but there are a lot of reasons why this creates a suboptimal amount of code maintenance for something like Swift. Interestingly, I have spoken to a few Google engineers about this, and they all prefer to use C++ wrapped in Objective-C, using light wrapper objects and smart pointers. Of course, cargo-culting Google doesn't automatically make something the correct solution, but I imagine they have put some solid thought into it. Both are workable solutions, but there is talk of real C++ interop: https://forums.swift.org/t/manifesto-interoperability-between-swift-and-c/33874, which would mean no Objective-C and no wrapping of function overloading and complex C++ types through extern "C", etc. Hopefully when that arrives, Objective-C will cease to have any functional use.

I am still wondering what I should name the Objective-C wrapper library for Google's DP. Currently I am thinking OMDP (OpenMined DP) in the NS 2 letter prefix style. Would that be okay?

No, I don't think that acronym is appropriate. Follow the naming guide above. Using that logic, the package should be:

ObjCDP

Repository name: ObjCDP Release name: OpenMinedObjCDP

Got it. 👍

Either way, I'm more in favor of not having any projects in Objective-C since they aren't really in the intended future of app development by Apple. Swift is the future, we should go with the future. While I'm not aware of your specific reason for wanting an Objective-C library, I'm not aware of any reason you can't wrap C++ code from Swift directly.

See above. For porting a Google library, this was the route suggested by Google iOS engineers themselves. Objective-C is definitely not going anywhere; I don't expect support to leave the Xcode toolchain for years to come. However, I totally agree it would be nice not to use it, but if we are supporting C++ code it might actually be the better path.

Re: your comments about Rust and Go: I'm fine for us to support those languages, but I'd like to close this issue out and start renaming these libraries personally. We don't have any libraries in Rust or Go yet, but when they inevitably become available, we can decide on the appropriate naming standard for them. It's not really necessary to have that conversation here or now. Cool?

Agreed for other languages, we will name them when they happen.

AlanAboudib commented 4 years ago

@cereallarceny I am also for naming PySyferText back to SyferText, to avoid a very long name.

IonesioJunior commented 4 years ago

@Benardi

youben11 commented 4 years ago

I would prefer keeping TenSEAL as is, because specifying "Tensor" in the name won't help anyone understand that it's an HE (homomorphic encryption) library, and the most important part is "SEAL", which refers to the Microsoft HE library. Also, TenSEAL is cool :) (at least for me)

cereallarceny commented 4 years ago

Cool @AlanAboudib and @youben11 - I'll change both of your repos back to what they were called. We will still need to use the org scoping for releases though :)

cereallarceny commented 4 years ago

I've created all the appropriate issues for renaming and will be closing this issue now.