SciML / SciMLStyle

A style guide for stylish Julia developers
https://docs.sciml.ai/SciMLStyle/stable/
MIT License

SciML Style Guide for Julia

The SciML Style Guide is a style guide for the Julia programming language. It is used by the SciML Open Source Scientific Machine Learning Organization. It covers styles that make it easy to write high-quality, readable, robust, safe, and fast code that is easy to maintain for production and deployment.

It is open to discussion with the community. Please file an issue or open a PR to discuss changes to the style guide.

Code Style Badge

Let contributors know your project is following the SciML Style Guide by adding the badge to your README.md.

[![SciML Code Style](https://img.shields.io/static/v1?label=code%20style&message=SciML&color=9558b2&labelColor=389826)](https://github.com/SciML/SciMLStyle)

Overarching Dogmas of the SciML Style

Consistency vs Adherence

According to PEP8:

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

Some code within the SciML organization is old, on life support, donated by researchers to be maintained. Consistency is the number one goal, so updating to match the style guide should happen on a repo-by-repo basis, i.e. do not update one file to match the style guide (leaving all other files behind).

Community Contribution Guidelines

For a comprehensive set of community contribution guidelines, refer to ColPrac. A relevant point to highlight is that PRs should do one thing. In the context of style, this means that PRs that update the style of a package's code should not be mixed with fundamental code contributions. This separation makes it easier to ensure that large style improvements are isolated from substantive (and potentially breaking) code changes.

Open source contributions are allowed to start small and grow over time

If the standard for code contributions is that every PR needs to support every possible input type that anyone can think of, the barrier would be too high for newcomers. Instead, the principle is to be as correct as possible to begin with, and grow the generic support over time. All recommended functionality should be tested, and any known generality issues should be documented in an issue (and with a @test_broken test when possible). However, a function that is known to not be GPU-compatible is not grounds to block merging; rather, a follow-up PR that improves the general type support is encouraged!

Generic code is preferred unless code is known to be specific

For example, the code:

function f(A, B)
    for i in 1:length(A)
        A[i] = A[i] + B[i]
    end
end

would not be preferred for two reasons. One is that it assumes A uses one-based indexing, which would fail in cases like OffsetArrays and FFTViews. Another issue is that it requires indexing, while not all array types support indexing (for example, CuArrays). A more generic and compatible implementation of this function would be to use broadcast, for example:

function f(A, B)
    @. A = A + B
end

which would allow support for a wider variety of array types.

Internal types should match the types used by users when possible

If f(A) takes the input of some collections and computes an output from those collections, then it should be expected that if the user gives A as an Array, the computation should be done via Arrays. If A was a CuArray, then it should be expected that the computation should be internally done using a CuArray (or appropriately error if not supported). For these reasons, constructing arrays via generic methods, like similar(A), is preferred when writing f instead of using non-generic constructors like Array{eltype(A)}(undef, size(A)) unless the function is documented as being non-generic.
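
As a minimal sketch of this point, the hypothetical function `scaled_copy` below allocates its output with `similar` so that the result stays in the same container family as the input. The function name and the sample inputs are illustrative assumptions, not part of any SciML package.

```julia
# Hypothetical example: allocate the output generically so that the result
# matches the kind of array the caller passed in.
function scaled_copy(A::AbstractArray, α::Number)
    # `similar` asks A's own type for a new container of the requested eltype.
    out = similar(A, promote_type(eltype(A), typeof(α)))
    out .= α .* A   # broadcast instead of scalar indexing
    return out
end

A = [1.0 2.0; 3.0 4.0]
B = scaled_copy(A, 2)   # Matrix{Float64} in, Matrix{Float64} out
```

With a GPU array type as input, the same code would be expected to allocate on the GPU, since `similar` dispatches on the type of `A`.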

Trait definition and adherence to generic interface is preferred when possible

Julia provides many different interfaces, for example:

Those interfaces should be followed when possible. For example, when defining broadcast overloads, one should implement a BroadcastStyle as suggested by the documentation instead of simply attempting to bypass the broadcast system via copyto! overloads.

When interface functions are missing, these should be added to Base Julia or an interface package, like ArrayInterface.jl. Such traits should be declared and used when appropriate. For example, if a line of code requires mutation, the trait ArrayInterface.ismutable(A) should be checked before attempting to mutate, and informative error messages should be written to capture the immutable case (or, an alternative code that does not mutate should be given).

One example of this principle is demonstrated in the generation of Jacobian matrices. In many scientific applications, one may wish to generate a Jacobian cache from the user's input u0. A naive way to generate this Jacobian is J = similar(u0,length(u0),length(u0)). However, this will generate a Jacobian J such that J isa Matrix.

Macros should be limited and only be used for syntactic sugar

Macros define new syntax, and for this reason, they tend to be less composable than other coding styles and require prior familiarity to be easily understood. One principle to keep in mind is, "can the person reading the code easily picture what code is being generated?". For example, a user of Soss.jl may not know what code is being generated by:

@model (x, α) begin
    σ ~ Exponential()
    β ~ Normal()
    y ~ For(x) do xj
        Normal(α + β * xj, σ)
    end
    return y
end

and thus using such a macro as the interface is not preferred when possible. However, the effect of a macro like @muladd is trivial to picture (it recursively transforms a*b + c to muladd(a,b,c) for more accuracy and efficiency), so using such a macro, for example:

julia> @macroexpand(@muladd k3 = f(t + c3 * dt, @. uprev + dt * (a031 * k1 + a032 * k2)))
:(k3 = f((muladd)(c3, dt, t), (muladd).(dt, (muladd).(a032, k2, (*).(a031, k1)), uprev)))

is recommended. Some macros in this category are:

Some performance macros, like @simd, @threads, or @turbo from LoopVectorization.jl, make an exception in that their generated code may be foreign to many users. However, they still are classified as appropriate uses as they are syntactic sugar since they do (or should) not change the behavior of the program in measurable ways other than performance.

Errors should be caught as high as possible, and error messages should be contextualized for newcomers

Whenever possible, defensive programming should be used to check for potential errors before they are encountered deeper within a package. For example, if one knows that f(u0,p) will error unless u0 is the size of p, this should be caught at the start of the function to throw a domain specific error, for example "parameters and initial condition should be the same size".

This contextualization should result in error messages that use terminology related to the user facing API (vs. referencing internal implementation details). Ideally, such error messages should not only describe the issue in language that will be familiar to the user but also include suggestions, where possible, of how to correct the issue.
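
A minimal sketch of such a check is shown here. The function name `solve_problem` and the placeholder computation are hypothetical; only the idea of validating sizes at the entry point and reporting the problem in user-facing terms comes from the text above.

```julia
# Hypothetical entry point: validate inputs before any deeper code runs.
function solve_problem(u0::AbstractVector, p::AbstractVector)
    if length(u0) != length(p)
        throw(DimensionMismatch(
            "parameters and initial condition should be the same size: " *
            "got length(u0) = $(length(u0)) and length(p) = $(length(p)). " *
            "Pass one parameter per state variable."))
    end
    return u0 .* p   # placeholder for the real computation
end

result = solve_problem([1, 2], [3, 4])
```

The message names the user-facing arguments and suggests a fix, rather than surfacing a broadcasting error from deep inside the package.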

Subpackaging and interface packages are preferred over conditional modules via Requires.jl

Requires.jl should be avoided at all costs. If an interface package exists, such as ChainRulesCore.jl for defining automatic differentiation rules without requiring a dependency on the whole ChainRules.jl system, or RecipesBase.jl which allows for defining Plots.jl plot recipes without a dependency on Plots.jl, a direct dependency on these interface packages is preferred.

Otherwise, instead of resorting to a conditional dependency using Requires.jl, it is preferred to create subpackages, i.e. smaller independent packages kept within the same GitHub repository with independent versioning and package management. An example of this is seen in Optimization.jl which has subpackages like OptimizationBBO.jl for BlackBoxOptim.jl support.

Some important interface packages to know about are:

Functions should either attempt to be non-allocating and reuse caches, or treat inputs as immutable

Mutating codes and non-mutating codes fall into different worlds. When a code is fully immutable, the compiler can better reason about dependencies, optimize the code, and check for correctness. However, many times a code that makes the fullest use of mutation can outperform even what the best compilers of today can generate. That said, the worst of all worlds is when code mixes mutation with non-mutating code. Not only is this a mishmash of coding styles, but it also has the potential non-locality and compiler proof issues of mutating code while not fully benefiting from the mutation.

Out-Of-Place and Immutability are preferred when sufficiently performant

Mutation is used to get more performance by decreasing the number of heap allocations. However, if it's not helpful for heap allocations in a given spot, do not use mutation. Mutation is scary and should be avoided unless it gives an immediate benefit. For example, if matrices are sufficiently large, then A*B is as fast as mul!(C,A,B), and thus writing A*B is preferred (unless the rest of the function is being careful about being fully non-allocating, in which case this should be mul! for consistency).
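
The two styles mentioned above can be sketched side by side. The wrapper names `product` and `product!` are hypothetical and exist only to make the contrast explicit; `mul!` is the standard in-place multiplication from the LinearAlgebra standard library.

```julia
using LinearAlgebra

# Out-of-place: allocates the result; simplest to read and reason about.
product(A, B) = A * B

# In-place: reuses a caller-provided cache `C`; worthwhile when the
# surrounding code is deliberately written to be non-allocating.
function product!(C, A, B)
    mul!(C, A, B)
    return C
end

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]
C = similar(A)
product!(C, A, B)
```

Both calls compute the same matrix; the choice is about consistency with the rest of the function, not about the result.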

Similarly, when defining types, using struct is preferred to mutable struct unless mutating the struct is a common occurrence. Even if mutating the struct is a common occurrence, see whether using Setfield.jl is sufficient. The compiler will optimize the construction of immutable structs, and thus this can be more efficient if it's not too much of a code hassle.

Tests should attempt to cover a wide gamut of input types

Code coverage numbers are meaningless if one does not consider the input types. For example, one can hit all the code with Array, but that does not test whether CuArray is compatible! Thus it's always good to think of coverage not in terms of lines of code but in terms of type coverage. A good list of number types to think about are:

Array types to think about testing are:

When in doubt, a submodule should become a subpackage or separate package

Keep packages focused on one core idea. If there's something separate enough to be a submodule, could it instead be a separate, well-tested and documented package to be used by other packages? Most likely yes.

Globals should be avoided whenever possible

Global variables should be avoided whenever possible. When required, global variables should be constants and have an all uppercase name separated with underscores (e.g. MY_CONSTANT). They should be defined at the top of the file, immediately after imports and exports but before an __init__ function. If you truly want mutable global style behavior you may want to look into mutable containers.
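
One possible reading of "mutable containers" is a constant binding that holds a `Ref`. The sketch below assumes that interpretation; the names `DEFAULT_TOLERANCE`, `MAX_ITERATIONS`, and the accessor functions are hypothetical.

```julia
# Constant global: all uppercase, defined near the top of the file.
const DEFAULT_TOLERANCE = 1e-8

# Hypothetical mutable setting: the binding is constant, but the contents of
# the `Ref` can change, so the compiler still knows the stored type is Int.
const MAX_ITERATIONS = Ref(100)

set_max_iterations!(n::Integer) = (MAX_ITERATIONS[] = n; nothing)
get_max_iterations() = MAX_ITERATIONS[]

set_max_iterations!(250)
```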

Type-stable and Type-grounded code is preferred wherever possible

Type-stable and type-grounded code helps the compiler create not only more optimized code, but also faster to compile code. Always keep containers well-typed, functions specializing on the appropriate arguments, and types concrete.
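
As an illustration, the two hypothetical functions below differ only in the value returned on the non-positive branch. `@inferred` from the Test standard library is used here as one way to check that the inferred return type is concrete; the function names are assumptions made for this sketch.

```julia
using Test

# Type-unstable: one branch returns an Int literal, the other returns x.
unstable_relu(x) = x > 0 ? x : 0

# Type-stable: both branches return a value of typeof(x).
stable_relu(x) = x > 0 ? x : zero(x)

# Well-typed container: concrete element type instead of Any.
samples = Float64[0.5, -1.0, 2.0]

# `@inferred` passes silently for the stable version...
@inferred stable_relu(2.5)

# ...and throws for the unstable one, which we record instead of crashing.
unstable_detected = try
    @inferred unstable_relu(2.5)
    false
catch
    true
end
```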

Closures should be avoided whenever possible

Closures can cause accidental type instabilities that are difficult to track down and debug; in the long run, it saves time to always program defensively and avoid writing closures in the first place, even when a particular closure would not have been problematic. A similar argument applies to reading code with closures; if someone is looking for type instabilities, this is faster to do when code does not contain closures. Furthermore, if you want to update variables in an outer scope, do so explicitly with Refs or self-defined structs. For example,

map(Base.Fix2(getindex, i), vector_of_vectors)

is preferred over

map(v -> v[i], vector_of_vectors)

or

[v[i] for v in vector_of_vectors]
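
For the separate point about updating variables in an outer scope, a minimal sketch of the `Ref` approach follows. The function `count_and_square` is hypothetical. Note that the `do` block is still a closure; the sketch only shows that mutating a `Ref` avoids reassigning a captured variable, which is the pattern that typically causes the instability.

```julia
# Hypothetical example: count negative inputs while squaring every element.
# Reassigning a captured local (e.g. `n += 1`) would force Julia to box it;
# mutating a `Ref` keeps the captured binding constant and concretely typed.
function count_and_square(xs)
    n_negative = Ref(0)
    squares = map(xs) do x
        x < 0 && (n_negative[] += 1)
        x^2
    end
    return squares, n_negative[]
end

squares, n_negative = count_and_square([-1, 2, -3])
```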

Numerical functionality should use the appropriate generic numerical interfaces

While you can use A\b to do a linear solve inside a package, that does not mean that you should. This interface is only sufficient for performing factorizations, and so that limits the scaling choices, the types of A that can be supported, etc. Instead, linear solves within packages should use LinearSolve.jl. Similarly, nonlinear solves should use NonlinearSolve.jl. Optimization should use Optimization.jl. Etc. This allows the full generic choice to be given to the user without depending on every solver package (effectively recreating the generic interfaces within each package).

Functions should capture one underlying principle

Functions mean one thing. Every dispatch of + should be "the meaning of addition on these types". While in theory you could add dispatches to + that mean something different, that will fail in generic code for which + means addition. Thus, for generic code to work, code needs to adhere to one meaning for each function. Every dispatch should be an instantiation of that meaning.

Internal choices should be exposed as options whenever possible

Whenever possible, numerical values and choices within scripts should be exposed as options to the user. This promotes code reusability beyond the few cases the author may have expected.
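
A small sketch of this idea: instead of hard-coding a tolerance and an iteration cap, a hypothetical `fixed_point` routine exposes them as keyword arguments with defaults. The keyword names `abstol` and `maxiters` are illustrative choices for this example.

```julia
# Hypothetical fixed-point iteration x = g(x) with user-adjustable options.
function fixed_point(g, x0; abstol = 1e-10, maxiters = 1_000)
    x = x0
    for _ in 1:maxiters
        x_next = g(x)
        abs(x_next - x) <= abstol && return x_next
        x = x_next
    end
    error("fixed_point did not converge within maxiters = $maxiters iterations; " *
          "consider increasing maxiters or loosening abstol")
end

root = fixed_point(cos, 1.0)                    # defaults
coarse = fixed_point(cos, 1.0; abstol = 1e-3)   # user-chosen tolerance
```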

Prefer code reuse over rewrites whenever possible

If a package has a function you need, use the package. Add a dependency if you need to. If the function is missing a feature, prefer to add that feature to said package and then add it as a dependency. If the dependency is potentially troublesome, for example because it has a high load time, prefer to spend time helping said package fix these issues and add the dependency. Only when it does not seem possible to make the package "good enough" should using the package be abandoned. If it is abandoned, consider building a new package for this functionality as you need it, and then make it a dependency.

Prefer to not shadow functions

Two functions can have the same name in Julia by having different namespaces. For example, X.f and Y.f can be two different functions, with different dispatches, but the same name. This should be avoided whenever possible. Instead of creating MyPackage.sort, consider adding dispatches to Base.sort for your types if these new dispatches match the underlying principle of the function. If it doesn't, prefer to use a different name. While using MyPackage.sort is not conflicting, it is going to be confusing for most people unfamiliar with your code, so MyPackage.special_sort would be more helpful to newcomers reading the code.

Avoid unmaintained dependencies

Packages should only be depended on if they have maintainers who are responsive. Good code requires good communities. If maintainers do not respond to breakage within 2 weeks with multiple notices, then all dependencies from that organization should be considered for removal. Note that some issues may take longer than 2 weeks to fix; the requirement is simply that communication should be open, consistent, and timely.

Avoid unsafe operations

Like other high-level languages that provide strong safety guarantees by default, Julia nevertheless has a small set of operations that bypass normal checks. These operations are clearly marked with the prefix unsafe_. By using an “unsafe” operation, the programmer asserts that they know the operation is valid even though the language cannot automatically ensure it. For high reliability these constructs should be avoided or carefully inspected during code review. They are:

Avoid non public operations in Julia Base and packages

The Julia standard library and packages developed in the Julia programming language have an intended public API indicated by marking symbols with the export keyword or in v1.11+ with the new public keyword. However, it is possible to use non public names via explicit qualification, e.g. Base.foobar. This practice is not necessarily unsafe, but should be avoided since non public operations may have unexpected invariants and behaviors, and are subject to changes in future releases of the language.

Note that qualified names are commonly used in method definitions to clarify that a function is being extended, e.g. function Base.getindex(...) … end. Such uses do not fall under this concern.

Always default to constructs which initialize data

For certain newly-allocated data structures, such as numeric arrays, the Julia compiler and runtime do not check whether data is accessed before it has been initialized. Therefore such data structures can “leak” information from one part of a program to another. Uninitialized structures should be avoided in favor of functions like zeros and fill that create data with well-defined contents. If code does allocate uninitialized memory, it should ensure that this memory is fully initialized before being returned from the function in which it is allocated.

Constructs which create uninitialized memory should only be used if there is a demonstrated performance impact, and the code should ensure that all memory is initialized in the same function in which the array is intended to be used.

Example:

function make_array(n::Int)
    A = Vector{Int}(undef, n)
    # function body
    return A
end

This function allocates an integer array with undefined initial contents (note the language forces you to request this explicitly). A code reviewer should ensure that the function body assigns every element of the array. One can similarly create structs with undefined fields, and if used this way, one should ensure all fields are initialized:

struct Foo
  x::Int
  Foo() = new()
end

julia> Foo().x
139736495677280
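
For comparison, the sketch below shows the initializing constructors recommended above, plus a hypothetical `make_squares` function in which `undef` is used but every element is assigned before the array is returned.

```julia
# Preferred: contents are well defined from the moment of allocation.
counts = zeros(Int, 4)
weights = fill(0.25, 4)

# Hypothetical variant of the example above: if `undef` is used for
# performance, every element is assigned before the array leaves the function.
function make_squares(n::Int)
    A = Vector{Int}(undef, n)
    for i in eachindex(A)
        A[i] = i^2
    end
    return A
end

squares = make_squares(3)
```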

Use extra precaution when running external processes

The Julia standard library contains a run function and other facilities for running external processes. Any program that does this is only as safe as the external process it runs. If this cannot be avoided, then best practices for using these features are:

  1. Only run fixed, known executables, and do not derive the path of the executable to run from user input or other outside sources.
  2. Make sure the executables used have also passed required audit procedures.
  3. Make sure to handle process failure (non-zero exit code).
  4. If possible, run external processes in a sandbox or “jail” environment with access only to what they need in terms of files, file handles and ports.

When run in a sandbox or jail, external processes can actually improve security since the subprocess is isolated from the rest of the system by the kernel.
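
Points 1 and 3 of the list above can be sketched as follows. The helper `run_checked` is hypothetical, and the executable used is the running Julia binary itself, chosen only so the example is self-contained.

```julia
# Hypothetical wrapper: run a fixed, known executable and turn a non-zero
# exit code into an informative error instead of ignoring it.
function run_checked(cmd::Cmd)
    proc = run(ignorestatus(cmd))   # do not throw; inspect the result ourselves
    success(proc) ||
        error("external process $(cmd) failed with exit code $(proc.exitcode)")
    return proc
end

# The executable path comes from Julia itself, not from user input.
julia_exe = Base.julia_cmd()
ok_proc = run_checked(`$julia_exe --startup-file=no -e "exit(0)"`)
```

A call whose process exits with a non-zero code raises an error that names the command and the exit code.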

Avoid eval whenever possible

Julia contains an eval function that executes program expressions constructed at run time. This is not in itself unsafe, but because the code it will run is not textually evident in the surrounding program, it can be difficult to determine what it will do. For example, a Julia program could construct and eval an expression that performs an unsafe operation without the operation being clearly evident to a code reviewer or analysis tool.

In general, programs should try to avoid using eval in ways that are influenced by user input because there are many subtle ways this can lead to arbitrary code execution. If user input must influence eval, the input should only be used to select from a known list of possible behaviors. Approaches using pattern matching to try to validate expressions should be viewed with extreme suspicion because they tend to be brittle and/or exploitable.
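
One way to let user input select from a known list of behaviors, without calling eval at all, is a lookup table. The sketch below assumes a hypothetical set of allowed reductions; the names `ALLOWED_REDUCTIONS` and `apply_reduction` are illustrative.

```julia
# Hypothetical allow-list mapping user-facing names to known functions.
const ALLOWED_REDUCTIONS = Dict{String, Function}(
    "sum" => sum,
    "maximum" => maximum,
    "minimum" => minimum,
)

function apply_reduction(name::AbstractString, data)
    f = get(ALLOWED_REDUCTIONS, name, nothing)
    f === nothing && throw(ArgumentError(
        "unknown reduction \"$name\"; choose one of " *
        "$(sort(collect(keys(ALLOWED_REDUCTIONS))))"))
    return f(data)
end

total = apply_reduction("sum", [1, 2, 3])
```

Any string that is not a key in the table is rejected, so the input can never reach the parser or the evaluator.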

Note: it is common for Julia programs to invoke eval or @eval at the top level, in order to generate global definitions programmatically. Such uses are generally safe.

Avoid bounds check removal, and if done, add appropriate manual checks

While Julia checks the bounds of all array operations by default, it is possible to manually disable bounds checks in a program using @inbounds. Note that in early versions of Julia (pre v1.9) this could be used as a performance optimization, but in later versions it can demonstrably reduce performance and thus one should never immediately default to bounds check removal as a performance habit.

Uses of this construct should be carefully audited during code review. For maximum safety, it should be avoided or programs should be run with the command line option --check-bounds=yes to enable all checks regardless of manual annotations.

To check a use of @inbounds for correctness, it suffices to examine all array indexing expressions (e.g. a[i]) within the expression it applies to, and ensure that each index will always be within the bounds of the indexed array. For example the following common use pattern is valid:

@inbounds for i in eachindex(A)
    A[i] = i
end

By inspection, the variable i will always be a valid index for A.

For contrast, the following use is invalid unless A is known to be a specific type (e.g. Vector):

@inbounds for i in 1:length(A)
    A[i] = i
end

@inbounds should be applied to as narrow a region of code as possible. When applied to a large block of code, it can be difficult to identify and verify all indexing expressions.
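
When the index range is not derived from the array itself, one possible pattern is a single explicit check followed by a narrowly scoped `@inbounds`. The function `sum_first` below is a hypothetical illustration of that pattern.

```julia
# Hypothetical example: sum the first `n` elements of a vector.
function sum_first(A::AbstractVector, n::Integer)
    idx = firstindex(A):(firstindex(A) + n - 1)
    checkbounds(A, idx)        # one explicit manual check up front
    s = zero(eltype(A))
    @inbounds for i in idx     # the annotation covers only this loop
        s += A[i]
    end
    return s
end

partial = sum_first([1, 2, 3], 2)
```

Requesting more elements than the array holds throws a `BoundsError` from `checkbounds` before any unchecked access happens.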

Avoid ccall unless necessary, and use safe ccall practices when required

Calling C (and Fortran) libraries from Julia is very easy: the ccall syntax (and the more convenient @ccall macro) allow calling C libraries without any need for glue files or boilerplate. They do require caution, however: the programmer tells Julia what the signature of each library function is and if this is not done correctly, it can be the cause of crashes and thus security vulnerabilities. An exploit is just a crash that an attacker has arranged to fail in a worse way than it would have randomly.

Safe use of ccall depends on both automated and manual measures.

What Julia does (automated):

What you must do (manual):

Validate all user inputs to avoid code injection

When writing programs that construct any kind of code based on user input, extra caution is required and the user input must be validated or escaped. For example, a common type of attack in web applications written in all programming languages is SQL injection: a user input is spliced into an SQL query to construct a customized query based on the user’s input. If raw user input is spliced into an SQL query as a string, it is easy to craft inputs that will execute arbitrary SQL commands, including destructive ones or ones that will reveal private data to an attacker. To prevent this, the user input should be passed as parameters to SQL prepared statements; a package such as SqlStrings.jl can be used to do this without a syntax burden. This protects against malicious input, but also encourages systematic marshaling of Julia types into SQL types. If string interpolation must be used, all user input should be either validated to match a strict, safe pattern (e.g. only consists of decimal digits or ASCII letters), or it should be escaped to ensure that SQL treats it only as data, not as code (e.g. turn a user input into an escaped string literal).

While we have talked specifically about SQL here, this issue is not limited to SQL. The same concern occurs when executing programs via shells, for example. Julia is more secure than most programming languages in this respect because the default mechanism for running external code (see Cmd objects in the Julia manual) is carefully designed to not be susceptible to this kind of injection, but programmers may be tempted to use a shell to call external code for convenience sake. The fact that a shell must be explicitly invoked in Julia helps catch these kinds of circumstances. Using a shell like this is usually a bad idea and can typically be avoided. If an external shell must be used, be certain that any user data used to construct the shell command is carefully validated or escaped to avoid shell injection attacks.
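
The difference between `Cmd` interpolation and an explicitly invoked shell can be seen without running anything. In the sketch below the input string is a hypothetical hostile value, and neither command is executed; only the argument lists are inspected.

```julia
# Hypothetical hostile input.
user_input = "notes.txt; rm -rf ~"

# Interpolation into a Cmd keeps the value as one argument; no shell parses it.
safe_cmd = `cat $user_input`

# Anti-pattern, shown only for contrast and never run here: the whole string
# is handed to a shell, which would treat `;` as a command separator.
unsafe_cmd = `sh -c "cat $user_input"`

safe_args = collect(safe_cmd)
unsafe_args = collect(unsafe_cmd)
```

In the first case `cat` would receive a single, oddly named file argument. In the second, the shell would receive a script containing two commands.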

Ensure secure random number generators are used when required

The default pseudo-random number generator in Julia, which can be accessed by calling rand() and randn(), for example, is intended for simulation purposes and not for applications requiring cryptographic security. An attacker can, by observing a series of random values, reconstruct its internal state and predict future pseudo-random values. For security-sensitive applications like generating secret values used for authentication, the RandomDevice() random-number generator should be used. It draws on the operating system's entropy source, so its output cannot be predicted from previously observed values.
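
A short sketch of the distinction, using the Random standard library. The variable names and the 32-byte token length are illustrative assumptions.

```julia
using Random

# Simulation: a seeded generator gives fast, reproducible streams.
sim_rng = MersenneTwister(1234)
noise = randn(sim_rng, 3)

# Security-sensitive: draw bytes from the operating system's entropy source.
secret_token = rand(RandomDevice(), UInt8, 32)
token_hex = bytes2hex(secret_token)
```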

Be aware of distributed computing encryption principles

Julia’s distributed computing uses unencrypted TCP/IP sockets for communication by default and expects to be running on a fully trusted cluster. If using Distributed in your code through @distributed, pmap, etc., be aware that the communication channels are not encrypted. Julia opens ports for communication between processes in a distributed cluster. A pre-generated random cookie is necessary to successfully connect (Julia 0.5 onwards), which defeats arbitrary external connections. This mechanism is described in detail in https://github.com/JuliaLang/julia/pull/16292.

For additional security, these communication channels can be encrypted through the use of a custom ClusterManager which enables SSH port forwarding, or uses some other mechanism to encrypt the communication channel.

Always immediately flush secret data after handling

When it’s necessary to manage secret data (for example, a user’s password) it’s desirable to have this erased from memory immediately after finishing with the data. However, when a normal String or Array is used as a container for such data, the underlying bytes persist after the container is deallocated and in principle could be recovered by an attacker at a later time. There are also situations where a string or array may be implicitly copied, for example, if it is assigned to a location with a compatible but different type, it will be converted, thus creating a copy of the original data. Normally this is harmless and convenient, but making copies of secrets is obviously bad for security. To prevent this, Julia provides the type Base.SecretBuffer and a shred! function which should be called immediately after the data is finished with. The contents of a SecretBuffer must be explicitly extracted — it is never implicitly copied — and its contents will be automatically shredded upon garbage collection of the SecretBuffer object if the shred! function was never called on it, with a warning indicating that the buffer should have been explicitly shredded by the programmer.
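
A minimal sketch of the `SecretBuffer` workflow follows. The secret is written from character literals purely for illustration; a real program would fill the buffer from a prompt or a socket, and would typically wrap the work in try/finally so that `shred!` runs even on error.

```julia
buf = Base.SecretBuffer()

# Illustration only: literals in source code are not secret.
write(buf, 's', 'e', 'c', 'r', 'e', 't')

# Contents must be extracted explicitly; here a single byte is read back.
seek(buf, 0)
first_byte = read(buf, UInt8)

# Zero the underlying memory as soon as the data is no longer needed.
Base.shred!(buf)
```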

Specific Rules

High Level Rules

General Naming Principles

Comments

# Yes:

# Number of nodes to predict. Again, an issue with the workflow order. Should be updated
# after data is fetched.
p = 1

# No:

p = 1  # Number of nodes to predict. Again, an issue with the workflow order. Should be
# updated after data is fetched.

Modules

# Yes:
import A: a
import C

using B
using D: d

# No:
import A: a
using B
import C
using D: d

# Yes:
using A, B, C, D

# No:
using A
using B
using C
using D

# No:
using A,
      B,
      C,
      D

Functions

# Yes:
foo(x::Int64) = abs(x) + 3

# No:
foobar(array_data::AbstractArray{T}, item::T) where {T <: Int64} = T[
    abs(x) * abs(item) + 3 for x in array_data
]

# Yes:
function my_large_function(argument1, argument2,
                           argument3, argument4,
                           argument5, x, y, z)

# No:
function my_large_function(argument1,
                           argument2,
                           argument3,
                           argument4,
                           argument5,
                           x,
                           y,
                           z)

Function Argument Precedence

  1. Function argument. Putting a function argument first permits the use of do blocks for passing multiline anonymous functions.

  2. I/O stream. Specifying the IO object first permits passing the function to functions such as sprint, e.g. sprint(show, x).

  3. Input being mutated. For example, in fill!(x, v), x is the object being mutated and it appears before the value to be inserted into x.

  4. Type. Passing a type typically means that the output will have the given type. In parse(Int, "1"), the type comes before the string to parse. There are many such examples where the type appears first, but it's useful to note that in read(io, String), the IO argument appears before the type, which is in keeping with the order outlined here.

  5. Input not being mutated. In fill!(x, v), v is not being mutated and it comes after x.

  6. Key. For associative collections, this is the key of the key-value pair(s). For other indexed collections, this is the index.

  7. Value. For associative collections, this is the value of the key-value pair(s). In cases like fill!(x, v), this is v.

  8. Everything else. Any other arguments.

  9. Varargs. This refers to arguments that can be listed indefinitely at the end of a function call. For example, in Matrix{T}(undef, dims), the dimensions can be given as a Tuple, e.g. Matrix{T}(undef, (1,2)), or as Varargs, e.g. Matrix{T}(undef, 1, 2).

  10. Keyword arguments. In Julia keyword arguments have to come last anyway in function definitions; they're listed here for the sake of completeness.

The vast majority of functions will not take every kind of argument listed above; the numbers merely denote the precedence that should be used for any applicable arguments to a function.
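
As a sketch of how the ordering plays out in practice, the hypothetical function `write_transformed` below takes a function argument (1), an I/O stream (2), a non-mutated input (5), and a keyword argument (10), in that order.

```julia
# Hypothetical function following the precedence list:
# function argument, then IO, then the input collection, then keywords.
function write_transformed(f, io::IO, items; sep = ", ")
    for (k, item) in enumerate(items)
        k > 1 && print(io, sep)
        print(io, f(item))
    end
    return nothing
end

# Function-first enables `do` blocks; an IO argument composes with `sprint`.
shouted = sprint() do io
    write_transformed(io, ["a", "b"]) do s
        uppercase(s)
    end
end
```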

Tests and Continuous Integration

@time @safetestset "Jacobian Tests" include("interface/jacobian_tests.jl")

Whitespace

NamedTuples

The = character in NamedTuples should be spaced as in keyword arguments. Space should be put between the name and its value. The empty NamedTuple should be written NamedTuple(), not (;).

# Yes:
xy = (x = 1, y = 2)
x = (x = 1,)  # Trailing comma required for correctness.
x = (; kwargs...)  # Semicolon required to splat correctly.

# No:
xy = (x=1, y=2)
xy = (;x=1,y=2)

Numbers

# Yes:
0.1
2.0
3.0f0

# No:
.1
2.
3.f0

Ternary Operator

Ternary operators (?:) should generally only consume a single line. Do not chain multiple ternary operators. If chaining many conditions, consider using an if-elseif-else conditional, dispatch, or a dictionary.

# Yes:
foobar = foo == 2 ? bar : baz

# No:
foobar = foo == 2 ?
    bar :
    baz
foobar = foo == 2 ? bar : foo == 3 ? qux : baz

As an alternative, you can use an if-else block, which is itself an expression in Julia:

# Yes:
foobar = if foo == 2
    bar
else
    baz
end

foobar = if foo == 2
    bar
elseif foo == 3
    qux
else
    baz
end

For loops

For loops should always use in, never = or ∈. This also applies to list and generator comprehensions.

# Yes:
for i in 1:10
    #...
end

[foo(x) for x in xs]

# No:
for i = 1:10
    #...
end

[foo(x) for x ∈ xs]

Function Type Annotations

Annotations for function definitions should be as general as possible.

# Yes:
splicer(arr::AbstractArray, step::Integer) = arr[begin:step:end]

# No:
splicer(arr::Array{Int}, step::Int) = arr[begin:step:end]

Using as many generic types as possible allows for a variety of inputs and allows your code to be more general:

julia> splicer(1:10, 2)
1:2:9

julia> splicer([3.0, 5, 7, 9], 2)
2-element Array{Float64,1}:
 3.0
 7.0

Struct Type Annotations

Annotations on type fields need to be given a little more thought, since field access is not concrete unless the compiler can infer the type (see type-dispatch design for details). Since well-inferred code is preferred, abstract type annotations, i.e.

mutable struct MySubString <: AbstractString
    string::AbstractString
    offset::Integer
    endof::Integer
end

are not recommended. Instead, a concretely-typed struct:

mutable struct MySubString <: AbstractString
    string::String
    offset::Int
    endof::Int
end

is preferred. If generality is required, then parametric typing is preferred, i.e.:

mutable struct MySubString{T<:Integer} <: AbstractString
    string::String
    offset::T
    endof::T
end

Untyped fields should be explicitly typed Any, i.e.:

struct StructA
    a::Any
end
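
One way to check whether a field is concretely typed is to ask Julia directly. The sketch below uses three hypothetical structs that mirror the variants above and inspects them with `fieldtype` and `isconcretetype` from Base.

```julia
struct AbstractFields
    offset::Integer  # abstract annotation: field access cannot be inferred concretely
end

struct ConcreteFields
    offset::Int
end

struct ParametricFields{T <: Integer}
    offset::T
end

abstract_ok = isconcretetype(fieldtype(AbstractFields, :offset))
concrete_ok = isconcretetype(fieldtype(ConcreteFields, :offset))
parametric_ok = isconcretetype(fieldtype(ParametricFields{Int32}, :offset))
```

Only the first check is false: the parametric struct stays general across integer types, yet each instantiation such as `ParametricFields{Int32}` has a concrete field.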

Macros

When a macro call contains several assignments, write each assignment without spaces around =. A single assignment keeps the spaces.

# Yes:
@parameters a = b
@parameters a=b c=d

# No:
@parameters a = b c = d

Types and Type Annotations

Package version specifications

Write compat entries without the caret. Pkg already treats a bare version number as a caret specifier, so the two forms below mean the same thing.

# Yes:
DataFrames = "0.17"

# No:
DataFrames = "^0.17"

Documentation

"""
    bar(x[, y])

Compute the Bar index between `x` and `y`. If `y` is missing, compute the Bar index between
all pairs of columns of `x`.
"""
function bar(x, y) ...

Type Template (should be skipped if it is redundant with the constructor(s) docstring):

"""
    MyArray{T, N}

My super awesome array wrapper!

# Fields
- `data::AbstractArray{T, N}`: stores the array being wrapped
- `metadata::Dict`: stores metadata about the array
"""
struct MyArray{T, N} <: AbstractArray{T, N}
    data::AbstractArray{T, N}
    metadata::Dict
end

Function Template (only required for exported functions):

"""
    mysearch(array::MyArray{T}, val::T; verbose = true) where {T} -> Int

Searches the `array` for the `val`. For some reason we don't want to use Julia's
builtin search :)

# Arguments
- `array::MyArray{T}`: the array to search
- `val::T`: the value to search for

# Keywords
- `verbose::Bool = true`: print out progress details

# Returns
- `Int`: the index where `val` is located in the `array`

# Throws
- `NotFoundError`: I guess we could throw an error if `val` isn't found.
"""
function mysearch(array::MyArray{T}, val::T; verbose = true) where {T}
    ...
end

If a method has many arguments or keywords, they can be abbreviated in the signature as args... and kwargs...:

"""
    Manager(args...; kwargs...) -> Manager

A cluster manager which spawns workers.

# Arguments

- `min_workers::Integer`: The minimum number of workers to spawn or an exception is thrown
- `max_workers::Integer`: The requested number of workers to spawn

# Keywords

- `definition::AbstractString`: Name of the job definition to use. Defaults to the
    definition used within the current instance.
- `name::AbstractString`: ...
- `queue::AbstractString`: ...
"""
function Manager(...)
    ...
end

Several methods of one function can be documented in a single docstring:

"""
    Manager(max_workers; kwargs...)
    Manager(min_workers:max_workers; kwargs...)
    Manager(min_workers, max_workers; kwargs...)

A cluster manager which spawns workers.

# Arguments

- `min_workers::Int`: The minimum number of workers to spawn or an exception is thrown
- `max_workers::Int`: The requested number of workers to spawn

# Keywords

- `definition::AbstractString`: Name of the job definition to use. Defaults to the
    definition used within the current instance.
- `name::AbstractString`: ...
- `queue::AbstractString`: ...
"""
function Manager end

If the text of a bullet point exceeds 92 characters, wrap the line and indent the continuation:

"""
...

# Keywords
- `definition::AbstractString`: Name of the job definition to use. Defaults to the
    definition used within the current instance.
"""

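Putting the type and function templates together, a runnable version might look like the sketch below. It makes a few assumptions that are not in the templates: the fields are concretely typed (following the struct annotation advice), a minimal AbstractArray interface is added so indexing works, and `KeyError` stands in for the template's `NotFoundError`.

```julia
"""
    MyArray{T, N}

Array wrapper that carries metadata alongside the data.

# Fields
- `data::Array{T, N}`: stores the array being wrapped
- `metadata::Dict{Symbol, Any}`: stores metadata about the array
"""
struct MyArray{T, N} <: AbstractArray{T, N}
    data::Array{T, N}
    metadata::Dict{Symbol, Any}
end

# Minimal AbstractArray interface so that indexing and iteration work.
Base.size(a::MyArray) = size(a.data)
Base.IndexStyle(::Type{<:MyArray}) = IndexLinear()
Base.getindex(a::MyArray, i::Int) = a.data[i]

"""
    mysearch(array::MyArray{T}, val::T; verbose = true) where {T} -> Int

Search `array` for `val` and return its index.

# Arguments
- `array::MyArray{T}`: the array to search
- `val::T`: the value to search for

# Keywords
- `verbose::Bool = true`: print out progress details

# Returns
- `Int`: the index where `val` is located in the `array`

# Throws
- `KeyError`: if `val` is not found (a stand-in for the template's `NotFoundError`)
"""
function mysearch(array::MyArray{T}, val::T; verbose = true) where {T}
    for i in eachindex(array)
        verbose && println("checking index ", i)
        array[i] == val && return i
    end
    throw(KeyError(val))
end

wrapped = MyArray([10, 20, 30], Dict{Symbol, Any}())
found = mysearch(wrapped, 20; verbose = false)
```

Note that the signature line in the docstring matches the method definition exactly, including the keyword and its default.
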
Error Handling

Arrays

Line Endings

Always use Unix-style \n line endings.

VS-Code Settings

If you use VS Code, we recommend the following options in your Julia-specific settings. To modify these settings, open your VS Code Settings with CMD+, (macOS) or CTRL+, (other operating systems), and add to your settings.json:

{
    "[julia]": {
        "editor.detectIndentation": false,
        "editor.insertSpaces": true,
        "editor.tabSize": 4,
        "files.insertFinalNewline": true,
        "files.trimFinalNewlines": true,
        "files.trimTrailingWhitespace": true,
        "editor.rulers": [92],
        "files.eol": "\n"
    }
}

Additionally, you may find the Julia VS-Code plugin useful.

JuliaFormatter

Note: the sciml style is only available in JuliaFormatter v1.0 or later.

One can add .JuliaFormatter.toml with the content

style = "sciml"

in the root of a repository, and run

using JuliaFormatter, SomePackage
format(joinpath(dirname(pathof(SomePackage)), ".."))

to format the package automatically.

Add FormatCheck.yml to enable the formatting CI. The CI will fail if the repository needs additional formatting. Thus, one should run format before committing.

References

Many of these style choices were derived from the Julia style guide, the YASGuide, and the Blue style guide. Additionally, many tips and requirements from the JuliaHub Secure Coding Practices manual were incorporated into this style.