API consistency review

StefanKarpinski commented 7 years ago

I'm starting this as a place to leave notes about things to make sure to consider when checking for API consistency in Julia 1.0.

[x] Convention prioritization. Listing and prioritizing our what-comes-first conventions in terms of function arguments for do-blocks, IO arguments for functions that print, outputs for in-place functions, etc (https://github.com/JuliaLang/julia/issues/19150).
[ ] Positional vs keyword arguments. Long ago we didn't have keyword arguments. They're still sometimes avoided for performance considerations. We should make this choice based on what makes the best API, not on that kind of historical baggage (keyword performance issues should also be addressed so that this is no longer a consideration).
[ ] Metaprogramming tools. We have a lot of tools like @code_xxx that are paired with underlying functions like code_xxx. These should behave consistently: each macro and its underlying function should have similar signatures, and wherever functions have similar signatures, their macro versions should be similar too. Ideally, they should all return values, rather than some returning values and others printing results, although that might be hard for things like LLVM code and assembly code.
[ ] IO <=> file name equivalence. We generally allow file names as strings to be passed in place of IO objects and the standard behavior is to open the file in the appropriate mode, pass the resulting IO object to the same function with the same arguments, and then ensure that the IO object is closed afterwards. Verify that all appropriate IO-accepting functions follow this pattern.
[ ] Reducers APIs. Make sure reducers have consistent behaviors – all take a map function before reduction; congruent dimension arguments, etc.
[ ] Dimension arguments. Consistent treatment of "calculate across this [these] dimension[s]" input arguments, what types are allowed etc, consider whether doing these as keyword args might be desired.
[ ] Mutating/non-mutating pairs. Check that non-mutating functions are paired with mutating functions where it makes sense and vice versa.
[ ] Tuple vs. vararg. Check that there is general consistency between whether functions take a tuple as the last argument or a vararg.
[ ] Unions vs. nullables vs. errors. Consistent rules on when functions should throw errors, and when they should return Nullables or Unions (e.g. parse/tryparse, match, etc.).
[ ] Support generators as widely as possible. Make sure any function that could sensibly work with generators does so. We're pretty good about this already, but I'm guessing we've missed a few.
[ ] Output type selection. Be consistent about whether "output type" APIs should be in terms of element type or overall container type (ref #11557 and #16740).
[x] Pick a name. There are a few functions/operators with aliases. I think this is fine in cases where one of the names is non-ASCII and the ASCII version is provided so people can still write pure-ASCII code, but there are also cases like <: which is an alias for issubtype where both names are ASCII. We should pick one and deprecate the other. We deprecated is in favor of === and should do similarly here.
[ ] Consistency with DataStructures. It's somewhat beyond the scope of Base Julia, but we should make sure that all of the collections in DataStructures have consistent APIs with those provided by Base. The connection in the other direction is that some of those types may inform how we end up designing the APIs in Base since we want them to extend smoothly and consistently.
[ ] NaNs vs. DomainErrors. See https://github.com/JuliaLang/julia/issues/5234 – have a policy for when to do which and make sure it is followed consistently.
[ ] Collection <=> generator. Sometimes you want a collection, sometimes you want a generator. We should go through all our APIs and make sure there's an option for both where it makes sense. Once upon a time, there was a convention to use an uppercase name for the generator version and a lowercase name for the version that's eager and returns a new collection. But no one ever paid any attention to that, so maybe we need a new convention.
[ ] Higher order functions on associatives. Currently some higher order functions iterate over associative collections with signature (k,v) – e.g. map, filter. Others iterate over pairs, i.e. with signature kv, requiring the body to explicitly destructure the pair into k and v – e.g. all, any. This should be reviewed and made consistent.
[x] Convert vs. construct. Allow conversion where appropriate. E.g. there have been multiple issues/questions about convert(String, 'x'). In general, conversion is appropriate when there is a single canonical transformation. Conversion of strings into numbers in general isn't appropriate because there are many textual ways to represent numbers, so we need to parse instead, with options. There's a single canonical way to represent version numbers as strings, however, so we may convert those. We should apply this logic carefully and universally.
[ ] Review completeness of collections API. We should look at the standard library functions for collections provided by other languages and make sure we have a way of expressing the common operations they have. For example, we don't have a flatten function or a concat function. We probably should.
[ ] Underscore audit.
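
To illustrate the "IO <=> file name equivalence" item in the list above, here is a minimal sketch of the pattern, written in Julia 1.x syntax (which postdates this issue). The function `write_report` is hypothetical and not part of Base; `open`, `println`, `read`, `tempname` and `rm` are the only Base functions used.

```julia
# Hypothetical IO-accepting function: writes one item per line.
function write_report(io::IO, items)
    for item in items
        println(io, item)
    end
    return length(items)
end

# File-name method: open the file in write mode, forward to the IO method
# with the same arguments, and let `open` close the file afterwards.
write_report(filename::AbstractString, items) =
    open(io -> write_report(io, items), filename, "w")

path = tempname()
n = write_report(path, ["a", "b"])   # returns whatever the IO method returns
contents = read(path, String)
rm(path)
```

The function form of `open` closes the stream even if the inner call throws, which is the "ensure that the IO object is closed afterwards" part of the convention.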

ararslan commented 7 years ago

Apologies if this isn't the appropriate place to mention this, but it would be nice to be more consistent with underscores in function names going forward.

StefanKarpinski commented 7 years ago

No, this is a good place for that. And yes, we should strive to eliminate all names where underscores are necessary :)

tkelman commented 7 years ago

consistent treatment of "calculate across this [these] dimension[s]" input arguments, what types are allowed etc, consider whether doing these as keyword args might be desired
listing and prioritizing our what-comes-first conventions in terms of function arguments for do-blocks, IO arguments for functions that print, outputs for in-place functions, etc (edit: thought there might already be one open for this)
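
For the dimension-argument point, a small sketch of what the keyword form looks like. It uses the `dims` keyword as it exists in Julia 1.x; at the time of this thread the dimension was still a positional argument, so read this as one possible resolution rather than the API under review.

```julia
A = [1 2; 3 4]

col_sums = sum(A; dims = 1)        # reduce down the rows: 1×2 result [4 6]
row_sums = sum(A; dims = 2)        # reduce across the columns: 2×1 result
both     = sum(A; dims = (1, 2))   # a tuple of dimensions: 1×1 result

# The map function comes before the collection being reduced.
sum_of_squares = sum(abs2, A)      # 1 + 4 + 9 + 16 = 30
```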

ararslan commented 7 years ago

For @tkelman's second point, see https://github.com/JuliaLang/julia/issues/19150

ararslan commented 7 years ago

There was also a recent Julep regarding the API for find and related functions: https://github.com/JuliaLang/Juleps/blob/master/Find.md

shashi commented 7 years ago

Should we deprecate put! and take! on channels (and maybe do the same for futures) since we have push! and shift! on them? Just suggesting removing 2 redundant words in the API.

I am suspicious of shift! being user friendly. A candidate is fetch!; we already have fetch, which is the non-mutating version of take!.

ref #13538 #12469

@amitmurthy @malmaud

Edit: It would even make sense to reuse send and recv on channels. (I'm surprised that these are only used for UDPSockets at the moment)
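
For context, the existing channel verbs behave roughly as follows. This is a minimal sketch in Julia 1.x; `put!`, `take!` and `fetch` are the real Base functions under discussion, while `fetch!`, `send` and `recv` on channels remain proposals only.

```julia
ch = Channel{Int}(2)     # buffered channel with room for two items
put!(ch, 10)
put!(ch, 20)

peeked = fetch(ch)       # returns the first item without removing it
taken1 = take!(ch)       # removes and returns the first item
taken2 = take!(ch)       # removes and returns the second item
drained = !isready(ch)   # nothing left to take
```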

amitmurthy commented 7 years ago

+1 for replacing put!/take! with push!/fetch!

nalimilan commented 7 years ago

I'll add renaming @inferred to @test_inferred.

martinholters commented 7 years ago

Double-check that specializations are consistent with the more generic functions, i.e. not something like #20233.

dpsanders commented 7 years ago

Review all exported functions to check if any can be eliminated by replacing them with multiple dispatch, e.g. print_with_color
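
One way to read this suggestion: instead of exporting a separate `print_with_color` name, the color could be carried by a wrapper type and handled by a `print` method. A rough sketch in Julia 1.x; the `Colored` type is hypothetical, and `printstyled` is the Base function that later replaced `print_with_color`.

```julia
# Hypothetical wrapper type: pairs a value with the color to print it in.
struct Colored{T}
    color::Symbol
    value::T
end

# Dispatch on the wrapper instead of adding a new exported function name.
Base.print(io::IO, c::Colored) = printstyled(io, c.value; color = c.color)

# `sprint` writes into a buffer that does not advertise color support,
# so the text is captured without escape codes.
plain = sprint(print, Colored(:red, "hi"))
```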

StefanKarpinski commented 7 years ago

The typical pairing is push! and shift! when working with a queue-like data structure.
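
As a reminder of what that pairing looks like on a plain vector, a short sketch in Julia 1.x. Note the assumption: `shift!` was renamed `popfirst!` in Julia 0.7/1.0, so the code below uses the newer name.

```julia
queue = Int[]
push!(queue, 1)            # enqueue at the back
push!(queue, 2)
push!(queue, 3)

head = popfirst!(queue)    # dequeue from the front (was shift!)
last_in = pop!(queue)      # by contrast, pop! removes from the back
```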

StefanKarpinski commented 7 years ago

If we're not going to use the typical name pairing for this kind of data structure because we're worried that the operation entails communication overhead that isn't adequately conveyed by those names, then I don't think push! makes sense either. send and recv really might be better.

malmaud commented 7 years ago

Maybe double-check that there is general consistency between whether functions take a tuple as the last argument or a vararg.
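
A small sketch of the two calling styles in question, in Julia 1.x. `zeros` is a Base function that accepts both forms; `cell_count` is a hypothetical function showing one way to support both by forwarding the vararg method to the tuple method.

```julia
# Base example: dimensions as varargs or as a single tuple.
a = zeros(2, 3)
b = zeros((2, 3))

# Hypothetical function accepting either style.
cell_count(dims::Tuple{Vararg{Int}}) = prod(dims)
cell_count(dims::Int...) = cell_count(dims)   # forward varargs to the tuple method

from_varargs = cell_count(2, 3)
from_tuple   = cell_count((2, 3))
```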

simonbyrne commented 7 years ago

Perhaps too big for this issue, but it would be good to have consistent rules on when functions should throw errors, and when they should return Nullables or Unions (e.g. parse/tryparse, match, etc.)
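
The three behaviors mentioned here can be seen side by side in a short sketch. This is Julia 1.x, where `Nullable` was replaced by returning `nothing` (a `Union{T, Nothing}` result), so it shows where the design ended up rather than the state at the time of this comment.

```julia
parsed = parse(Int, "42")               # succeeds and returns 42

maybe = tryparse(Int, "forty-two")      # returns nothing instead of throwing

threw = try
    parse(Int, "forty-two")             # throws on invalid input
    false
catch err
    err isa ArgumentError
end

m = match(r"\d+", "no digits here")     # no match also yields nothing
```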

StefanKarpinski commented 7 years ago

No issue too big, @simonbyrne – this is the laundry list.

StefanKarpinski commented 7 years ago

Btw: this isn't really for specific changes (e.g. renaming specific functions) – it's more about kinds of things we can review. For specific proposed changes, just open an issue proposing that change.

bramtayl commented 7 years ago

> We have a lot of tools like @code_xxx that are paired with underlying functions like code_xxx

Not sure if this is what you're talking about, but see CreateMacrosFrom.jl

tkelman commented 7 years ago

Whether "output type" API's should be in terms of element type or overall container type (ref #11557 and #16740)

dpsanders commented 7 years ago

Document all exported functions (including doctests)

pkofod commented 7 years ago

> Document all exported functions (including doctests)

if this is part of this, then maybe also: remember to label your tests with the issue/pr number. It makes it a lot easier to understand why that test is there. I know how git blame works, but when adding testsets (just to give an example) it's sometimes a bit of a mystery what is being tested, and it would be great if the issue/pr number was always there.
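
A sketch of what such labeling might look like with the `Test` standard library in Julia 1.x. The issue number is a placeholder (`#NNNNN`), not a real reference, and the tested behavior is only an illustration.

```julia
using Test

# "#NNNNN" is a placeholder for the issue or PR that motivated the test.
results = @testset "parse rejects empty strings (issue #NNNNN)" begin
    @test_throws ArgumentError parse(Int, "")
    @test tryparse(Int, "") === nothing
end
```

Putting the number in the testset name makes it show up in the test summary as well as in the source.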

stevengj commented 7 years ago

@dpsanders: and exported macros! e.g. @fastmath has no docstring.

amellnik commented 7 years ago

This is very minor, but the string and Symbol functions do almost the same thing and have different capitalization. ~~I think symbol would make more sense.~~

ararslan commented 7 years ago

@amellnik The difference is that Symbol is a type constructor and string is a regular function. IIRC we used to have symbol but it was deprecated in favor of the type constructor. I'm not convinced a change is necessary for this, but if anything I think we should use the String constructor in place of string.

yuyichao commented 7 years ago

> if anything I think we should use the String constructor in place of string.

No, they are different functions and shouldn't be merged

julia> String(UInt8[])
""

julia> string(UInt8[])
"UInt8[]"

jrevels commented 7 years ago

> No, they are different functions and shouldn't be merged

This looks like a situation where string(args...) should just be deprecated in favor of sprint(print, args...), then - having both string and String is confusing. We could specialize on sprint(::typeof(print), args...) to recover any lost performance. Along these lines, it might also make sense to deprecate repr(x) for sprint(showall, x).
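
The equivalences behind this proposal can be checked directly. A minimal sketch in Julia 1.x; it uses `show` rather than `showall`, since `showall` no longer exists in current Julia, so the second pair is an approximation of the proposal.

```julia
# string(args...) and sprint(print, args...) produce the same text.
s1 = string(1, "a", :b)
s2 = sprint(print, 1, "a", :b)

# repr(x) corresponds to capturing the output of show.
r1 = repr("hi")
r2 = sprint(show, "hi")
```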

yuyichao commented 7 years ago

That sounds ok although calling string to turn something into a string seems pretty standard....

ararslan commented 7 years ago

> calling string to turn something into a string seems pretty standard

Yes, but that's where the disconnect between String and string comes in.

TotalVerb commented 7 years ago

sprint(print, ...) feels redundant. If we get rid of string, we can rename sprint to string so we get string(print, foo) and string(showall, foo) which reads well in my opinion.

JeffBezanson commented 7 years ago

This might be a case where consistency is overrated. I think it's fine to have string(x) for "just give me a string representation of x". If it's going to be more complicated than that, e.g. requiring you to specify which printing function to use, then using another name like sprint makes sense.

It would also be ok with me to rename String(UInt8[]) to something else, and use String instead of string. string gives us a bit more flexibility in the future to change what type of string we return, but that doesn't seem likely to happen.

TotalVerb commented 7 years ago

Does reinterpret(String, ::Vector{UInt8}) make sense at all, or is this a pun on reinterpret?

JeffBezanson commented 7 years ago

That does seem to make sense.

TotalVerb commented 7 years ago

An issue is that this function is sometimes copying, so that name is somewhat misleading.

JeffBezanson commented 7 years ago

True, but strings are supposed to be immutable, so we can probably get away with that.

There is also a String(::IOBuffer) method, but it looks like that could be deprecated to readstring.

StefanKarpinski commented 7 years ago

I've thought about your proposed API change as well, but the interface of string(a, b...) is that it stringifies and concatenates its arguments, and this would make an annoying gotcha exception for callable first arguments. If we remove concatenation from string then it could be made to work.
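
The gotcha is easy to see with the existing functions. A short sketch in Julia 1.x: because `string` stringifies and concatenates every argument, a function passed first is printed by name instead of being called.

```julia
# Today: the function is just another argument to stringify.
concatenated = string(print, "foo")

# What the renamed API would be expected to mean instead.
captured = sprint(print, "foo")
```

Under the rename, `string(print, "foo")` would have to change meaning from the first result to the second.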

TotalVerb commented 7 years ago

Yes, agreed; consistency and avoiding gotchas is most important.

JeffBezanson commented 7 years ago

Noting issues #18326 and #3893 in the "dimension arguments" category.

JaredCrean2 commented 7 years ago

If I can tack on another item: making sure the behavior of containers of mutables is both documented and consistent.

StefanKarpinski commented 7 years ago

@JaredCrean2: can you elaborate on what you mean by that?

JeffBezanson commented 7 years ago

I certainly hope it doesn't involve making lots of "defensive copies".

JaredCrean2 commented 7 years ago

For example, if I have an array of mutable types and I call sort on it, does the returned array point to the same objects as the input array, or does it copy the objects and make the returned array point to them?

stevengj commented 7 years ago

The same objects. I'm pretty sure all our collection sorting, getindex, filtering, searching, etc. methods follow this rule, no?

StefanKarpinski commented 7 years ago

I don't think there's any lack of clarity or consistency on that point – it's always the same objects.

StefanKarpinski commented 7 years ago

In fact, I think the only standard function where that's not the case is deepcopy where the whole point is that you get all new objects.
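
The rule described in the last few comments can be checked with `===`, which compares object identity for mutable values. A minimal sketch in Julia 1.x syntax; the `Box` type is hypothetical.

```julia
# Hypothetical mutable element type.
mutable struct Box
    value::Int
end

boxes = [Box(3), Box(1), Box(2)]

sorted  = sort(boxes; by = b -> b.value)   # new array, same element objects
shallow = copy(boxes)                      # new array, same element objects
deep    = deepcopy(boxes)                  # new array, new element objects

same_after_sort = sorted[1] === boxes[2]   # the Box with value 1
same_after_copy = shallow[1] === boxes[1]
same_after_deep = deep[1] === boxes[1]
```

Mutating `sorted[1].value` would therefore be visible through `boxes[2]`, but not through `deep`.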

JaredCrean2 commented 7 years ago

Is that documented somewhere?

StefanKarpinski commented 7 years ago

No – we could but I'm not sure where it would be best to document it. Why would functions make copies unnecessarily? Where did you get the impression that they might?

o314 commented 7 years ago

Hello. I don't believe I have seen any remarks about data serialization.

Sooner or later Julia programs will be written and run publicly, and data will start to stratify, sometimes for years. Data serialization, e.g. the chain from object to bytes driven by type (maybe over JSON or ...), has to be built to be resilient over time. Thinking about semantic versioning and web APIs may count too.

Could we expect the serialization for user data to stay close to https://github.com/JuliaLang/julia/blob/v0.5.1/base/serialize.jl ?

JaredCrean2 commented 7 years ago

> Why would functions make copies unnecessarily? Where did you get the impression that they might?

I don't know whether they do or not. As far as I can tell, the behavior is undefined. From @JeffBezanson's comment, there are people who advocate making defensive copies, which he opposes. So the documentation should address the question of defensive copies somewhere.

You seem to be implying some kind of least-action principle, but depending on the details of the algorithm, what counts as "least action" becomes ambiguous. In order to get consistency across the API, I think more specific guidance is required.

StefanKarpinski commented 7 years ago

@o314: this is an API consistency review issue, I'm not sure how serialization relates.

StefanKarpinski commented 7 years ago

@JaredCrean2: whether the top-level object is copied or not does certainly need to be documented. What I'm saying is that deeper objects are never copied, except by deepcopy (obviously).

andreasnoack commented 7 years ago

> What I'm saying is that deeper objects are never copied, except by deepcopy (obviously).

There was a recent discussion about this in the context of copy for some of the array wrappers, e.g. SubArray and SparseMatrixCSC but also Symmetric, LowerTriangular. It seems to me that under the above-mentioned policy, copy would be a no-op for such wrapper types. Is the policy you mention the right level of abstraction here? E.g. I think it implies that if Arrays were implemented in Julia (wrapping a buffer), the behavior of copy on Arrays should then change to a no-op.
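
To make the wrapper question concrete, here is how `copy` behaves on a `SubArray` in Julia 1.x, as far as I can tell: it is not a no-op but materializes an independent array. This is only a sketch of current behavior for one wrapper type, not a statement about what the policy should be.

```julia
A = [1 2; 3 4]

v = view(A, :, 1)   # SubArray wrapping the first column of A
v[1] = 10           # writes through to the parent: A[1, 1] is now 10

c = copy(v)         # a plain Vector with its own storage
c[2] = 99           # does not touch A
```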

JuliaLang / julia

API consistency review #20402