JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Document privacy of type-fields #12064

Closed mauro3 closed 3 years ago

mauro3 commented 9 years ago

It is not documented that fields of types are considered private (at least by some people), as @nalimilan mentioned over in julia-users. It's probably worth giving some guidelines in the documentation on when fields are considered private, as that is probably not always true, for instance when a type is used just to store data.

Also, a leading underscore is sometimes used to denote privacy of methods or (extra?) privacy of fields. When should _ be used vs just assuming privacy of fields? When should methods use a _?

It is probably worthwhile to reach some sort of consensus about this and document it before going forward with field overloading (#1974 and PR #5848).

ScottPJones commented 9 years ago

Tagging all fields you want to keep private with a _ is a lot more of a bother than simply adding public/private keywords to the definitions where you can actually get a warning or error if misused. If the default is that fields of types should be considered private (something I've yet to see in the Julia codebase or packages), then maybe using a "public" keyword would be better. Simply documenting something that is not followed in practice is not really that helpful.

mauro3 commented 9 years ago

@ScottPJones: I think that is a separate issue: language feature vs documentation/convention. (Not sure whether one has been opened yet).

KristofferC commented 9 years ago

Field overloading would give the same flexibility to the dot notation as the @property annotation in Python, right? In that case we could just adopt the same convention as Python, fields are considered public by default unless they start with an underscore. Direct field access is no longer problematic since the implementation can now change in a backwards compatible way by overloading the field access.
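
A minimal sketch of that backwards-compatibility argument, assuming a current Julia (1.x), where `x.f` calls `Base.getproperty` (this thread predates that function). The `Circle` type and its fields are made up for illustration:

```julia
# Hypothetical type: an earlier version stored `radius`; this one stores `diameter`.
struct Circle
    diameter::Float64
end

# Keep `c.radius` working for existing callers by computing it on access.
function Base.getproperty(c::Circle, name::Symbol)
    if name === :radius
        return getfield(c, :diameter) / 2
    end
    return getfield(c, name)  # every real field behaves as before
end

c = Circle(4.0)
c.radius    # computed from the stored diameter
c.diameter  # plain field access
```

Callers that wrote `c.radius` against the old layout keep working, which is the same role `@property` plays in Python.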

ScottPJones commented 9 years ago

Yes, that should be opened as well. I still stand by my last point, that documenting something that people simply don't follow, or feel is incorrect (see @tknopp's comments about immutables in the same julia-users discussion) isn't going to do much good. The horse has already left the barn.

rofinn commented 9 years ago

I'm in favor of a _ convention as well. It would also be easy enough to add it to Lint.jl. We might also want to use the same convention for functions that are private to a file. @ScottPJones: The public/private syntax always seemed a bit verbose to me, but I could be convinced otherwise.

ScottPJones commented 9 years ago

If the default for all fields of a type is private, then you are simply adding 7 characters "public " to those fields. On the other hand, if you use a _ convention (which doesn't really buy you any assurance at all that people won't use your internals!), you'll be adding a _ for most every field, and most every field reference. I think _ is a lot more verbose in practice.

Also, for 29 years I worked on a language with probably >100,000 programmers using it, and before we added public/private (default private for functions in a module) (which was about 18 years ago, IIRC), we constantly had problems with programmers using internal interfaces, and then complaining that we broke things they depended on. I'd like the programmer community of julians to get to be as large or larger, and I'd hope that julia does not repeat mistakes I've had to painfully deal with in the past.

carnaval commented 9 years ago

Why do you only consider the PoV of the writer of a library (/piece of code) ?

I've been in the opposite situation many times when I'm using someone's code and they made some things private because, after all, it's "good practice" to hide most of your internals. However, I know what I'm doing, I read the source, the compiler could do it but refuses because of some keyword ?

In other words, it's not only mindless idiots using your pristine code, sometimes you are also using some mindless idiot's code ;-)

ihnorton commented 9 years ago

As currently envisioned, I think getfield/setfield overloading would facilitate a privacy convention for people who want it, without any other changes to the language. To @carnaval's point, fields would still technically be accessible using Core.getfield (or whatever it ends up being called), but anyone who does that will own their breakage.
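
A rough sketch of such a convention, assuming a current Julia (1.x) where the overloadable hook is `Base.getproperty` and `getfield` remains the builtin escape hatch. The `Counter` type and the helper names are hypothetical:

```julia
# Hypothetical type with one public and one underscore-prefixed field.
mutable struct Counter
    name::String
    _count::Int
end

# Reject dot access to any field whose name starts with an underscore.
function Base.getproperty(c::Counter, name::Symbol)
    if startswith(String(name), "_")
        error("field `$name` of Counter is private")
    end
    return getfield(c, name)
end

# The type's own methods bypass the check with getfield/setfield!.
increment!(c::Counter) = setfield!(c, :_count, getfield(c, :_count) + 1)
value(c::Counter) = getfield(c, :_count)

c = Counter("hits", 0)
increment!(c)
value(c)               # public accessor
getfield(c, :_count)   # escape hatch: works, but the caller owns any breakage
```

Only reads are guarded here; writes such as `c._count = 5` would need a matching `Base.setproperty!` method.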

ScottPJones commented 9 years ago

No, I'm actually considering both. As a user of somebody else's code, I'd like to know just what the real API is, what the contract I have with the module / library / package. That way I don't waste any of my time when the owner of that package decides to rewrite everything.

If you see that the code you are using does not give you some needed functionality in its API, what you do depends on whether it is open source or not. For closed source, you'd file an enhancement request (that's what I would get from customers); for open source, you'd raise an issue (if you don't know how to fix it yourself), or submit a PR (which a smart guy like you would probably do!). For open source, you'd even have the option of forking the darn thing if the author(s) are not responsive (or died, or whatever), and people could start using your new improved version that does have the functionality you want.

Mucking around in the internals only helps you, makes your code more fragile, and doesn't help anybody else. Doing the above helps the entire community.

I've been on both sides of the fence throughout my career, and I've had to deal with my own share of mindless idiots! (luckily, I haven't run into any yet in the julia community [we may disagree, yes, but I do know they are brilliant])

Maybe this could be handled like deprecations. privatewarn == 0 means no warnings, privatewarn == 1 means a single warning, privatewarn == 2 means give an error if some mindless idiot is mucking about in my beautiful code! :grinning: Would that make this not such a bother to you?
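
The three levels could look roughly like this. It is a run-time sketch only, not a compiler switch like depwarn, and it assumes a current Julia (1.x) with `Base.getproperty`. `PRIVATE_WARN` and `Account` are hypothetical names:

```julia
# Hypothetical global switch: 0 = silent, 1 = warn once, 2 = error.
const PRIVATE_WARN = Ref(1)

struct Account
    owner::String
    _balance::Float64
end

function Base.getproperty(a::Account, name::Symbol)
    if startswith(String(name), "_")
        level = PRIVATE_WARN[]
        if level == 2
            error("access to private field `$name`")
        elseif level == 1
            # maxlog=1 limits this call site to a single logged warning
            @warn "access to private field `$name`" maxlog=1
        end
    end
    return getfield(a, name)
end

a = Account("ann", 10.0)
PRIVATE_WARN[] = 0
a._balance   # silent
PRIVATE_WARN[] = 1
a._balance   # warns once, still returns the value
```

With `PRIVATE_WARN[] = 2` the same access throws instead.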

nolta commented 9 years ago

The road to C++/Java is paved with good intentions. -1e6 to any sort of public/private wankery.

ScottPJones commented 9 years ago

@ihnorton The problem with that is, it is still just a convention, and is not easy to find out if people are breaking that convention. Also, having to use .. every time I just want to access the fields in my own types directly would be incredibly annoying, IMO.

@nolta The Julia code base and packages already have problems, because it has no mechanism to keep the abstraction and the implementation separate. This has nothing to do with C++/Java. This is more about avoiding the "object orgy" that happens in a lot of dynamic languages. See https://en.wikipedia.org/wiki/Object_orgy

I also see this as allowing people to get a bit closer to the niceness that CLU had, if they want to.

ihnorton commented 9 years ago

getfield(::MyType, Any) = error("No!") would be something more than a convention. The existence of Core.getfield is an escape-hatch. If people use that, it's not my/your/our problem.

Also, having to use .. every time I just want to access the fields in my own types directly would be incredibly annoying, IMO.

so use setters and getters...

lobingera commented 9 years ago

I tried to follow this already at the mailing list and questioned myself: where did i follow this or the other convention in the last ~30 years of programming? I entered object orientation late and always found the private/public differentiation as something obscure. I understand where it comes from and why it's really, really needed, but in writing code and especially in rapid prototyping it's defining a speed limit.

Two things come to mind here: 1) A real programmer can write fortran programs in any language -> If you try to stop people from expressing their ideas with certain language constructs, they'll certainly find ways around it. 2) In SW Engineering all problems are communication problems -> If you cannot transport the message "don't use this", well...

tl;dr: a convention for the name that can be checked by a lint should be enough.
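
As a sketch of what such a check could look like (this is not the API of Lint.jl or any other package; `private_accesses` is a made-up helper), one can walk a parsed expression and collect every dot access to a name starting with an underscore:

```julia
# Collect every `obj._name` access found in an expression tree.
function private_accesses(ex, found = Symbol[])
    if ex isa Expr
        # `a.b` parses as Expr(:., :a, QuoteNode(:b))
        if ex.head === :. && length(ex.args) == 2 && ex.args[2] isa QuoteNode
            field = ex.args[2].value
            if field isa Symbol && startswith(String(field), "_")
                push!(found, field)
            end
        end
        foreach(arg -> private_accesses(arg, found), ex.args)
    end
    return found
end

code = Meta.parse("total = acct._balance + acct.bonus - other._fees")
private_accesses(code)   # finds :_balance and :_fees
```

A real linter would also need to know which module owns the type, since the same access is fine inside the implementation and suspect outside it.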

ScottPJones commented 9 years ago

@ihnorton Why would I want the extra complication of adding setters and getters, just to access my own internal structures? That seems like a waste of my time.

@lobingera How do you run lint on the programs that hundreds of thousands of people have written, that you have no access to? What happens if you make what you think is a minor internal change, and you break software running all over the world? (which can have severe economic effects as well, if your company is selling software).

lobingera commented 9 years ago

@ScottPJones

Well, i somehow believe in the superiority of Open Source. Therefore the situation that i have no access to the 'other' code doesn't happen. But actually i meant, having a strong warning and structural checking system on my side, when i contribute code should be enough.

rofinn commented 9 years ago

@ScottPJones I'm still in favor of the convention and linting approach because it seems like the path of least resistance. However, I suppose we could have a pub keyword that autogenerates the setters and getters for you. I don't think adding public/private is really going to solve your problem if you aren't using test cases, linters, etc. that should be finding most of these issues. Anyone who has used C++ knows that it can be extremely easy to break your software with a minor internal change ;) (i.e. a memory leak).

malmaud commented 9 years ago

-1 to language-level enforcement of private/public. This is a language primarily for rapid implementation of scientific algorithms; not Java.

Keno commented 9 years ago

Yeah, a huge -1 to mandatory access enforcement from me too. In C++, they only get in the way if you're trying out something new before you know what the right interface is. And if you do know what the right interface is, having to access private fields is a code smell that could be caught by Lint.

ScottPJones commented 9 years ago

@Rory-Finnegan the convention and linting approach does nothing to help if you don't have access to the code using the modules/packages that you write. Before we added the ability to optionally use public/private tags for functions, properties, and methods, we used to waste a very large amount of time due to customers having abused some code that was supposed to be internal only (to say nothing of the problem of said customers being upset that their wonderful code stopped working).

@malmaud Why should julia be limited to scientific computing? That seems very narrow. Julia seems to me to be able to replace C, C++, Java, and Python for most of my programming (and I'm not doing scientific computing).

@keno What is the problem with an optional feature, that the author of a module can use or not as they see fit? Also, if it is controlled via a switch like depwarn, it can be turned completely off for "rapid application development" (that could even be the default setting). There is nothing "mandatory" about what I've been proposing for Julia.

ScottPJones commented 9 years ago

I understand where it comes from and why it's really, really needed, but in writing code and especially in rapid prototyping it's defining a speed limit.

@lobingera So, you do understand that encapsulation is really, really needed. I've never found that the ability to mark things as internal (private) or externally visible slowed down my writing of any code, nor did it slow down any of the customers (they were very happy about it, after it was introduced) (and the language I worked on was for precisely "rapid application development"). The customers were all writing code that had to be maintained reliably for decades, in mission critical applications (hospitals, banks, on-line trading, etc.).

To me, always having to run Lint while I'm developing something, is a much bigger speed limit than simply having the compiler warn me immediately.

mikewl commented 9 years ago

I think, this could be done somewhat like this:

  1. Non-annotated fields are a grey area and are always accessible (Lint can have a switch to add warnings for non-annotated fields, as that warning should not be the default).
  2. A keyword private or some other keyword is created that marks it as a private field. This field can be accessed then by implementing either getfield, setfield! or both depending on your needs. This would then allow for the field to be validated as it is being set externally for example or creating a read-only field.

    So this would be something extra that you only would add if you felt it was necessary for your work. I personally would never use it myself. However, if I wanted to use Julia in a field such as finance... I would be far more careful about encapsulation. This would simply be a way of enforcing encapsulation when necessary. I think it would be a useful feature but one that would not and should not be used very often. This way would also not break anything and not change anyone's workflow.
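
The read-only and validated-on-set cases from point 2 can be sketched like this, assuming a current Julia (1.x) where the hook for `x.f = v` is `Base.setproperty!` rather than an overloadable `setfield!`. The `Sensor` type is hypothetical:

```julia
# Hypothetical type: `id` is read-only from outside, `celsius` is validated.
mutable struct Sensor
    id::Int
    celsius::Float64
end

function Base.setproperty!(s::Sensor, name::Symbol, value)
    if name === :id
        error("field `id` is read-only")
    elseif name === :celsius
        value >= -273.15 || throw(ArgumentError("temperature below absolute zero"))
    end
    # Same conversion the default setproperty! would apply.
    return setfield!(s, name, convert(fieldtype(Sensor, name), value))
end

s = Sensor(1, 20.0)
s.celsius = 25      # validated, then stored as 25.0
# s.id = 2          # would throw: field `id` is read-only
# s.celsius = -300  # would throw an ArgumentError
```

Code that never assigns to these fields is unaffected, which matches the point that this should not change anyone's workflow.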

However, this isn't urgently needed. Right now I don't think very long running or critical systems are being written in Julia. I also feel that Julia is useful outside of scientific computing. That currently though is not its focus. This probably should be revisited at a later date once field overloading has been implemented as without it this can't be implemented as nicely.

vtjnash commented 9 years ago

Yeah, a huge -1 to mandatory access enforcement from me too. In C++, they only get in the way if you're trying out something new before you know what the right interface is. And if you do know what the right interface is, having to access private fields is a code smell that could be caught by Lint.

it certainly doesn't stop LLVM from iterating their API continuously. additionally, in many cases, they enforce the public/private distinction via public / private headers – i'm not sure how well that transfers to a language based on dynamic reflection instead of headers, however.

hayd commented 9 years ago

Could this be done in a package? E.g. with a @private macro? Taking a keyword seems a mite premature.
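
One possible shape for such a macro, as a sketch only: it assumes a current Julia (1.x) with `Base.getproperty`, and the `@private T fields...` calling form is invented here, not taken from any existing package. The `Wallet` type is hypothetical too:

```julia
# Hypothetical macro: `@private T a b` makes `x.a` and `x.b` throw for `x::T`.
macro private(T, fields...)
    hidden = Tuple(fields)  # field names arrive as Symbols
    esc(quote
        function Base.getproperty(obj::$T, name::Symbol)
            name in $hidden && error("field `", name, "` of ", $(string(T)), " is private")
            return getfield(obj, name)
        end
    end)
end

struct Wallet
    owner::String
    cash::Int
end

@private Wallet cash

w = Wallet("sam", 5)
w.owner              # allowed
getfield(w, :cash)   # still reachable through the builtin
# w.cash             # would throw: field `cash` of Wallet is private
```

Because the check lives in an ordinary method, no new keyword is needed, but the type's own code also has to use `getfield` for the hidden fields.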

As mentioned, the `_` prefix is used in Python to denote "private", without enforcement. Python's done without.

ScottPJones commented 9 years ago

@vtjnash No, this isn't a cure-all, but at least in my experience, having the option of enforcing at least some level of encapsulation is critical to building reliable systems. Since I also stated that the default could be to simply not even warn or give error messages (a la depwarn), it wouldn't affect anybody who didn't need this functionality. Just because the LLVM developers seem to like to change things every release (and there seems to be an endless stream of bugs that Julia has to deal with) doesn't mean that that is a good thing! How LLVM deals with public/private distinctions I think is totally orthogonal to how Julia might deal with it.

@hayd if it can be done with say @public and @private macros, that would be just fine, I really don't care so much about the syntax, but rather the functionality I need to be able to deliver reliable systems that will still be working in 30 years time. I still don't have my head wrapped around how to deal with Julia meta-programming yet, so I have no idea what is possible.

@Mike43110 From conversations I had at JuliaCon, I think there are actually a number of people who would welcome anything that can be done to help writing maintainable, reliable, large applications in Julia. We (myself and the Belgian company I'm working for) are definitely interested in using Julia already for critical systems (which is why I take these issues so seriously).

ScottPJones commented 9 years ago

Although Python gets by fine without any enforced privacy, Python is also notorious for being poorly suited for writing large scale reliable systems - lack of privacy of internals is only one factor among many, but it contributes.

@hayd As @tkelman noted above (in the julia-users thread), Python's unenforced _ convention really is not enough.

tkelman commented 9 years ago

You also neglected to quote my other point. Prototype implementation or bust. Not worth spilling bits talking about it.

ScottPJones commented 9 years ago

I responded to that in the julia-users group. It's not spilling bits, I think it's been quite useful. I wouldn't have thought that it would be possible to do this as a macro, not knowing all their ins-and-outs yet, but if so, then that would be fine for a prototype (like traits now, I suppose).

mikewl commented 9 years ago

@ScottPJones Please write `@public` and `@private` in backticks! I am sure the users with those names aren't interested in this.

I don't know enough about macros to know if it would be possible. It would make for a good prototype if possible though.

tknopp commented 9 years ago

@ScottPJones: Please stop quoting things out of context; this is very misleading. I have made pretty clear that for mutable types it is common practice in julia that fields are private, and that for immutables this is not entirely clear.

+1 for documenting the common practice and working on better interface support

Julia is already an excellent language for writing maintainable large scale applications. The type system including subtyping of abstract types helps a lot in this.

ghost commented 9 years ago

For what little that my opinion matters, -1 to making information hiding a part of the core language. +1 for better documenting what we consider to be idiomatic Julia, perhaps even a manual page on "Writing Idiomatic Julia Code".

I would also like to propose "Access Equality" to replace "Consenting Adults", putting myself firmly in the pro-equality camp.

ScottPJones commented 9 years ago

@tknopp I never quoted you, I just said to look at your comments in the julia-users threads about immutables. How can that be "out of context", when I said to look at the context? You did fairly clearly state there: "Note that for immutable the fields are(!) the interface."

@ninjin If the ability to hide the implementation, in order to maintain a separation between abstraction and implementation, were simply optional, why would that be a problem for you? I haven't said that making anything private would be the default for julia, just that I think the capability should be available, for those of us who want encapsulation.

Do people see it as a problem that they can't go directly accessing the internals of julia's boxing/unboxing and type system? (at least, I hadn't seen that so far). To me, that's a good example of where having the implementation details protected is a good thing.

I do like your proposed "Access Equality" terminology - I just don't think it is as black and white as people seem to be thinking, i.e. of us vs. them, this camp or that, Team Hide-Things vs. Team Everything-Goes.

ghost commented 9 years ago

If the ability to hide the implementation, in order to maintain a separation between abstraction and implementation, were simply optional, why would that be a problem for you?

Because more than once have I, and most likely all of us in the equality camp, encountered a well-meaning library/package designer that hid just the portion that we needed to tinker with. Sure, we could fork or submit a patch, but at the end of the day we just want to get work done and are prepared to take the potential breakage. Ultimately, giving the option to hide information will then just result in adding a way to unhide said information, thus just making things more complex, this is why I am against even an optional way to do it. @nolta put it best, although maybe a bit bluntly, "The road to C++/Java is paved with good intentions.".

ScottPJones commented 9 years ago

I'd say, the road to constant breakage is paved with good intentions.

toivoh commented 9 years ago

+a lot to documenting the convention that fields are private by default (unless documented otherwise). I definitely agree that it's important to be able to define the public interface to your code clearly, so that you are free to change the internals.

But when it comes to a mechanism for public/private fields, has anyone thought about how it could even be implemented in Julia? In a traditional object oriented language you can restrict access to private fields to the methods of the same class. But in Julia, it's not so obvious to the compiler which code is part of the implementation of which types (though it should hopefully be to the programmer). I'm not sure there's a practical definition of when an access is inside the implementation, and thus is fine even for private fields.

tknopp commented 9 years ago

Could we first answer the fundamental question: Are fields considered to be part of the interface of a (mutable) type?

My vote is no. And the array/iteration interfaces are examples for this rule.

toivoh commented 9 years ago

I agree, for mutables and immutables. Though it has to be possible for the author of a type to make an exception.

KristofferC commented 9 years ago

With field access overloading I would vote yes (just like in Python). The reason for this is that the dot syntax is so effin convenient and with overloading, changes to the implementation can be made in backwards compatible ways. This is similar to properties in Python, for a short summary see: http://blaag.haard.se/What-s-the-point-of-properties-in-Python/.

"Private" fields can then by convention start with an underscore.

lobingera commented 9 years ago

@tknopp, @KristofferC. I'd vote Yes on field access i.e. putting no additional effort to restrict field access. But still a warning should be configurable that shows you are trying to access a field of a type within a module / outside the current scope or a field marked as _.

toivoh commented 9 years ago

I think we need to distinguish between the question of whether we should put any effort in restricting field access, and whether fields should be fundamentally considered part of the public interface. As I understand it, @tknopp was going for the latter.

KristofferC commented 9 years ago

If fields are never a part of the interface, I am scared this leaves us with getter/setter hell for cases where you have a type and you actually want to manipulate/access the fields of the type. It also means we basically "waste" the dot-syntax, which is such a convenient syntax. Compare:

get_vertices(get_element(mesh, n), 2)

and

mesh.elements[n].vertices[2]

To note, I have mostly programmed in Python so that is the (limited) mindset I have when I write this.

nalimilan commented 9 years ago

@KristofferC Your counter-example is a strawman. That would more likely look like

vertices(elements(mesh, n), 2)

or

vertices(elements(mesh)[n])[2]

The latter being almost exactly identical to the dot syntax (with two more characters).
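
To make the comparison concrete, here is a small runnable sketch. The `Mesh` and `Element` types and their accessor functions are hypothetical, written in current Julia (1.x) syntax:

```julia
# Hypothetical mesh types, only to compare the two access styles.
struct Element
    vertices::Vector{Int}
end

struct Mesh
    elements::Vector{Element}
end

# Accessor functions named after the fields.
elements(m::Mesh) = m.elements
vertices(e::Element) = e.vertices

mesh = Mesh([Element([1, 2, 3]), Element([2, 3, 4])])
n = 2

vertices(elements(mesh)[n])[2]   # function style
mesh.elements[n].vertices[2]     # dot style, same value
```

Both expressions return the same vertex; the difference is that the accessors can keep working if a field is later renamed or computed.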

tknopp commented 9 years ago

Thanks @toivoh. That is exactly what I mean. These are independent things.

@KristofferC: This is not common Julia practice. It's under discussion in #1974. I think it is very important to drive this discussion from what is used today as a common practice. And until there is a majority of core maintainers that want the array length to be accessible by x.length (as an example), the status quo is that one writes functions to access fields of a type.

KristofferC commented 9 years ago

@nalimilan The only thing you changed was renaming the getter? Naming the getter the same as the field has the problem of polluting the name space. You can no longer use the variable n and m in a function that uses sparse matrices because you then lose n(A) for the nr of cols. It could also be argued that the number of characters (and annoyance) of writing getters and setters for every "public" field should count against the function syntax. Anyway the point I tried to make had nothing to do with the number of characters.

I am a bit confused how you can say the function expression and the dot expression look the same. To me they look nothing alike. The problem I have with parsing the function expression is that the number n and 2 gets moved away from the object they are applied to. This means you always have to mentally parse the delimiters. With the dot syntax the item number and the collection are always next to each other and you just walk down the chain.

My personal opinion is that it is a bit sad to go the Java way here when this is a problem that (imho), for example, Python has already solved nicely with its "properties".

MikeInnes commented 9 years ago

Far more effort has been put into this discussion than it would have taken to just prototype it; since I hate to see bits needlessly wasted, I went ahead and did that.

PrivateRyan.jl

/thread (except for the documentation bit, that's actually useful)

KristofferC commented 9 years ago

@tknopp Having length as a function makes sense because it has a meaning by itself. For many types length is not a field and you might have to traverse a list or something.

On the other hand we have n. Without the type SparseMatrixCSC, n has no meaning. This is also reflected in the functions taking a sparse matrix where A.n and A.colptr etc is used heavily.

I therefore think it is sensible to separate functions which a lot of types are likely to overload and fields which are unique for the type.

lobingera commented 9 years ago

@ScottPJones (about lint and encapsulation)

Have you recently recognized that julia tries hard to hide the compiler from you? julia has the look and feel of a scripting language and encourages working at the REPL. I change and evaluate for a long time until the code somehow converges into something usable and reusable, which will then be copy+pasted into a .jl file or built into a module. Along the way i might run expressions through something like lint, and i also expect infrastructure in the editors i use to run code checks in the background. So much for the immediate output of the compiler.

As for 'only' marking something as non-publicly accessible: you should already have recognized (and if you actually have a background in this lexer/parser/AST/compiler business you should know) that adding an optional keyword per field changes the language syntax - this is different from marking via a name change -> _fieldname - and has some drastic impact on the low-level implementation of the JIT compiler. It's doable, but it takes some effort to do correctly. Julia would allow doing things like this via a macro - and that would be my recommended way if you really, really need it for your work. But the style of the discussion leads me to conclude that the aim is to make this somehow mandatory.

And i fail to see the gain. Encapsulation replaces communication about the usage of the interface with enforcement of the interface. All the effort i'd rather spend on communicating the use of the interface e.g. docstrings, interface documentation etc.

But i'm quite aware, that all this is from my background, but i somehow agree with @nolta here...

yuyichao commented 9 years ago

@lobingera Note that you can click the time and paste the link to a single comment. The time (20h ago) will change unit/precision in a few hours and people won't be able to tell what you are replying to.

Edit: like this https://github.com/JuliaLang/julia/issues/12064#issuecomment-119933926

lobingera commented 9 years ago

I see.

mauro3 commented 9 years ago

@KristofferC I don't think there is much need to use the internals of sparse matrices. n and m you should access through size, for most all uses nzrange, rowvals and nonzeros should give you all you need.
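
For instance, column sums can be written with only those accessors. This sketch uses the `SparseArrays` standard library of a current Julia (1.x); `column_sums` is a made-up helper:

```julia
using SparseArrays

# 3x3 matrix with entries (1,1)=10, (2,1)=20, (3,2)=30, (1,3)=40.
A = sparse([1, 2, 3, 1], [1, 1, 2, 3], [10.0, 20.0, 30.0, 40.0], 3, 3)

# Column sums using only exported accessors, never A.n, A.colptr or A.nzval.
function column_sums(A::SparseMatrixCSC)
    sums = zeros(eltype(A), size(A, 2))
    vals = nonzeros(A)
    for col in 1:size(A, 2)
        for k in nzrange(A, col)
            # rowvals(A)[k] would give the row index of vals[k]
            sums[col] += vals[k]
        end
    end
    return sums
end

column_sums(A)
```

Nothing here depends on the field names of `SparseMatrixCSC`, so the loop survives a change to the internal layout.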

@one-more-minute, you're a macro-ninja! Let's see how @ScottPJones likes it.

tknopp commented 9 years ago

@KristofferC: While I understand this issue, in practice I rarely find an example where a concept is really unique to a type. m,n of SparseMatrixCSC are of course absolutely private because of size(A,1) and size(A,2). As Mauro answers right now, the same holds for the other internals.