Open gdalle opened 1 year ago
Possible structure of the post:
What would you add?
LiveServer.jl, in particular the live-building of documentation, is a game changer for writing documentation. The instant reward of seeing the rendered page is making it so much more fun.
Something I've found really useful since I first found it was project environments and global environments, and how they stack. So the global environment can contain tools that are commonly useful but you might not always want to add to a project env, and can be seen a bit like an extended stdlib with what you personally think should always be available :)
I think this is a great feature and use it often both as a user and package developer.
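A minimal sketch of how the stacking shows up in practice, assuming default settings (no custom `JULIA_LOAD_PATH`): Julia resolves `using SomePackage` by walking the load path, which lists the active project first and the global environment second.

```julia
# The symbolic load path. With default settings it is
# ["@", "@v#.#", "@stdlib"]: active project, global environment, stdlib.
println(LOAD_PATH)

# The same path expanded to concrete locations on this machine.
# Entries that cannot be resolved are skipped.
for entry in Base.load_path()
    println(entry)
end
```

So a tool added while the global environment (`@v#.#`) is active stays loadable after activating a project, without appearing in that project's `Project.toml`.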
What about PackageCompiler? It should probably be mentioned for users (eliminate ttfx) and for developers (deploying apps)
Thanks for opening this discussion, Guillaume, I think it's an important collection of information that we can point newer users to! I really believe that centralising information like this is one of the best things that we can do as a community. One of the biggest problems we want to avoid is the feeling that there are some Julia wizards who know these magical incantations to make the language 100x better, but the knowledge is stored in their heads or their cryptic documentation.
I do believe that the post shouldn't be split up into three levels. We want to present a curated list of workflow solutions to common problems or questions. As in my video, I think it's better to present all of the solutions together and qualify them with how powerful or useful they are, to whom, and when to use them. Also, Julia really blurs the line between scripter, package developer, and optimiser, such that I don't think we should hint that they are separate things.
I do like how each package is part of a subheading talking about a specific topic, but I think some of the most important (commonly used according to my survey) packages are under headings that most people wouldn't think to read unless they were more experienced with Julia. For example, not knowing about JET and Cthulhu would be a real shame! Perhaps we should present certain tools as helping you write idiomatic Julia code as everyone wants to do that.
I think there's also a danger of presenting Julia as some sort of monolithic beast of packages that are required to be used to have a nice developer experience. This may be the impression that some newer folks take from such a post.
> LiveServer.jl, in particular the live-building of documentation, is a game changer for writing documentation.
Thanks, I just added it!
> Something I've found really useful since I first found it was project environments and global environments, and how they stack.
Good point, also in the list now!
> What about PackageCompiler? It should probably be mentioned for users (eliminate ttfx) and for developers (deploying apps)
Indeed, but I tried to structure the list by difficulty, and in my mind it is a rather advanced tool. Probably cause I don't use it myself :shrug:
I don't use it either, but it always felt like something that might actually be useful to me - 90% of my time on Julia is spent data wrangling and doing statistical modelling, simulation or optimization using the same 10 packages, so theoretically I think having a custom sysimage could be great but I could never be bothered to try it.
As far as I know it's one line of code to create a sysimage, maybe I should try it and see how practical it is before suggesting it 😂
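For reference, a sketch of what that one line looks like, wrapped in a hypothetical helper (`build_sysimage` is not part of PackageCompiler) so that nothing is compiled until you call it. It assumes PackageCompiler is installed and that the listed packages are already in the active project; the build can take several minutes.

```julia
using PackageCompiler
using Libdl  # for the platform-specific shared library extension

# Hypothetical helper: the body is the actual "one line".
function build_sysimage(packages::Vector{String};
                        path::String = "custom_sysimage." * Libdl.dlext)
    create_sysimage(packages; sysimage_path = path)
    return path
end

# Example call (package names are placeholders for whatever you use daily):
# build_sysimage(["DataFrames", "Distributions"])
```

Julia then has to be started with `julia --sysimage custom_sysimage.so` (or the matching extension on your platform) to benefit from it.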
> One of the biggest problems we want to avoid is the feeling that there are some Julia wizards who know these magical incantations to make the language 100x better, but the knowledge is stored in their heads or their cryptic documentation.
You read my mind.
> I do believe that the post shouldn't be split up into three levels. [...] Julia really blurs the line between scripter, package developer, and optimiser such that I don't think we should hint that they are separate things.
That is a valid remark, and this user progression is something I definitely want to encourage. However, even with access to a blog post like this, it has to happen gradually. Beginners will need to master Revise and Pkg long before they even look at Cthulhu or PrecompileTools. And it seems a bit daunting to give them everything at once, especially with a very flat hierarchical structure (maybe we could find other natural headings?).
> I think some of the most important (commonly used according to my survey) packages are under headings that most people wouldn't think to read unless they were more experienced with Julia. For example, not knowing about JET and Cthulhu would be a real shame!
I would actually love to know the results of your survey in terms of package use statistics, maybe you could share them here? My putting JET and Cthulhu at the end is a subjective choice, mainly due to the order in which I discovered things myself. Both have greatly improved in usability recently (Cthulhu mapping directly to source code is a game changer), so there is a case to be made for mentioning them earlier. However, my guess is that Julia beginners take things in the following order:
I'm open to being proven wrong, but if I'm not, then JET and Cthulhu belong in part 3
> I think there's also a danger of presenting Julia as some sort of monolithic beast of packages that are required to be used to have a nice developer experience. This may be the impression that some newer folks take from such a post.
Agreed, when I wrote it down this morning I thought "boy that list is scary long". That's part of why I think we need a sense of progression and increasing difficulty. I am also considering a series of 3 blog posts for that very reason.
> As far as I know it's one line of code to create a sysimage, maybe I should try it and see how practical it is before suggesting it 😂

It's not hard per se; in my video I show how to use it. But it is very fickle: some things won't compile no matter how hard you try, and knowing about incremental sysimages is a game changer. There's also VS Code's built-in functionality for generating sysimages, but I don't like it as much: it didn't work for me when I tried it, and it doesn't allow much control over, or understanding of, how the process works (which is useful for debugging it).
> I would actually love to know the results of your survey in terms of package use statistics, maybe you could share them here?
Here you go: https://discourse.julialang.org/t/survey-on-how-you-use-julia/99807/6
I did a small writeup. The part that surprised me was just how widely used LocalRegistry was, more so than JET, Debugger, and Cthulhu.
> my guess is that Julia beginners take things in the following order
I think that's a reasonable way of putting it, I also like the framing of "Write, Share, Improve". It might be useful if we made it clear that the first section/post is to get you up and writing/running whatever code you want in a structured way, and the third is about making the code itself better.
The main reason I want JET (and to a lesser extent Cthulhu as I see it as more advanced) to be made more prominent is because many "senior" (for lack of a better word) members of the community have expressed a desire to see it adopted more or that it may one day be a more integrated part of the language.
By making this blog post we have a lot of influence over how people learn Julia and think about writing Julia code. I want the future of Julia to be statically checked by JET! I want compile times to go down because instabilities and piracy are caught!
> By making this blog post we have a lot of influence over how people learn Julia and think about writing Julia code. I want the future of Julia to be statically checked by JET! I want compile times to go down because instabilities and piracy are caught!
Maybe we could frame them as debugging tools instead of performance optimization. That way we mention them earlier and trick beginners into believing JET and Cthulhu are already the standard for tracking problems in your code. Then type-stable Julia becomes a self-realizing prophecy
Well, they kinda are debugging tools. Python has PEP8 checkers as a standard part of the workflow as well as things like isort. To me they come even before debugging tools, they should always be there in the background reminding you of small changes to your code to improve it.
I wish JET could be used as a passive static analyser/linter as opposed to having to be called actively to find errors. That's the part that holds it back imo.
That makes sense, I use JET and Cthulhu much more than the standard debugger anyway (yes I'm a `println("here")` kinda guy)
Oh wait it can and is used in that way as a linter.
True but you still need to run your functions with a macro
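For readers who haven't tried it, "calling JET actively" looks roughly like the sketch below. It assumes JET is installed; `add_first` is a made-up function chosen to trigger both kinds of report.

```julia
using JET

# Toy function: fine for concrete numeric vectors, problematic otherwise.
add_first(v) = v[1] + 1

# Optimization analysis: with a Vector{Any}, the element type is unknown,
# so the `+` call has to be dispatched at runtime.
opt_result = @report_opt add_first(Any[1, 2])

# Error analysis: there is no method for adding a String and an Int.
call_result = @report_call add_first(["a"])

println(length(JET.get_reports(call_result)), " potential error(s) found")
```

In the REPL, the two results print their reports directly, so the `println` is only needed in scripts.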
> I am also considering a series of 3 blog posts for that very reason.
Definitely agree that this should be broken up. If I were still a new user, I would be afraid of ever getting into package development if everything was presented all at once like this.
For "Calling other languages", would be good to also just link to the interop org https://github.com/JuliaInterop. I used to use RCall.jl a lot when I first started, so making all the other packages beyond PythonCall.jl discoverable would be ideal (I imagine that was your intent eventually anyway, but just wanted to highlight the org link).
> For "Calling other languages", would be good to also just link to the interop org https://github.com/JuliaInterop. I used to use RCall.jl a lot when I first started, so making all the other packages beyond PythonCall.jl discoverable would be ideal (I imagine that was your intent eventually anyway, but just wanted to highlight the org link).

Of course! I mentioned PythonCall.jl specifically because many beginners want to use PyCall.jl instead, which is basically made obsolete by PythonCall.jl
I think to answer the question of hierarchy, we need to decide what kind of documentation this is. It seems to be straddling all four quadrants at the moment.
I think a horizontal "menu" of awesome packages would be excellent informational documentation. It could be supplemented with Tutorials for Revise, Debugging, and Notebooks, as well as How To Guides for intermediate users for stuff like PrecompileTools and PackageCompiler.
I think mixing purposes will muddy the usefulness for any individual reader.
Breaking things out like this is more work than the single blog post in the OP, but it would let us approach it incrementally.
> I think to answer the question of hierarchy, we need to decide what kind of documentation this is. It seems to be straddling all four quadrants at the moment.
Good point. Let's go through the divio categories:
I believe that the video(s) I'm planning on making fit the explanation/tutorial side of that chart quite well. If nothing else, this does mean that content of this sort will be available for people to find.
I think our post should be full of links to learn more, so it can act as a jumping-off point instead of a reference or explanation, but I'm not sure that a how-to guide is so appropriate. In my head this post is definitely learning-oriented, at least somewhat, and so I think perhaps a hybrid how-to/tutorial is appropriate. I don't think we need to be 100% detailed and provide a concrete list of steps for everything we talk about, but some details would help people get started, such as an example config for `startup.jl`.
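As an illustration of the kind of example config meant here, a commonly used `startup.jl` simply loads Revise at the start of every session. This is a sketch, not a recommendation from this thread; it assumes Revise lives in your global environment and degrades to a warning if it does not.

```julia
# ~/.julia/config/startup.jl
# Runs at the start of every Julia session.
try
    using Revise
catch err
    # Revise is missing or failed to load: warn, but still start the session.
    @warn "Could not load Revise" exception = (err, catch_backtrace())
end
```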
For debugging, `@show` is my most used "tool" I think (one step up from `println` ;) ). A couple of these convenience macros deserve highlighting I think, especially if people come from languages without macros. I have once or twice used Infiltrator.jl as well, because the debuggers didn't work well enough and I just wanted to inspect local variables here and there.
`@show` and some pretty `println` string interpolation goes a long way.
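For anyone coming from a language without macros, here is a small sketch of the difference (the variable `x` is just an example): `@show` prints the expression itself next to its value, while `println` prints only what you interpolate.

```julia
x = [1, 2, 3]

# Prints the source expression and its value: `sum(x) = 6`
@show sum(x)

# Plain printing with string interpolation via `$(...)`
println("x has $(length(x)) elements, max = $(maximum(x))")
```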
In one of Chris R's videos he covered "catching the value that caused an error" using some neat global/ref trick that would also be good to cover
> In one of Chris R's videos he covered "catching the value that caused an error" using some neat global/ref trick that would also be good to cover
```julia
const _args = Ref{Any}()

function foo(arg1, args...)
    # Stash a copy of the inputs so they can be inspected after an error
    _args[] = deepcopy((arg1, args...))
    # implementation
end
```
The `deepcopy` is needed only for `foo!` (one which modifies input arguments).
(That's actually what Rebugger does automatically to each item in a stacktrace, but it's not maintained currently.)
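To make the trick concrete, here is a small runnable sketch with a made-up function (`sqrt_sum` and `_last_args` are illustrative names): the call fails, and the offending inputs can then be recovered from the global `Ref`.

```julia
const _last_args = Ref{Any}()

function sqrt_sum(xs, offset)
    # Record the inputs before doing any work that might throw
    _last_args[] = deepcopy((xs, offset))
    return sum(sqrt, xs) + offset
end

try
    sqrt_sum([4.0, -1.0], 1)   # sqrt(-1.0) throws a DomainError
catch err
    println("caught: ", typeof(err))
end

# The inputs of the failing call are still available for a reproducer
xs, offset = _last_args[]
@show xs offset
```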
@gdalle I think you are saying that you want to separate the beginner-friendly things from the more advanced stuff. I think this is a great idea. Maybe a beginner just wants a minimal recipe or two for being comfortable when trying to code a project in Julia. So `Pkg` and `Revise` (or the VSCode equivalent) are essential. Cthulu (I refuse to look up how to spell that correctly!) is not essential for beginners. It doesn't need to be reserved for advanced users, but at least one level beyond beginner. It's easy to be overwhelmed.
Also, I use JET in testing and CI. In other words, I would group it with Aqua. The interface is a bit rough, but it can catch regressions in design. You want to catch this before you have erected too much on top of the problematic code. On the other hand, I do this by copying other scripts and hand-rolling scripts to filter out what I want to consider a false positive. JET is not ready for use by beginners in this capacity. (My intention is to use it always and include a JET badge (borrowed from S Krastanov) in my repos in order to promote normalizing its use.)
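A minimal sketch of what JET in a test suite can look like, assuming JET is installed; `double` stands in for a function from your own package. Real packages usually need extra filtering of false positives, as described above, which is not shown here.

```julia
using Test
using JET

# Stand-in for a function exported by the package under test
double(x::Number) = 2x

@testset "static checks" begin
    # Fails if JET's error analysis finds a problem in this call
    @test_call double(1)
    # Fails if JET finds runtime dispatch in this call
    @test_opt double(1.0)
end
```

Because these are ordinary test items, a regression in type stability fails CI the same way a wrong result would.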
Your analysis of the divio categories is good. What brings the most benefit for the least effort.
One feature that I'd find worthy is a list with officially supported packages (if there's a thing like that). Given that Base only includes a set of minimal functions, the user needs to rely on packages. It'd be nice to know what packages are developed by the creators of Julia or are core packages actively developed by the community.
Examples of the packages I'm referring to are Statistics.jl, DataFrames.jl, StaticArrays.jl, etc.
> One feature that I'd find worthy is a list with officially supported packages
In a way, the blog post we're currently discussing would provide such a list, but strictly restricted to developer workflows.
There has been a lot of discussion on Discourse recently about an alternative to the general registry with a more curated package list, for instance enforcing certain quality guarantees. I personally welcome these initiatives, but I'm not sure adding a package list to an already extensive blog series centered around methodology makes much sense. Thoughts from others on this?
As Guillaume said, this list doesn't exist which is part of the reason we are looking to make this blog.
I want to make package developers and users alike more aware that Aqua is an important tool for this. While it doesn't guarantee that a package has the perfect feature set, it does provide an effortful signal of quality.
One thing that it doesn't address is ongoing support, but I don't know how this can be guaranteed. Most Julia projects are small, both in the number of people working on them (normally one) and in how long that person stays invested.
Perhaps we could add some pointers as to where high-quality packages can be found and how to assess them:
Rather than high-quality packages, I was referring to officially supported packages. By this, I mean that the developers don't want to add these functionalities to Base, to keep their development separate, but they should be considered "almost" part of Julia. The best example is perhaps Statistics, but it includes others like Distributions. They're "optional" packages, but at the same time I can imagine that if Distributions were no longer maintained by the original developers, there'd be official support to keep it alive and maintained. It's the type of package that so many others use as a dependency that losing it would break the ecosystem.
Sometimes I'm not sure if I should use some packages, because I don't know if they're this type of core package. Examples are InlineStrings and PooledArrays, which I assume are officially supported because packages like DataFrames use them.
In summary, I'd emphasize "core packages" and "useful high-quality packages" when the list is described.
> Rather than high-quality packages, I was referring to officially supported packages. By this, I mean that the developers don't want to add these functionalities to Base to keep their development separate, but they should be considered "almost" as part of Julia.
I'm honestly not sure that these packages exist, for the same reason that there is no "official" organization behind Julia. As stated in this blog post:
> The Julia project [...] consists of some code and a community of people who work on that code. The most clear cut line that can be drawn is that there is a set of people who have commit access to the JuliaLang GitHub organization [...] This set of people doesn’t really define the project, however, since there are many people who are prolific contributors to the Julia ecosystem but who do not have “commit bit.” The communal nature of open source makes it difficult to precisely define where the Julia project ends and the greater community begins, which is exactly how we like it.
> I can imagine that if Distributions is not maintained anymore by the original developers, there'd be official support to keep it alive and maintained.
For the same reason as above, this seems inaccurate to me. If Distributions.jl or DataFrames.jl were no longer maintained, there would be community initiatives to take over, or maybe these packages would be deprecated and replacements would emerge. "Core packages" like those you mention are great, and used by many, but I don't think anyone would claim they are "official", or "nearly part of Julia". On the other hand, such claims are much more justified for tools like Revise.jl or Pkg.jl, which are precisely the focus of my blog post proposal.
Update: the blog posts are being drafted on a separate repo, and we'll make a PR to the official website once they're ready.
Preview: https://gdalle.github.io/ModernJuliaWorkflows/ Repo: https://github.com/gdalle/ModernJuliaWorkflows Progress: https://github.com/gdalle/ModernJuliaWorkflows/issues
I really appreciate the initiative here. I think that such a blog post would be a great resource that I have long searched for when showing Julia to new students.
To chip in my two cents, to me, the single most useful debugging tool is Infiltrator. However, the workflow is a bit subtle:
1. `]add Infiltrator` in your global environment
2. Activate your project environment (`]activate path/to/your/env`)

a. Load Infiltrator via `using Infiltrator`
b. At the place in your code where you need to debug, write `Main.Infiltrator.@infiltrate` to set a "breakpoint". You can also make this breakpoint conditional, e.g. `Main.Infiltrator.@infiltrate any(isnan, my_variable)`.
c. When you hit the breakpoint, your REPL will "stop" in the corresponding local scope and you can interact with all variables visible in that scope for debugging. For longer debugging sessions (or when you try to capture offending inputs for a reproducer) you can use `@exfiltrate my_variable` to make it accessible as `Infiltrator.store.my_variable` from the REPL. `@exfiltrate` is essentially a streamlined version of Tim's trick above.
d. Hit `CTRL+D` to exit infiltrator mode (i.e. continue after the breakpoint)
e. In order to reset the breakpoint states (e.g. if you used `@skip` to skip a breakpoint and you want to "unskip" it) call `Infiltrator.end_session!()`

Note that this workflow allows you to use Infiltrator without installing it to your local project; you only ever install it to your global "devtool" environment:

- `using Infiltrator` in the local project works since we exploit the fact that environments are stacked (so we can load it in the REPL without it being added to the local `Project.toml`)
- `Main.Infiltrator.@infiltrate` accesses the breakpoint macro through the `Main` module, the outer module that is implicitly spawned when you start the REPL.

For these instructions to make sense to a new user, the blog post would have to lead with some details on global vs local environments (and the fact that they are stacked).
One thing that is very confusing for new users (and in general annoying) is that Infiltrator and other tools that work with standard input don't work right when started in VSCode via block execution. Then the input/output gets messed up and all jumbled. It only works if you execute the command directly in the REPL. It's not really recoverable either.
> To chip in my two cents, to me, the single most useful debugging tool is Infiltrator. However, the workflow is a bit subtle:
Thanks @lassepe! I only recently discovered Infiltrator.jl, and I have added it to the list on the draft website.
> For these instructions to make sense to a new user, the blog post would have to lead with some details on global vs local environments (and the fact that they are stacked).
That is definitely on the roadmap too, in the first few sections
> Infiltrator and other tools that work with standard input don't work right when started in VSCode via block execution. Then the input/output gets messed up and all jumbled.
@jkrumbiegel what do you mean by jumbled? Do you have examples of other tools that fail in this way? Cthulhu is the only one I know
I can attest to what @jkrumbiegel is saying about VSCode mangling the commands if you send it to the terminal via `run block`. I have to copy/paste `Cthulhu.@descend` as well.
Interesting, I just tried again and now it throws an error that you shouldn't run it on async code. But if you disable that functionality according to the error message you can still see how it fails (I'm trying to print `x` but it doesn't work most of the time and just swallows that input)
```julia
using Infiltrator

function f()
    x = 1
    @infiltrate  # breakpoint: drops into the local scope of f
    return x
end

f()
```
https://github.com/JuliaLang/www.julialang.org/assets/22495855/3a48dcc9-1521-4855-9035-de2ca291a137
I've written down some related (admittedly rough) notes for my Julia for ML course and I'd be happy to contribute. :)
That would be amazing @adrhill! I suggest we coordinate over on the blog repo
There's calling other languages and experiments -- what about presentation packages like these?
I think this might be a little too task-specific for our purposes
Could be also worth briefly mentioning other places to find more help, e.g. Discourse, Zulip, GitHub issues, Slack, and also knowing what type of help is best suited for those websites. For example, knowing how to write a good issue or provide a good MWE can go a long way even in developing packages in my experience, especially for debugging.
EDIT: the blog post is being created!
Is your feature request related to a problem? Please describe.
AFAIK, the typical tools and packages a Julia user needs daily are not documented in a single place.
Describe the solution you'd like
A lengthy blog post detailing the typical workflow for using, developing and testing packages. See structure proposal below.
Describe alternatives you've considered
Additional context
Related issues:
Possible contributors: