Initial topic ideas - GitHub issues

HugoGranstrom commented 3 years ago

Let's brainstorm ideas for the articles we would eventually want to see here. Once we have a decent number of ideas, we can start to get a sense of how best to structure the content topic-wise.

Here are some off the top of my head (leaning a tad toward Numericalnim...):

Numerical integration (1D, both scalar and cumulative)
Interpolation (1D, 2D, 3D)
ODEs (IVP)
Plotting
Matrices/Tensors

If you think a topic needs its own article (for example, a specific kind of plot like bar plots), go ahead and add it to your list as well.

Let the brainstorming begin!

HugoGranstrom commented 3 years ago

On a related note, should we make a short forum post to introduce people to this and hopefully get some more ideas?

pietroppeter commented 3 years ago

statistical learning/machine learning algorithms (linear regression, logistic regression, k-means clustering, decision trees, random forests, SVM, neural networks...), but also dimensionality reduction, feature engineering and model evaluation (basically the kind of stuff that https://scikit-learn.org/stable/ provides for Python)
data wrangling for dataframes: filtering, sorting, grouping, ... the kind of features that pandas/dplyr provide to Python/R respectively (see for a short example: https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07)

pietroppeter commented 3 years ago

see also for other ideas the Meta issue "are we scientists yet?": https://github.com/nim-lang/needed-libraries/issues/77

HugoGranstrom commented 3 years ago

> see also for other ideas the Meta issue "are we scientists yet?": nim-lang/needed-libraries#77

Oh right, had forgotten about that one. That's a gold mine for what libraries are out there 😄

I wrote up a draft for a post below. Any feedback is appreciated. I wasn't sure how much to emphasize that people can contribute tutorials themselves without scaring away those who just want to give ideas and aren't interested in writing.

Updated post:

SciNim is an initiative to unite all scientific Nim packages under a common umbrella and to make sure they play nicely together. Examples of packages we are currently working on are Flambeau (a wrapper for PyTorch's backend, Libtorch) and Unchained (a fully type-safe, compile-time-only units library).

We have started a project at getting-started where we plan to write getting-started tutorials on a wide variety of topics, so that new users interested in Nim for science have a one-stop shop for getting up and running. We aim to provide tutorials ranging from the most basic things in scientific computing, like linear regression and tensor operations, to more advanced topics in the different areas.

Right now we are in the brainstorming phase and would like to hear what you think would be useful tutorial topics. The idea is that this should be community-contributed: if you have something you think others would find useful, you will be able to submit a PR with your tutorial. We will also try to write about the topics we are familiar with, and your suggestions will make it much easier to know what to write about and what people want to read about. :)

Nimibook was recently released in a first version, and together with Nimib it is a perfect suite for writing tutorials and keeping them up to date, since the code examples are compiled when generating the HTML. The plan is to write all the tutorials using them. An example can be seen on Nimibook's website.

If you have any thoughts on topics you would like to write or read about, head over to GitHub and write a few lines about it. No idea is too small to be shared! :D

-- The SciNim Team

Vindaar commented 3 years ago

> data wrangling for dataframes: filtering, sorting, grouping, ... the kind of features that pandas/dplyr provide to Python/R respectively (see for a short example: https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07)

I can write a conversion of that using the ggplotnim DataFrame. It uses a dplyr-inspired syntax after all, and the gist seems like a good general overview.

As other topics I would add:

curve fitting
more general non-linear optimization problems
physicsy computations aided by unit checking
and more I can't think of right now :)

edit: for the time being the direct conversion exists here:

https://gist.github.com/Vindaar/6908c038707c7d8293049edb3d204f84

I'll write a derived version as its own getting-started page.

pietroppeter commented 3 years ago

For the forum post, as much as I appreciate nimibook and nimib getting the first mentions, we could probably start by mentioning what SciNim is and what the getting-started project is about. Incidentally, it uses nimibook and nimib, which look like a good fit at the moment :).

HugoGranstrom commented 3 years ago

> For the forum post, as much as I appreciate nimibook and nimib getting the first mentions, we could probably start by mentioning what SciNim is and what the getting-started project is about. Incidentally, it uses nimibook and nimib, which look like a good fit at the moment :).

That makes sense, have edited the post above :)

HugoGranstrom commented 3 years ago

@Clonkk noticed you weren't on the repo's "watch list", in case you had missed this :)

Clonkk commented 3 years ago

Sorry I just noticed this.

> Let's brainstorm ideas for the articles we would eventually want to see here. Once we have a decent number of ideas, we can start to get a sense of how best to structure the content topic-wise.

I think the best way is to start by writing pages in broad categories (like data visualization, algebra, etc.), and then we can rearrange the pages into sub-categories once we have enough content.

Don't overthink it; Nimibook is flexible enough 😄

> I wrote up a draft for a post below. Any feedback is appreciated. I wasn't sure how much to emphasize that people can contribute tutorials themselves without scaring away those who just want to give ideas and aren't interested in writing.

The draft seems okay. One thing that's missing, IMO, is how to contribute to / join SciNim. Maybe it'd be helpful to have a non-admin team so the org can grow with people who want to create/maintain one or two libraries and don't need admin access to all repos?

HugoGranstrom commented 3 years ago

> I think the best way is to start by writing pages in broad categories (like data visualization, algebra, etc.), and then we can rearrange the pages into sub-categories once we have enough content.

Yes indeed, but it can be hard to grasp which broad categories to use, as some of them overlap. So getting a "map" of sorts of the topics we will likely write about helps narrow it down to the most useful categories :)

> The draft seems okay. One thing that's missing, IMO, is how to contribute to / join SciNim. Maybe it'd be helpful to have a non-admin team so the org can grow with people who want to create/maintain one or two libraries and don't need admin access to all repos?

The reason I didn't write anything about that is that I don't know the answer 🙃 It is certainly something we should discuss, though, as some people will likely be interested in joining. So your idea is that people can join the non-admin team and then we manually give them permissions in the repos they work on? Or separate teams for each repo?

HugoGranstrom commented 3 years ago

Idea from Discord: tutorials specifically aimed at users of libraries in other languages, for example "Datamancer for Pandas developers" and "Arraymancer for NumPy developers". Alternatively, "Nim for Pandas/NumPy developers" if we don't want to tie it to specific Nim libraries. They should mention the similarities and differences between the Nim and Python/R/etc. libraries, and it wouldn't hurt to have a section where a simple/intermediate program is ported to Nim with a line-by-line explanation.

HugoGranstrom commented 3 years ago

@Clonkk Would adding a paragraph like "If you want to join or help SciNim, reach out to us on the nim-science Discord/Matrix/IRC channel" suffice for now? Then we could handle it on a case-by-case basis there.

I'll post it on the forum later today unless someone has any further remarks :)

Clonkk commented 3 years ago

Yeah that should be good.

xioren commented 3 years ago

Well, I have had impulse downloaded on my computer for months but have yet to really learn it. Subjectively, I think a tutorial on working with impulse, FFT/DCT and images would be useful.

Araq commented 3 years ago

Deep learning. In particular, how could you write something like https://github.com/numenta/numenta-apps in SciNim? The "sparse networks" ideas are very interesting. See also:

https://numenta.com/neuroscience-research/research-publications/papers/sparsity-enables-100x-performance-acceleration-deep-learning-networks

Clonkk commented 3 years ago

Linear algebra on the Tensor datatype
Linear algebra using CUDA
Simple ML tutorial. Something like this https://blog.paperspace.com/getting-started-with-scikit-learn/ - we can reuse their dataset - as an introduction (before moving on to more complicated things).

al6x commented 3 years ago

I think it would be useful to replicate some of the most popular introductory Python notebooks, with lots of visuals and simple math. The Titanic tutorial seems quite good and popular, and it exists in Python and R versions.

It's easier for people to learn when they already know part of a new thing. So maybe some people from the Python and R communities will be more inclined to try Nim for something they already know.

HugoGranstrom commented 3 years ago

Inspired by the answers in this forum post, we should have a tutorial on how to easily input Unicode characters on the different OSes and editors.

bung87 commented 3 years ago

About deep learning: we could take some examples from https://d2l.ai

al6x commented 3 years ago

After thinking about it, I'm taking back my suggestion about the Titanic dataset. Maybe an analysis of movies would be more interesting, because the classical tutorials about Iris or Titanic are boring; nobody knows anything about them or cares.

But datasets about movies are interesting. Everyone watches movies. And there is tons of data: genres, actors, ratings, popularity, reviews, maybe even texts of lyrics to showcase NLP. That kind of stuff is interesting.

pietroppeter commented 2 years ago

Today I ran into this free book, "Probability 4 data science", which has code snippets in Matlab, Python, Julia and R. It could be nice to try to reproduce what we can with Nim (I guess we would discover some gaps to be filled). Example of code from the first chapter: https://probability4datascience.com/python01.html

al6x commented 2 years ago

I recently saw a very nicely done interactive course in Julia, "Introduction to Computational Thinking".

It's made with a Jupyter-like notebook thing, where all the examples and code are interactive and can be changed online. Looks really nice.

kerrycobb commented 2 years ago

I started writing a tutorial demonstrating how to infer parameters of a linear model using Bayesian inference. Would there be any interest in including it here when it's finished? If so, I would welcome any suggestions. You can see it here: https://kerrycobb.github.io/nim-bayes/

pietroppeter commented 2 years ago

I love it, I would say definitely yes :)

As for suggestions, the code seems complete and straight to the point. I would go in the direction of expanding the explanations (references to learn more about Bayesian linear regression and MCMC, explaining the existence of a distribution package, which is not in the stdlib, explaining that we go step by step, explaining that there is no MCMC library but we can code it from scratch, what the different plots tell us in terms of what we expected versus what we see...).
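To illustrate the "no MCMC library, but we can code it from scratch" point, here is a minimal sketch of a random-walk Metropolis sampler in plain Nim (std/math and std/random only). It is not the code from the nim-bayes tutorial: as a simplifying assumption it samples only the mean `mu` of normally distributed data with a known `sigma` and a flat prior, and the names `logPosterior` and `metropolis` are hypothetical.

```nim
import std/[math, random]

proc logPosterior(mu: float; data: openArray[float]; sigma: float): float =
  ## Log posterior of the mean `mu` for normal data with known `sigma`
  ## under a flat prior, up to an additive constant.
  for x in data:
    result -= (x - mu) ^ 2 / (2 * sigma ^ 2)

proc metropolis(data: openArray[float]; sigma: float; nSamples: int;
                stepSize = 0.2; seed = 42): seq[float] =
  ## Random-walk Metropolis: propose mu' = mu + N(0, stepSize) and accept it
  ## with probability min(1, p(mu' | data) / p(mu | data)).
  var rng = initRand(seed)          # seeded so runs are reproducible
  var mu = 0.0                      # deliberately poor starting point
  var logP = logPosterior(mu, data, sigma)
  for _ in 0 ..< nSamples:
    let proposal = mu + rng.gauss(0.0, stepSize)
    let logPNew = logPosterior(proposal, data, sigma)
    # compare in log space to avoid underflow of the probability ratio
    if ln(rng.rand(1.0)) < logPNew - logP:
      mu = proposal
      logP = logPNew
    result.add mu                   # a rejected step repeats the current value

when isMainModule:
  let data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.0]
  let draws = metropolis(data, sigma = 0.3, nSamples = 20_000)
  let kept = draws[1_000 .. ^1]     # discard burn-in
  echo "posterior mean of mu: ", sum(kept) / kept.len.float
```

For this toy model the posterior is known analytically (normal, centred on the sample mean, here 5.0, with standard deviation sigma/sqrt(n)), which makes it a convenient sanity check before moving on to the slope and intercept of a full linear model.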

On top of explanations, I love the choice of parameters for the simplest case possible; would it be worth exploring at least one other case (to see how things change...)? In the future I hope it will be easy to do this with nimib interactive stuff, like your other JavaScript repo on exploring priors and posteriors.

Gotta say, I am still so happy when I see a new document produced with nimib; I still cannot shake the surprise and excitement of seeing unknown people actually using it 🤩.

On that topic, since I could easily peek at the code, one very minor detail I see is the unnecessary line adding mathjax_support to the context (a leftover from experimenting with MathJax?).

Finally, as a general suggestion for this thread, we could definitely use a tutorial for doing simple linear regression (I should actually do that myself!).

pietroppeter commented 2 years ago

Just because I recently ran into it (through Gelman's blog) and is related to our recent discussion, here is a nice explanation of the advantages of Bayesian linear regression in the applied context of media mix modelling: https://getrecast.com/bayesian-methods-for-mmm/

This also serves as a reminder that the usual statistical way of presenting linear regression, calling it OLS and focusing on inference instead of prediction, comes with a bunch of associated statistical metrics, and it is something basic that AFAIK is still missing in our ecosystem.
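As a rough idea of what the most basic building block could look like, here is a minimal sketch of simple (single-predictor) OLS in plain Nim using only std/math: the closed-form slope and intercept plus R². The `OlsFit` type and `olsFit` proc are hypothetical names for this sketch, not an existing SciNim API; standard errors, t-statistics and p-values would be the next layer on top.

```nim
import std/math

type
  OlsFit = object        # hypothetical result type for this sketch
    intercept, slope, r2: float

proc olsFit(x, y: openArray[float]): OlsFit =
  ## Fits y = intercept + slope * x by ordinary least squares, using the
  ## closed-form solution for a single predictor.
  doAssert x.len == y.len and x.len >= 2, "need at least two paired points"
  let n = x.len.float
  var meanX, meanY = 0.0
  for i in 0 ..< x.len:
    meanX += x[i]
    meanY += y[i]
  meanX /= n
  meanY /= n
  # slope = Sxy / Sxx, computed from centred values
  var sxy, sxx = 0.0
  for i in 0 ..< x.len:
    sxy += (x[i] - meanX) * (y[i] - meanY)
    sxx += (x[i] - meanX) ^ 2
  result.slope = sxy / sxx
  result.intercept = meanY - result.slope * meanX
  # R² = 1 - SS_res / SS_tot
  var ssRes, ssTot = 0.0
  for i in 0 ..< x.len:
    let pred = result.intercept + result.slope * x[i]
    ssRes += (y[i] - pred) ^ 2
    ssTot += (y[i] - meanY) ^ 2
  result.r2 = 1.0 - ssRes / ssTot

when isMainModule:
  # points lying exactly on y = 1 + 2x, so the fit should recover them
  let fit = olsFit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
  echo fit
```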

Vindaar commented 2 years ago

@kerrycobb Just skimmed over your tutorial again and saw the following:

TODO: Figure out why this isn't plotting correctly

var standardized = seqsToDf(stX, stY)
ggplot(standardized, aes("x", "y")) + geom_point() +
    ggsave("images/st-simulated-data.png")

The reason is simply that seqsToDf without explicit keys generates a DF whose column keys are the names of the variables. In the aes call within ggplot you then pass the strings "x" and "y", which should be "stX" and "stY".

And I'd love for this to be included!

SciNim / getting-started

Initial topic ideas #19