fslaborg / RProvider

Access R packages from F#
http://fslab.org/RProvider/

Project Status #209

Closed siavash-babaei closed 3 years ago

siavash-babaei commented 3 years ago

Hi,

Many thanks to the authors and maintainers for such a brilliant feature. It seems this TypeProvider has not really worked since R 3.5.

Plenty Appreciated ..., Cheerio

hmansell commented 3 years ago

I wrote the original version, but I am no longer actively developing in F#. @tpetricek might still be involved.

dsyme commented 3 years ago

We need an active maintainer. Any volunteers?

dsyme commented 3 years ago

(Ping me directly on my email if needed - I don't always see notifications here)

siavash-babaei commented 3 years ago

Dear @hmansell, since it is now part of FsLab, @tpetricek should be involved, although I suppose FsLab itself perhaps needs revamping as a whole.

Incidentally, a good, comprehensive FsLab environment would certainly make F# more readily competitive with the likes of Python and R. Even Julia has pulled ahead in data analytics capabilities in many senses, which is a pity for F#, the language being very mathematical at its core and brilliantly suited to everything data.

  1. Maybe add a few other TypeProviders like SQLProvider, etc., and support the big NoSQL names as well.
  2. In terms of prototyping, .Net and F# are very verbose and use complex syntax when it comes to machine learning. This makes R interop and RProvider a most valuable tool.
  3. A simple regression model in R is specified by lm(Y ~ X1 + X2).
  4. In comparison, F# .Net code is still very clunky making prototyping cumbersome in this area.
  5. Compared with, say, R's ggplot, F# charting abilities are barely OK for exploratory analysis and far from production quality.
  6. Ports of Spark, Keras, Tensorflow, ML.NET in idiomatic F# would boost competitiveness 1000-fold. For the life of me, I don't get why everything available is so C#-ish, with C# being inherently unsuitable for prototyping and data science pipelines.
  7. Other stuff like Deedle, FSharp.Data, Literate Coding, Jupyter Notebook Support, Math.NET, Accord.NET, etc. are just great, although I personally despise Accord syntax for prototyping.

I am not sure if or how much R and .Net APIs have changed but I doubt by much. As far as I have seen as a user, R Core has not changed much on the face of it for many years, and .Net 4.0 code can be consumed in .Net 5.0 with minimal changes. Hopefully, updating it shouldn't be a major rework ... just the thought is encouraging!!!

Our very dear BDFL @dsyme: I would offer my help except I don't have the experience of maintaining repos. In other areas perhaps once it takes off ...

zyzhu commented 3 years ago

@siavash-babaei, I'm glad to see your enthusiasm. Please take a look at an old issue from 2018 discussing FsLab and data science using F# in general. https://github.com/fslaborg/FsLab/issues/137

Lots of progress has happened since then, especially in Jupyter notebooks through the dotnet/interactive kernel. I think interop with Python is in the pipeline according to some talks from Microsoft. I hope interop with R will come some day too. But that might be too big an ask of the Microsoft team.

zyzhu commented 3 years ago

About linear regression: a pull request was added to Deedle early this year to support some form of R's lm https://github.com/fslaborg/Deedle/pull/496

Take a look at some testing samples https://github.com/fslaborg/Deedle/blob/master/tests/Deedle.Math.Tests/LinearRegression.fs

let actualCoeffs =
    LinearRegression.ols ["MSFT";"WMT"] "AES" true stockReturns
    |> LinearRegression.Fit.coefficients

siavash-babaei commented 3 years ago

Thanks @zyzhu. I doubt Microsoft would get involved in something like RProvider, and I am not sure how helpful simple interop with Python would be. I mean, for C/C++/Fortran it makes sense to provide some simple interop so that you can switch and let them handle the intensive bits of code, but Python?! Not to mention that in the few years RProvider was working just fine, no such TypeProvider for Python really took off. Microsoft has already invested heavily in R, gobbling up Revolution Analytics for a hefty price, rebranding it as the Microsoft R distribution, and adding the ability to script directly in R within SQL Server, before doing the same for Python.

Now, through ML.NET, Math.NET, and Accord.NET, you get most of what you need from a Machine Learning perspective and they appear to be actively maintained. The problem with all, including the example included above by @zyzhu, is the awkwardness and verbosity.

Again, assume we have a data frame scores containing the variables score, age, and sex. In R, you would do:

model <- lm(data = scores, score ~ age * sex)

and then, from this model object, you can extract whatever you need, including statistics, coefficients and confidence intervals, error estimates, etc., even diagnostic plots, all with pretty intuitive names.

To me, an almost perfect way of doing the same thing in F# would look like:

    let model = 
        let data = scores
        let response = [ "score" ]
        let predictors = [ "age"; "sex" ]

        (data, response, predictors)
        |> linearModel ModelType.OLS CrossEffects.Multiplicative

with the model object perhaps being a record type with fields corresponding to the coefficients table, error estimates, basic statistics, etc.
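As a rough illustration of the record-type idea above, here is a minimal sketch in plain F# with no external libraries. Everything in it is hypothetical: `LinearModel`, its field names, and `fitSimple` are invented for illustration and are not part of Deedle, Math.NET, Accord.NET, or RProvider. It also only covers ordinary least squares with a single numeric predictor, so it does not reproduce the `age * sex` interaction or the `linearModel` signature imagined above.

```fsharp
// Hypothetical sketch: these names are illustrative only and do not
// correspond to any existing library API.
type LinearModel =
    { Intercept: float
      Slope: float
      RSquared: float
      Residuals: float list }

/// Ordinary least squares for one numeric predictor (simple linear regression).
/// Assumes the predictor is not constant; a constant predictor yields NaN or infinite values.
let fitSimple (xs: float list) (ys: float list) : LinearModel =
    if xs.Length <> ys.Length || xs.Length < 2 then
        invalidArg "xs" "need two or more paired observations"
    let meanX = List.average xs
    let meanY = List.average ys
    // Sum of cross-deviations and sum of squared deviations of the predictor.
    let sxy = List.map2 (fun x y -> (x - meanX) * (y - meanY)) xs ys |> List.sum
    let sxx = xs |> List.sumBy (fun x -> (x - meanX) * (x - meanX))
    let slope = sxy / sxx
    let intercept = meanY - slope * meanX
    // Residual = observed response minus fitted value.
    let residuals = List.map2 (fun x y -> y - (intercept + slope * x)) xs ys
    let ssRes = residuals |> List.sumBy (fun r -> r * r)
    let ssTot = ys |> List.sumBy (fun y -> (y - meanY) * (y - meanY))
    { Intercept = intercept
      Slope = slope
      RSquared = 1.0 - ssRes / ssTot
      Residuals = residuals }

// Usage: made-up data lying exactly on y = 1 + 2x.
let model = fitSimple [ 1.0; 2.0; 3.0; 4.0 ] [ 3.0; 5.0; 7.0; 9.0 ]
printfn "intercept = %g, slope = %g, R^2 = %g" model.Intercept model.Slope model.RSquared
```

The point of returning a record is that results are reached through named fields such as `model.Slope`, much like extracting components from the object returned by `lm` in R.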

siavash-babaei commented 3 years ago

Looking at it from a business perspective: F# has been primarily a Windows thing up to now. Even though it is open source, it was not properly supported on Linux, where a lot of the open-source community resides. With .NET 5.0 and F# 5.0, things have changed and .NET is now properly multiplatform, although I suppose tooling on Linux could still go some way. So it is almost like a new start, with the opportunity to expand both the language and the user base.

Something worth noting is the economic principle of competitive advantage: basically, how entities from nations to corporations to life itself stick to their strengths to survive and grow. Take WebSharper or SAFE Stack: absolutely necessary for a modern language, but have they really made a dent in the current market share? I don't think even TypeScript is making significant headway in attracting JavaScript users or new ones. In my opinion, any product needs a few killer features that make it indispensable, and for F# that could easily be the entire data analytics and data science workload, the same thing that greatly helped propel Python to the front.

That user base, being more mathematically inclined and comfortable with the syntax (I just adore it, but for some reason it makes lots of people uncomfortable), with the ideas of immutability, and with the core of the language being input -> function -> output, would be much better adopters than, say, developers active in GUI or web. There are other areas, I am sure, for example business applications that fit nicely with Domain-Driven Design. But data science workloads, incidentally a perfect match for DDD, are certainly worth the investment, especially as they seem to be growing exponentially in both volume and utilisation. If you think about it, one of the most active open-source big data projects, Spark, is only 7 years old. The community seems to be more accepting of new tech that makes their life easier.

dsyme commented 3 years ago

There are many questions being discussed here. Let's just deal with the question of FsLab and its pieces.

Here are my opinions:

FsLab certainly needs to be taken down and/or revamped on .NET Core only and/or wound up as a "one-stop shop technology". That will create space for better approaches I think. I'm open to suggestions but we need to rethink things.

Note I'm not interested in discussing this from a "future of F#" perspective (this has nothing to do with F# and web programming, for example) but rather just practical steps to get things cleaned up on a good, sustainable, coherent basis going forward.

siavash-babaei commented 3 years ago

I added some notes and thoughts that seemed more appropriate to FsLab as a whole in https://github.com/fslaborg/FsLab/issues/137. I hope they are helpful; I certainly don't mean them as criticism.

dsyme commented 3 years ago

Cool let's discuss in https://github.com/fslaborg/FsLab/issues/137

siavash-babaei commented 3 years ago

Guidance for Newbies:

Suppose a person with decent working knowledge of both R and F# wants to restart this RProvider project. What steps should be taken, and what should be learnt, before attempting to update/fix it so that it works with, say, the latest version of Microsoft R Open as a stable LTS version? I checked an intro to type provider design on MSDN, but the examples didn't make much sense regarding interop with a different language.

siavash-babaei commented 3 years ago

I am wondering what has fundamentally changed since R 3.4 such that RProvider no longer works after that version. Is it a matter of updating the libraries and packages used from .NET 4 to .NET 6, or has something in the R API completely changed? R is a nearly 30-year-old language that has not changed much at its core, at least as far as users are concerned …

hmansell commented 3 years ago

@siavash-babaei I was the original author but haven't kept up with developments in the .NET community the last few years. Hopefully the following will be helpful:

AndrewIOM commented 3 years ago

I'm going to close this issue, as we have just released the v2.0.0-beta NuGet package. Hopefully this addresses the issues raised in this thread. Also see #218 for discussion about project maintenance and contribution guidelines. Thanks!