dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License

What would a Data Science sku of VS look like? #1123

Closed KevinRansom closed 8 years ago

KevinRansom commented 8 years ago

I don't do Data Science ... but I hear from lots of people I trust that F# is good for doing Data Science. If we were to propose a packaging of VS for doing Data Science, what would it need to have in it?

VS is extremely configurable, and it is possible to add high quality ... well supported 3rd party components into the dynamic installer feed. That's how Xamarin for VS is delivered for example, although Xamarin is now 1st party.

I imagine that VS for Data Science would need:

What else? Python? R Tools for VS?

I would like to see whether we can describe an ideal packaging of F# for Data Science: what we would need to add, what samples are needed, and what the documentation would look like. Is this even a worthwhile activity? Should we expend our effort on something else?

Kevin

tpetricek commented 8 years ago

Better tooling for data science is definitely one of the goals of this Ionide work: https://github.com/ionide/ionide-fsi/pull/24, which lets you add HTML printers to FSI and embed outputs of commands as HTML.

Ultimately, the goal there is to enable nice integration with the FsLab charting libraries, R type provider and possibly other things (see www.fslab.org).

This is not directly answering the question, but I thought I'd share the link here for inspiration - I'm pretty sure the same thing could be done in VS.


jamessdixon commented 8 years ago

I agree with F# out of the box (OOB) with FsLab. Beyond that:

  1. Having OOB R and Python language integration would also be good.
  2. Having some project templates for experiments (like on AzureML) in all 3 languages (F#, R, Python).
  3. Having Accord.NET installed?
  4. If you really want to go crazy, put the AzureML DLLs on the desktop too :-)

ctaggart commented 8 years ago

So, I figured out how to automate the installation of Visual Studio Code on a Docker image with VSCode extensions like Ionide installed. My first one is packaging a golang-vscode. It is almost working and I'll blog about it when it is, but I'd like to build one that targets F# for Data Science too. My intentions were to put together a dotnet-vscode with a collection of extensions from https://marketplace.visualstudio.com/vscode when dotnet core rc2 finally is released.
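As a rough illustration of this kind of automation (not ctaggart's actual script), extensions can be pre-installed from a script using the VS Code command-line launcher's `--install-extension` option, for example as a step when building an image. The extension list, the `CODE_BIN` and `DRY_RUN` variables, and the `install_extensions` function are hypothetical, and the Ionide extension ID is an assumption to verify on the Marketplace.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pre-install a list of VS Code extensions, e.g. as a step
# when building a Docker image. The extension IDs are assumptions; verify them
# on the Visual Studio Marketplace before use.

EXTENSIONS=(
  "Ionide.Ionide-fsharp"   # F# language support (Ionide); assumed ID
)

CODE_BIN="${CODE_BIN:-code}"   # path to the VS Code command-line launcher

# install_extensions: install each extension ID passed as an argument;
# with DRY_RUN=1 it only prints the commands it would run
install_extensions() {
  local ext
  for ext in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$CODE_BIN --install-extension $ext"
    else
      "$CODE_BIN" --install-extension "$ext" || return 1
    fi
  done
}
```

In an image build you would call `install_extensions "${EXTENSIONS[@]}"` after VS Code itself is installed; `DRY_RUN=1 install_extensions "${EXTENSIONS[@]}"` previews the commands without running them.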

KevinRansom commented 8 years ago

@ctaggart nice work. @tpetricek Yes, Ionide is indeed awesome... What does nice integration with the FsLab charting libraries and the R type provider look like? Is it templates ... or samples, or preinstalls ...?

@jamessdixon So do the templates for experiments already exist? What would productising them look like?

Is there any missing software ... pieces that just don't exist anywhere in the ecosystem? Or does it just need collecting together in a single packaging and productizing?

The goal here is to identify whether there are any actions we can take to put this together ... perhaps come up with a plan and perhaps start work on it. Although, if VSCode + Ionide + Docker + FsLab is sufficient, then life is good and we can focus on something else that needs addressing.

mathias-brandewinder commented 8 years ago

I think the most blatant hole currently is having a way to easily create charts and visualize data. I am not entirely certain what the right solution/approach is at that point.

drcrook1 commented 8 years ago

I vote for a pre-installed environment with FSLab, Xamarin, ASP.NET templates for F#, and Entity Framework. This would enable full-stack development in F# with ad hoc visualizations. There is also a need for an ML library that is wrapped in an idiomatic F# way. My vote is for CNTK or TensorFlow. Links to contribution pages as the default popup would be really nice (FSLab as well as training materials for newbies).

To support the other languages, R and Python, you would need to include whatever is available to deliver full products in those languages, as well as to perform the primary workloads expected in them. For example, in R, right-click deploy to SQL would be pretty sweet. For Python it would likely be Flask (or Django; I'm not familiar with Python) templates and all the packages that are common in those workloads.

Also, consolidating the send-to-interactive shortcut would be nice, be it Ctrl-Enter or Alt-Enter or whatever you want, as long as it's only two keys.

Krzysztof-Cieslak commented 8 years ago

I'm obviously biased, and I'm not doing a lot of data science, but I'll allow myself to express an opinion.

If you want to spend resources / time on Data Science tooling for F# I would go with making Ionide better instead of investing into VS.

Pros:

  1. Cross-platform support (which might be important given the popularity of Macs in the DS community [no stats, just my feeling from following stuff on the web])
  2. Lightweight editor (VS is great for people coming from the .NET / C# world, but a lot of DS folks have a different background. Forcing someone who is used to a text editor / light IDEs to install a big, heavy IDE might be annoying. It's 100 MB vs a couple of GBs)
  3. More flexibility (VSCode and Atom let users easily mix stuff; nothing stops anyone from having Markdown, LaTeX, F# scripts, R files, and Python stuff in one project)
  4. Flexible extension system (especially Atom, but VSCode also has very good and flexible extension APIs, which let us add new features really quickly [like Tomas's new FSI panel in Atom, shown above])
  5. Community driven (all the libraries that should be used [FSharp.Data, FsLab, Accord, MBrace, etc.] are created by the community anyway, so we could go all in and also use a community-driven editor)

Cons:

  1. Project file UX (we still struggle to provide good UX for handling fsproj files; right now they require editing by hand. But this can be changed by integrating Forge)
  2. Debugging story (Atom doesn't support debugging at all [and I would not expect it to get any extension API for that anytime soon]. VSCode supports F# debugging only on Mono. It's fixable [by creating a debugger adapter for MDbg], but it's not trivial)
  3. Lack of any Azure integration (which might be important from MSFT's point of view)

Summing up: from the beginning of Ionide, Tomas and I have believed that data science is a great niche for Ionide - this has been expressed in things like specific FsLab integrations, or this new FSI panel, which gives a notebook-style user experience. And we accept PRs ;)

KevinRansom commented 8 years ago

@Krzysztof-Cieslak

I think VSCode could easily be a good basis for a data science toolkit based on F#. I imagine that for the most part the list of things it needs would closely match VS.

Forgive my ignorance of the VSCode ecosystem, however: does anyone think that in-IDE access to Python and a Python REPL is interesting? I know that VS has Python Tools for VS, but I'm not familiar with VSCode. Similarly for R ... there is a VS integration of R ... would that be interesting for a data science workbench? And if so ... is there an equivalent tool for VSCode?

Or do we think that those tools are not useful?

smoothdeveloper commented 8 years ago

It is true that many data scientists are used to lighter-weight tools, but there is probably a good part of them who wouldn't mind having a well-integrated tool with everything set up and a streamlined experience.

http://jupyter.org/ http://beakernotebook.com/

forki commented 8 years ago

I think from the FsLab standpoint, support in VS already works OK (if you don't use NuGet, since it sometimes has issues resolving the FsLab dependencies).

Things that are missing:

Microsoft invested a HUGE amount of money into bringing R to VS. So maybe some of the excellent RStudio GUI tools will come as well.

All the other things are done in open source projects, and FsLab handles the orchestration and, via Paket, the updating of these tools/libs.

forki commented 8 years ago

@smoothdeveloper regarding streamlined experience: that's why they move to Jupyter and co. - that's where actual scientists work and influence the design. VS comes from a completely different background and doesn't really fit that model. Every scientific project is different, and many use a multitude of tools. Jupyter is one way to bring all that stuff into one environment. That's a great approach. It's not a completely different mindset from type providers, which allow you to integrate everything (well, if we had more TPs) into the fsx context.

So I think it's super important to build a Jupyter kernel for F#, and, the other way around, to build the Python TP. And even more important: wrap all the Fortran libs from NumPy and SciPy into a lib that can be called from F#. That would make a huge difference.

lambdakris commented 8 years ago

I really like the idea of a Visual Studio Data Science sku featuring F#. I think an essential intermediary or foundational step would be to work on a Notebook/Workbook/Worksheet like experience for F#. Examples of this in other langs and/or editors include C# Workbooks in Xamarin Studio, Scala Worksheets in ScalaIDE/Eclipse, and of course, Jupyter.

Here are a couple of links to past conversations that have taken place about this subject:

  1. https://github.com/ionide/ionide-fsharp/issues/104
  2. https://github.com/fsprojects/VisualFSharpPowerTools/issues/185

Here is a link to the IfSharp project, which was an F# profile for the old version of Jupyter, back when it was known as IPython Notebook:

  1. https://github.com/fsprojects/IfSharp

lambdakris commented 8 years ago

Oh, the reason why I think it is an "essential intermediary or foundational step" is that it provides a general (not just Data Science) boost to developer joy and productivity by shortening the feedback loop. Just imagine what it would feel like working with F# Data, SwaggerProvider, RProvider, Deedle and/or Spreads, and MBrace (all of which is already quite compelling in good ol' FSI). Your work would be almost entirely limited to modeling and algorithmic design/analysis! All this in a type-safe, expressive, functional language!!!

forki commented 8 years ago

@lambdakris regarding your last comment - where do you think VS would fit into that picture?

lambdakris commented 8 years ago

@forki Well, really it is just my attempt to express why I think a Visual F# Notebook-like experience is to be emphasized before other Data Science supporting features for VS. Ultimately a Visual F# Notebook experience in Visual Studio is what I'm advocating.

KevinRansom commented 8 years ago

Folks,

I think that VS would be beneficial because it would bring what is needed together into one package that many Enterprises would consider installing. Support for more than one language is beneficial because DS today is a polyglot endeavor. Having a consistent, low friction, polyglot, trusted packaging would enable documenters to create straightforward usage instructions for a number of common scenarios, and would enable community minded data science focused tools developers a vehicle to target their Data Science solutions, algorithmic libraries and services.

The idea is to pick something and make it great for the DS scenario, I know @mathias-brandewinder and @tpetricek have been advocating something like this for several years. But it has been quite slow to come together. I'm throwing the VS hat in the ring because it has a number of advantages related to acceptance and adoption --- but I am perfectly happy to support any alternative formulation --- wherever we go with this, we should all get behind it and push.

Kevin

haf commented 8 years ago

My wish list:

I don't think that F# devs are lacking more tooling. They are lacking cohesion in the F# Data Science story.

isaacabraham commented 8 years ago

@KevinRansom The problem isn't about Visual Studio or whatnot - having a good VS story will only get us so far. If Microsoft wants to help with the DS story, F# needs (as @haf rightly says) a coherent end-to-end story. This includes a modern ML library with good examples (ideally one that is distributable), and more investment in Deedle. It also means Microsoft helping to build confidence in the developer community that, yes, you can use .NET for data science. Currently I see MS pushing and investing in R and Python - both languages that have good DS stories, up to a point - but I don't get the impression that there's much awareness of F#'s capabilities for this story, either in general or within Azure. The list goes on and on.

So yes, whilst getting some templates in VS would definitely be useful, tooling is just one element of this story - it can't be solved just with that.

dennislwarner commented 8 years ago

A useful offering would address a broader array of data science tasks. The scientific quantitative analysis enabled by the variety of tools mentioned above is usually the penultimate step in the acquire-sanitize-normalize-analyze-proselytize data science project cycle. The most time-consuming and expensive part of the process often lies in the first three steps, or in the second and third. The analysis and presentation work is often done by a smaller team of specialists and is rarely the binding constraint when planning or budgeting a data analysis project. A few suggestions:

  1. Acquisition: include built-in processes for a wide variety of published data, along the lines of the F# data providers and/or specialized R packages. For finance and economics, links to FRED, Quandl, the IMF, the World Bank, the OECD and several UN agencies would be popular; for text-based analysis, access to dictionaries and thesauri (e.g. WordNet); for merchandising, access to the decennial Census and other surveys. Of equal importance would be the ability to easily import intermediate and final results from other systems, including but not limited to S, SPSS, SAS, Matlab, Wolfram, Julia, et al.
  2. Sanitation (a.k.a. data munging): provide illustrated guides and walk-throughs for exploring data and for detecting and ameliorating errors and omissions (shortcomings here always lessen the final product).
  3. Normalization: this is the lead-in to the analysis. Each combination of subject area and analysis method imposes distinct requirements, so detailed examples and templates specifically targeted to subject areas and methods are needed here.

The great potential of a VS offering would be the ability to incorporate prior efforts and results by allowing polyglot projects with R, Python, F# and other .NET modules.

smoothdeveloper commented 8 years ago

Microsoft should really put effort into pushing F# as the language of choice for Data Science (within Microsoft's ecosystem), for scaling algorithms in the cloud (MBrace) and for parallelization (Hopac, streams):

(*) Dynamic languages are most commonly used in Data Science, and they cause lots of pain. I know people doing this type of work, and updating a library often involves hours of troubleshooting the code, one issue at a time, due to the lack of compile-time checks.

(*) Static typing gets in the way mostly when you have to state the types everywhere, once or twice (like in C# and VB.NET, where this gets really noisy as soon as you have the slightest amount of generics); F# doesn't suffer from that.

Time for Microsoft to realize this and invest effort in F#, making:

That is a good way to acquire a lot of Data Scientists as Visual Studio users :smile:

tastephens commented 8 years ago

I work in economics, and a lot of people use RStudio or Stata, sometimes together with Python, for data analysis. These tools have their strengths and weaknesses, but as a language, F# is light-years ahead of them, especially with the VS integration for IntelliSense, syntax checking, debugging, etc. I could imagine F# in Visual Studio replacing RStudio or Stata (the ability to access R via type providers is pretty big), but it has to be adapted to the right workflow.

Economists working with RStudio or Stata have a very different workflow from programmers. They don’t sit down and start writing code from a project template (much less from scratch). They open Stata or RStudio and start pointing and clicking, typing interactive commands into the REPL or a combination of the two. It starts with loading/downloading data, and doing exploratory analysis, which is gradually refined over time and eventually consolidated into scripts that are used to load and clean/label data, to run statistical analyses, to generate tables and figures in various formats, etc.

Probably the single best feature of Stata is that it has a rich GUI that can be used almost like Excel, as well as a Command window, and both interfaces interact with the same REPL core. Whether you type a command into the Command window or point and click, the resulting command line (as typed or generated via the GUI) is printed in the output window and logged in the Review window. Beginning users who prefer to point and click can often avoid the REPL entirely, but the results are exactly the same, and can be exported as a script.

The Review pane contains a history of every command executed in the session, but not output. Any command in the history can be re-run with a double-click, and part or all of the history can be exported as a script (‘do file’), which can be run by any other Stata user. That means that everything a user does in a Stata session can be exactly replicated by anyone else. This is really important. A GUI user can even open the Excel-like Data Editor and start typing values into cells, and every single entry will be translated into Stata commands that appear in the output and in the Review pane. The value that adds is enormous compared with, say, Excel, where you never really have any idea what someone has done, or a typical IDE like VS where you really have to start by writing code, creating a barrier to non-programmers coming from, say, Excel.

I think a Data Science version of VS (VS/DS) would have to be centred on a REPL, in much the way that Stata is. VS has the Command Window pane, but it’s pretty useless. You have to type long, tedious menu commands rather than concise commands, and it really feels like an afterthought. To my knowledge, there isn’t even a way to save the session history. Moreover, activity done with the GUI is completely separate. It’s chaotic compared with the unified environment offered by Stata.

Rather than trying to improve the existing Command Window in VS, it would make much more sense to make an enhanced FSI the heart of VS/DS, with F# libraries exposing all of the features of the VS GUI, especially those related to data analysis. These libraries could be loaded automatically at startup, allowing command-line users to ignore the GUI entirely and just start typing into the REPL (which is what more experienced Stata users typically do).

Following Stata, I think FSI should be split into three panes/windows, an input pane, an output pane and a review (command history) pane. Ideally, that would be combined with GUI hooks, like in Stata, so that anything that is done with the GUI would be translated into FSI commands and appear in the output and review panes. The result would be an environment with a low barrier to entry, helping beginners get started, but one where the GUI wouldn’t get in the way of experienced users, or those with programming backgrounds.
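The key property in this design is that GUI actions never change state directly: they only generate commands that pass through the same core as typed input, so the history is complete and replayable. A minimal sketch of that idea, written as a shell analogy rather than anything FSI or Visual Studio provides; `run`, `gui_edit_cell` and `REVIEW_LOG` are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical shell analogy (not an FSI or Visual Studio API): one REPL core
# that both the command window and the GUI feed, so every action is echoed,
# logged to a review/history file, and can later be replayed as a script.

REVIEW_LOG="${REVIEW_LOG:-$(mktemp)}"   # stand-in for the Review pane

# run: the shared core; echo the command, record it, then execute it
run() {
  echo ". $*"                           # echo to the output pane, Stata-style
  printf '%s\n' "$*" >> "$REVIEW_LOG"   # append to the session history
  eval "$*"                             # execute in the current session
}

# gui_edit_cell: a hypothetical point-and-click action (editing a cell in a
# data editor) translated into a command and sent through the same core
gui_edit_cell() {
  run "$1=$2"
}

run 'gdp=100'          # typed into the command window
gui_edit_cell cpi 3    # done with the GUI, but logged the same way
```

Replaying the session is then just sourcing the log (`. "$REVIEW_LOG"`) in a fresh shell, which plays the role of Stata's exported do file.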

TLDR: Look at how the Stata (http://www.stata.com/) GUI and REPL are linked together, and build something similar in VS, with an enhanced FSI as the REPL at the core.

gerardtoconnor commented 8 years ago

It would be very nice to have an FSEye-style object monitor built into VS & FSI, so you can always monitor the objects currently in memory and potentially have context menus to chart/export them. When working in MATLAB/Octave, you are always able to see all your currently loaded objects/values, and this is really useful for Data Science. FSEye kind of does this, but having it built in with extensive functionality would be great. Inclusion of R is useful, but if that can be done via the RProvider, then you're set anyway ... As for Python, unless it is magically going to get a proper type-checking engine, I would have no interest!

smoothdeveloper commented 8 years ago

@gerardtoconnor on the same note about monitoring objects: make the debugger visual inspectors available in F#; they only work in C# and VB.NET for some reason.

mathias-brandewinder commented 8 years ago

A quick thought: one of the ways to look at the question is, which 'style' do people want? I can see two existing approaches:

Visual Studio is currently closer to (1), but the output window (FSI) is extremely crude. What @tpetricek showed with Code+Ionide is close to what I prefer, but I am not sure how this would look in VS.

forki commented 8 years ago

We need to bribe @nosami to bring xamarin F# repl to VS ;-)

tastephens commented 8 years ago

@mathias-brandewinder I prefer a REPL/scripting environment, ideally with the input pane separate from the output pane(s)/window(s), as in Stata (command lines are still echoed to the output pane, but not typed into it, with other types of output going to other panes/windows) but not RStudio (text output appears in the same pane you type commands into, but other output appears in other panes).

With the REPL/scripting approach at the core, it probably wouldn't be too difficult to allow different settings for input and output. You could do everything in an HTML-based 'notebook' pane that sends commands to the REPL and then displays the rich output inline, or have separate panes for input, text output, documentation, HTML output, plot output, etc. The critical thing is that everything should ultimately be translated to commands sent to a REPL (in the open or behind the scenes), so the session (not just the final output) can be saved and replicated.

drcrook1 commented 8 years ago

@forki on this note, I need a way to surface my data science into modern applications, I'm trying to do this through Xamarin, however there are a few key items that I am finding don't work. Investments in FSLab to ensure everything exposed there can run in Xamarin and .Net Core would be incredible.

Beyond making FSLab work in Xamarin/.NET Core, I think the power of data science languages comes from the packages, primarily the visualization, data ingestion/manipulation and machine learning frameworks provided. I think that to be truly successful, the major thing we are lacking on the Microsoft platform is hitting modern app targets with all of those pieces. The tooling is something that can be worked on, but dude, I can't use my code on device, because most if not all of the libraries are not able to target these things.

I think if those baseline packages can really get nailed in a good way, you have the framework for building and delivering amazing products which have data science at the core of them. This is the direction I believe the industry is going to go and is the way I am moving, but I find myself blocked at various points because of these limitations.

For the tooling, I think most everybody is on par with what I'm after as well, great REPL experience, various windows and plots, but important for me also, is being able to easily convert that into a production library that can be consumed by a modern client application.

I'm of the opinion the tooling is in a decent place right now, the gap is the libraries and the platforms they can target.

haf commented 8 years ago

I'm of the opinion the tooling is in a decent place right now, the gap is the libraries and the platforms they can target.

Agree with @drcrook1 on this.

Krzysztof-Cieslak commented 8 years ago

@KevinRansom

Python support in VSCode is top class; I don't know about R... but I'm not sure we should focus on R, given the R type provider.

The question is whether you want to create a SKU for VS (as in the title), or a good F# Data Science experience in general. I will focus on the second case (as I believe it's better for F# in general). In that case, locking into VS (and thus Windows) is a very bad idea. There are many places you could contribute - x-plat editors (arguably the easiest thing, and the community can handle it) and a whole set of libraries (other people in this thread are more qualified to say exactly what should be improved).

As others mentioned, it's also an important decision what style of editor we want for DS (or maybe we want/need all of them?) - output inlined in the editor, a rich REPL where you can type, or maybe something like what we currently have in Atom + Ionide (a read-only [but rich] FSI panel, with communication only via send line/selection/file).

tastephens commented 8 years ago

@Krzysztof-Cieslak

To what extent do you think improving the DS experience could be done with F# libraries that could be used in both VS on Windows and VSCode/Atom on other platforms? I assume VS uses XAML for UI, whereas the others use HTML, but if all of the underlying code is F# without dependencies on Win32-only libraries, maybe it would be possible for DS libraries that deal with UI to support both kinds of rendering?

Both a lightweight, cross-platform experience based on VSCode and a more advanced, heavyweight experience based on VS seem useful. As a VS user, I have to say that Atom has a long way to go to be comparable in terms of features, usability, performance and robustness, and VSCode has even further to go in many areas (text rendering on Windows is so bad as to be barely usable; I don't know about other platforms). If VSCode could match the essential features, usability and robustness of VS, and remain lightweight and cross-platform, it would be a great complement to VS, but I think both will remain useful.

I really like the way Ionide is going, and hope the VSCode experience can eventually catch up with the Atom experience. I also agree that it's important for the F# community that things are open source and cross-platform. At the same time, VS is a mature, robust and incredibly powerful and useful IDE. If it offered the same sort of REPL-centric experience as RStudio or Stata, coupled with its current capabilities and the ability to target all the major platforms, plus access R packages, it could be an amazing DS tool.

simra commented 8 years ago

Lots of good feedback here, so I won't repeat what everyone else says, just a quick comment that there's a functioning Jupyter branch in the official IfSharp repo.

ctaggart commented 8 years ago

As I mentioned earlier, I like the idea of bundling a bunch of extensions and software with VSCode in a Docker image. You can run the image on Windows, Linux, Mac, in Azure or another cloud, or your own datacenter. The shell and GUI are still accessible locally no matter where you run it. It should take less than 10 minutes to set it up. Here is a video I created demonstrating the same concept with Rust extensions. It runs the image on a machine on Azure, but displays it locally on my Windows 10 laptop. https://twitter.com/cmr0n/status/729508355144716288

With VSCode, we can ship F# in the box again. [Screenshot: running the first build from https://github.com/ctaggart/dotnet-vscode.] All those extensions get prepackaged.

I'm not making any promises about the current stability of dotnet-preview:latest. I'm really looking forward to RC2!

Here are my quick notes (a blog is coming) about running this with Docker for Mac:

in one terminal tab

docker run --rm -it -p 6100:6100 ctaggart/dotnet-vscode
. start-xpra.sh
code .

in another terminal tab

export PATH=/Applications/Xpra.app/Contents/MacOS:$PATH
xpra attach tcp:127.0.0.1:6100

Krzysztof-Cieslak commented 8 years ago

Just want to give a small update on our current state of the art in Ionide. [Screenshot. Source: https://twitter.com/fslaborg/status/728397249697488896/photo/1] @tpetricek can give us more details about it.

It's currently working in Atom (as Tomas is using it) but I'm fairly confident we should be able to port this to VSCode.

ctaggart commented 8 years ago

I love the great progress! I'm really hoping that can be ported to VSCode. In VSCode, I like their debugging support and their openness with the public iteration plan linked from their insiders download page.

tpetricek commented 8 years ago

@ctaggart We're always looking for contributors :-). It is mostly a matter of porting the work from ionide-fsi (which is for Atom) to ionide-vscode-fsharp (which is for VS Code). Open an issue there to discuss what needs to be done!

drcrook1 commented 8 years ago

How does that play with the jupyter notebook project?

isaacabraham commented 8 years ago

@drcrook1 interesting that you mentioned that, because last week at MSR we had a good look at Jupyter with F# support - it's progressed quite far now and I'm hopeful that it will become a "mainline" branch very soon (https://github.com/fsprojects/IfSharp).

@ctaggart - @Krzysztof-Cieslak is also close to getting the Atom stuff into VSCode now. If only @tpetricek would fix the binding redirects :-P

drcrook1 commented 8 years ago

Can somebody send me a note the second you publish the VS Code port and Jupyter support goes mainline? Maybe send me some docs on the Jupyter notebook? I'm running an ML study group that uses F# and would love a lower barrier to entry and a better experimentation framework for teaching.

Also, let me know what I can assign my students on. I should have ~3 students starting in August who will need to contribute to repos.

tpetricek commented 8 years ago

@isaacabraham This is the change that breaks everything. I have no idea why. Will try to figure out tomorrow, but if anybody wants to help... :confused:

isaacabraham commented 8 years ago

Yeah, it's a weird one. I manually copied over the correct FSharp.Core, and it complained spectacularly when I didn't put in the correct binding redirect (BR). But when I updated them to 4.4.0.0, the errors went away, yet it still complained about missing methods (e.g. sortByDescending) when sending to FSI.

dsyme commented 8 years ago

Closing old discussion