fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Improve docs, particularly for C# #308

Open adamklein opened 9 years ago

adamklein commented 9 years ago

One especially central operation that’s easy to forget and hard to discover is how to dice a frame on one of its indices, e.g., in C#:

frame.Rows[ … ]
frame.Columns[ … ]

While the formulation is elegant, I think the mental hurdle is that the operation is on a property of the frame, and not on the frame itself.

I think it's fine to keep, but we should promote this use in the docs.
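
To make the point above concrete, here is a minimal sketch of slicing through the `Rows` and `Columns` properties in C#. The class name, the helper names `BuildFrame` and `OpenAt`, and the sample data are made up for illustration; the Deedle calls are the ones I believe the library exposes (`Rows[key]`, `Columns[key]`, `GetAs<T>`, `GetColumn<T>`), so treat it as a starting point rather than reference usage.

```csharp
using System;
using System.Collections.Generic;
using Deedle;

static class RowColumnSlicing
{
    // Hypothetical sample data: integer row keys, two double columns.
    public static Frame<int, string> BuildFrame()
    {
        var open = new SeriesBuilder<int, double> { { 1, 10.0 }, { 2, 11.0 }, { 3, 12.0 } }.Series;
        var close = new SeriesBuilder<int, double> { { 1, 10.5 }, { 2, 11.5 }, { 3, 12.5 } }.Series;

        return Frame.FromColumns(new KeyValuePair<string, Series<int, double>>[] {
            KeyValue.Create("Open", open),
            KeyValue.Create("Close", close)
        });
    }

    // Reads one cell by first selecting the row through the Rows property,
    // then picking the column value out of the resulting row series.
    public static double OpenAt(Frame<int, string> frame, int rowKey)
    {
        ObjectSeries<string> row = frame.Rows[rowKey];
        return row.GetAs<double>("Open");
    }

    static void Main()
    {
        var frame = BuildFrame();

        // Columns[key] yields an untyped column keyed by the row keys.
        ObjectSeries<int> openColumn = frame.Columns["Open"];
        Console.WriteLine(openColumn.GetAs<double>(1));

        // GetColumn<T> is the typed alternative when the element type is known.
        Series<int, double> close = frame.GetColumn<double>("Close");
        close.Print();

        Console.WriteLine(OpenAt(frame, 2));
    }
}
```

The thing to notice is that both lookups start from a property of the frame (`frame.Rows`, `frame.Columns`) rather than from an indexer on the frame itself, which is exactly the part that is hard to discover.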

I'm also thinking we could have a C# and F# cookbook (combined? separate?) that serves as a reference that's a bit more use-case oriented than the API reference, but not as didactic as the tutorials.

casbby commented 9 years ago

A problem-solving-oriented cookbook (like the one for pandas) is a great idea! I feel the current form of the Deedle documentation and function signatures is a little intimidating to people like myself who are not experienced.

I completely agree with the comments on row-based and column-based calculation. It's very hard to form a clear mental picture of how it works just by reading the documentation; it's a lot easier to poke around with a simple cookbook-style example.

The following is text copied from a ticket I raised with suggestions on improving the documentation.

In areas like economics, time series are often combined into new time series (set-based combination). When I read the Deedle C# API signatures and documentation pages, there is little mention of such applications. However, after some poking around, it's surprisingly easy to do such calculations. I feel it is even easier to do this in C# than in F# (sorry to say that).

The following is an example I wish I could have found on the documentation page:

CalculateDeedleFrameRowAverageWithMissingValues

Things to pay attention to are:

  1. Rows does not remove the row in which both columns' values are missing.
  2. The Mean function is smart enough to calculate the mean from the available data points only.

Thanks for open-sourcing such a great library.

CalculateDeedleFrameRowAverageWithMissingValues

using System;                      // DateTime
using System.Collections.Generic;  // KeyValuePair
using Deedle;

namespace CalculateDeedleFrameRowAverageWithMissingValues
{
    class Program
    {
        static void Main(string[] args)
        {
            // double.NaN entries show up as missing values in the printed output below.
            var s1 = new SeriesBuilder<DateTime, double>(){
                 {DateTime.Today.Date.AddDays(-5),10.0},
                 {DateTime.Today.Date.AddDays(-4),9.0},
                 {DateTime.Today.Date.AddDays(-3),8.0},
                 {DateTime.Today.Date.AddDays(-2),double.NaN},
                 {DateTime.Today.Date.AddDays(-1),6.0},
                 {DateTime.Today.Date.AddDays(-0),5.0}
             }.Series;

            // s2 has no entry for today, so that row shows s2 as missing in the frame.
            var s2 = new SeriesBuilder<DateTime, double>(){
                 {DateTime.Today.Date.AddDays(-5),10.0},
                 {DateTime.Today.Date.AddDays(-4),double.NaN},
                 {DateTime.Today.Date.AddDays(-3),8.0},
                 {DateTime.Today.Date.AddDays(-2),double.NaN},
                 {DateTime.Today.Date.AddDays(-1),6.0}
             }.Series;

            var f = Frame.FromColumns(new KeyValuePair<string, Series<DateTime, double>>[] { 
                KeyValue.Create("s1",s1),
                KeyValue.Create("s2",s2)
            });

            s1.Print();
            f.Print();

            // Print every row as a series; the row with both values missing is still present.
            f.Rows.Select(kvp => kvp.Value).Print();

//            29/05/2015 12:00:00 AM -> series [ s1 => 10; s2 => 10]
//            30/05/2015 12:00:00 AM -> series [ s1 => 9; s2 => <missing>]
//            31/05/2015 12:00:00 AM -> series [ s1 => 8; s2 => 8]
//            1/06/2015 12:00:00 AM  -> series [ s1 => <missing>; s2 => <missing>]
//            2/06/2015 12:00:00 AM  -> series [ s1 => 6; s2 => 6]
//            3/06/2015 12:00:00 AM  -> series [ s1 => 5; s2 => <missing>]

            // Row-wise mean; missing values are skipped, so a row with one value yields that value.
            f.Rows.Select(kvp => kvp.Value.As<double>().Mean()).Print();

//            29/05/2015 12:00:00 AM -> 10
//            30/05/2015 12:00:00 AM -> 9
//            31/05/2015 12:00:00 AM -> 8
//            1/06/2015 12:00:00 AM  -> <missing>
//            2/06/2015 12:00:00 AM  -> 6
//            3/06/2015 12:00:00 AM  -> 5

            //Console.ReadLine();
        }
    }
}

NicoJuicy commented 2 years ago

Perhaps the Titanic example could also be written in C#? The F# example doesn't make it exactly clear. I wanted to use it as a quick way to try Deedle out, but it's not going to be the "quick test" I expected.

If I succeed in creating it, I'll share it here.