carbon-design-system / carbon-charts

:bar_chart: :chart_with_upwards_trend: Robust dataviz framework implemented using D3 & TypeScript
https://charts.carbondesignsystem.com
Apache License 2.0

Proposal: Dataset in Tabular Format #516

Closed: caesarsol closed this issue 4 years ago

caesarsol commented 4 years ago

Hi, this is a proposal for @theiliad about changing the shape of the data object that is passed to a chart as input.

It is based on a pattern we often use at Accurat: always starting from non-nested datasets.

// Dataset with columns `x`, `y`, `category`

const tabularDataset = [
  { x: new Date(2020, 1, 1), y: 32100, category: 'A' },
  { x: new Date(2020, 1, 2), y: 23500, category: 'A' },
  { x: new Date(2020, 1, 3), y: 53100, category: 'A' },
  { x: new Date(2020, 1, 4), y: 42300, category: 'A' },
  { x: new Date(2020, 1, 5), y: 12300, category: 'A' },
]

Because they avoid nesting, we find that tabular datasets are:

Below are two examples converted from the current format to the proposed one.

AS IS:

https://github.com/carbon-design-system/carbon-charts/blob/master/packages/core/demo/demo-data/line.ts#L81-L185

const lineData_AS_IS = {
  labels: ["Qty", "More", "Sold", "Restocking", "Misc"],
  datasets: [
    {
      label: "Dataset 1",
      data: [32100, 23500, 53100, 42300, 12300]
    },
    {
      label: "Dataset 2",
      data: [34200, 53200, 42300, 21400, 0]
    },
    {
      label: "Dataset 3 long name",
      data: [41200, 23400, 34210, 1400, 42100]
    },
    {
      label: "Dataset 4 long name",
      data: [22000, 1200, 9000, 24000, 3000]
    },
    {
      label: "Dataset 5 long name",
      data: [2412, 30000, 10000, 5000, 31000]
    },
    {
      label: "Dataset 6 long name",
      data: [0, 20000, 40000, 60000, 80000]
    }
  ]
};

const lineTimeSeriesData_AS_IS = {
  datasets: [
    {
      label: "Dataset 1",
      data: [
        { date: new Date(2019, 0, 1), value: 10000 },
        { date: new Date(2019, 0, 5), value: 65000 },
        { date: new Date(2019, 0, 8), value: 10000 },
        { date: new Date(2019, 0, 13), value: 49213 },
        { date: new Date(2019, 0, 17), value: 51213 }
      ]
    },
    {
      label: "Dataset 2",
      data: [
        { date: new Date(2019, 0, 2), value: 0 },
        { date: new Date(2019, 0, 6), value: 57312 },
        { date: new Date(2019, 0, 8), value: 21432 },
        { date: new Date(2019, 0, 15), value: 70323 },
        { date: new Date(2019, 0, 19), value: 21300 }
      ]
    },
    {
      label: "Dataset 3",
      data: [
        { date: new Date(2019, 0, 1), value: 50000 },
        { date: new Date(2019, 0, 5), value: 15000 },
        { date: new Date(2019, 0, 8), value: 20000 },
        { date: new Date(2019, 0, 13), value: 39213 },
        { date: new Date(2019, 0, 17), value: 61213 }
      ]
    },
    {
      label: "Dataset 4",
      data: [
        { date: new Date(2019, 0, 2), value: 10 },
        { date: new Date(2019, 0, 6), value: 37312 },
        { date: new Date(2019, 0, 8), value: 51432 },
        { date: new Date(2019, 0, 15), value: 40323 },
        { date: new Date(2019, 0, 19), value: 31300 }
      ]
    }
  ]
};

TO BE:

The proposed data format has essentially the same shape as a CSV file converted to JSON: a flat array of objects, one per row.

const lineData_TO_BE = [
  { group: "Dataset 1", y: 32100, x: "Qty" },
  { group: "Dataset 1", y: 23500, x: "More" },
  { group: "Dataset 1", y: 53100, x: "Sold" },
  { group: "Dataset 1", y: 42300, x: "Restocking" },
  { group: "Dataset 1", y: 12300, x: "Misc" },

  { group: "Dataset 2", y: 34200, x: "Qty" },
  { group: "Dataset 2", y: 53200, x: "More" },
  { group: "Dataset 2", y: 42300, x: "Sold" },
  { group: "Dataset 2", y: 21400, x: "Restocking" },
  { group: "Dataset 2", y: 0, x: "Misc" },

  { group: "Dataset 3 long name", y: 41200, x: "Qty" },
  { group: "Dataset 3 long name", y: 23400, x: "More" },
  { group: "Dataset 3 long name", y: 34210, x: "Sold" },
  { group: "Dataset 3 long name", y: 1400, x: "Restocking" },
  { group: "Dataset 3 long name", y: 42100, x: "Misc" },

  { group: "Dataset 4 long name", y: 22000, x: "Qty" },
  { group: "Dataset 4 long name", y: 1200, x: "More" },
  { group: "Dataset 4 long name", y: 9000, x: "Sold" },
  { group: "Dataset 4 long name", y: 24000, x: "Restocking" },
  { group: "Dataset 4 long name", y: 3000, x: "Misc" },

  { group: "Dataset 5 long name", y: 2412, x: "Qty" },
  { group: "Dataset 5 long name", y: 30000, x: "More" },
  { group: "Dataset 5 long name", y: 10000, x: "Sold" },
  { group: "Dataset 5 long name", y: 5000, x: "Restocking" },
  { group: "Dataset 5 long name", y: 31000, x: "Misc" },

  { group: "Dataset 6 long name", y: 0, x: "Qty" },
  { group: "Dataset 6 long name", y: 20000, x: "More" },
  { group: "Dataset 6 long name", y: 40000, x: "Sold" },
  { group: "Dataset 6 long name", y: 60000, x: "Restocking" },
  { group: "Dataset 6 long name", y: 80000, x: "Misc" },
];
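Converting lineData_AS_IS into lineData_TO_BE is mechanical: each value in a dataset's data array pairs with the label at the same index. A minimal TypeScript sketch, assuming exactly the { labels, datasets: [{ label, data }] } shape shown above; the names `LabelsData`, `TabularRow` and `labelsToTabular` are hypothetical, not part of the Carbon Charts API.

```typescript
// Hypothetical types describing the labels-based AS IS format.
interface LabelsDataset {
  label: string;
  data: number[];
}

interface LabelsData {
  labels: string[];
  datasets: LabelsDataset[];
}

// One row of the proposed tabular format.
interface TabularRow {
  group: string;
  x: string | Date;
  y: number;
}

// Emit one row per (dataset, index) pair: data[i] belongs to labels[i].
// Rows come out grouped by dataset, matching the order of lineData_TO_BE.
function labelsToTabular(input: LabelsData): TabularRow[] {
  const rows: TabularRow[] = [];
  for (const dataset of input.datasets) {
    dataset.data.forEach((value, i) => {
      rows.push({ group: dataset.label, x: input.labels[i], y: value });
    });
  }
  return rows;
}

// Usage with a trimmed-down version of lineData_AS_IS.
const rows = labelsToTabular({
  labels: ["Qty", "More"],
  datasets: [
    { label: "Dataset 1", data: [32100, 23500] },
    { label: "Dataset 2", data: [34200, 53200] },
  ],
});
console.log(rows.length); // 4
```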

const lineTimeSeriesData_TO_BE = [
  { group: "Dataset 1", x: new Date(2019, 0, 1), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 5), y: 65000 },
  { group: "Dataset 1", x: new Date(2019, 0, 8), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 13), y: 49213 },
  { group: "Dataset 1", x: new Date(2019, 0, 17), y: 51213 },

  { group: "Dataset 2", x: new Date(2019, 0, 2), y: 0 },
  { group: "Dataset 2", x: new Date(2019, 0, 6), y: 57312 },
  { group: "Dataset 2", x: new Date(2019, 0, 8), y: 21432 },
  { group: "Dataset 2", x: new Date(2019, 0, 15), y: 70323 },
  { group: "Dataset 2", x: new Date(2019, 0, 19), y: 21300 },

  { group: "Dataset 3", x: new Date(2019, 0, 1), y: 50000 },
  { group: "Dataset 3", x: new Date(2019, 0, 5), y: 15000 },
  { group: "Dataset 3", x: new Date(2019, 0, 8), y: 20000 },
  { group: "Dataset 3", x: new Date(2019, 0, 13), y: 39213 },
  { group: "Dataset 3", x: new Date(2019, 0, 17), y: 61213 },

  { group: "Dataset 4", x: new Date(2019, 0, 2), y: 10 },
  { group: "Dataset 4", x: new Date(2019, 0, 6), y: 37312 },
  { group: "Dataset 4", x: new Date(2019, 0, 8), y: 51432 },
  { group: "Dataset 4", x: new Date(2019, 0, 15), y: 40323 },
  { group: "Dataset 4", x: new Date(2019, 0, 19), y: 31300 }
]
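The time-series case converts the same way, renaming date to x and value to y. Note that the datasets do not need aligned x values (Dataset 1 and Dataset 2 use different dates), since each row carries its own x. A sketch assuming the { datasets: [{ label, data: [{ date, value }] }] } shape above; `timeSeriesToTabular` and its types are hypothetical names.

```typescript
// Hypothetical types for the time-series AS IS format.
interface TimePoint {
  date: Date;
  value: number;
}

interface TimeSeriesData {
  datasets: { label: string; data: TimePoint[] }[];
}

// A row of the proposed format: `date` becomes x and `value` becomes y.
interface TimeRow {
  group: string;
  x: Date;
  y: number;
}

// Flatten every dataset's points into rows tagged with the dataset label.
// Points from different datasets stay separate rows even if dates match.
function timeSeriesToTabular(input: TimeSeriesData): TimeRow[] {
  const rows: TimeRow[] = [];
  input.datasets.forEach((dataset) => {
    dataset.data.forEach((point) => {
      rows.push({ group: dataset.label, x: point.date, y: point.value });
    });
  });
  return rows;
}

// Usage with the first two points of "Dataset 1" and "Dataset 2".
const timeRows = timeSeriesToTabular({
  datasets: [
    {
      label: "Dataset 1",
      data: [
        { date: new Date(2019, 0, 1), value: 10000 },
        { date: new Date(2019, 0, 5), value: 65000 },
      ],
    },
    {
      label: "Dataset 2",
      data: [
        { date: new Date(2019, 0, 2), value: 0 },
        { date: new Date(2019, 0, 6), value: 57312 },
      ],
    },
  ],
});
console.log(timeRows.length); // 4
```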

Problem 1: Column names

The two examples above use the strings "group", "x" and "y" as columns (object keys), and more (e.g. "y2", "color") could be added in the future. But as @theiliad pointed out, Carbon Charts are more generic and don't assume that the horizontal axis is called X and the vertical one Y.

Another option is to let the user specify the axis-to-column association as a configuration option. The dataset could then have the columns date, country and value, and the configuration option would assign them:

axesColumns: {
  bottom: 'date',
  left: 'value',
  group: 'country',
}
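To make the axesColumns idea concrete, here is a minimal sketch, assuming the option is a plain object that maps chart roles (bottom, left, group) to column names, as in the snippet above. `AxesColumns`, `Row` and `readByAxes` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of the proposed `axesColumns` option.
interface AxesColumns {
  bottom: string;
  left: string;
  group: string;
}

// A row may have any columns; the chart only sees them through axesColumns.
type Row = { [column: string]: unknown };

// The values the chart would read from each row once the mapping is applied.
interface AxisValues {
  bottom: unknown;
  left: unknown;
  group: unknown;
}

// Look up each role's column in every row, failing loudly if it is missing.
function readByAxes(rows: Row[], axesColumns: AxesColumns): AxisValues[] {
  return rows.map((row) => {
    const pick = (column: string): unknown => {
      if (!(column in row)) {
        throw new Error(`Column "${column}" not found in row`);
      }
      return row[column];
    };
    return {
      bottom: pick(axesColumns.bottom),
      left: pick(axesColumns.left),
      group: pick(axesColumns.group),
    };
  });
}

// Usage with a [date, country, value] dataset.
const countryRows: Row[] = [
  { date: new Date(2020, 0, 1), country: "IT", value: 10 },
  { date: new Date(2020, 0, 2), country: "US", value: 20 },
];
const mapped = readByAxes(countryRows, { bottom: "date", left: "value", group: "country" });
console.log(mapped[1].group, mapped[1].left); // US 20
```

Because the chart reads rows only through the mapping, the same dataset could be redrawn with different axis assignments without reshaping the data.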

Problem 2: Backwards compatibility

We think backward compatibility could be achieved by supporting both formats for a limited period of time, converting one into the other.
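One way to support both formats during a transition is to normalize the input at the chart's entry point. The sketch below assumes a top-level array is already tabular and anything else is the old { labels?, datasets } shape, covering both the number-array and the { date, value } variants from the examples. `normalizeData` and the types are hypothetical, not an existing Carbon Charts function.

```typescript
// Hypothetical row of the proposed tabular format.
interface TabularRow {
  group: string;
  x: string | Date;
  y: number;
}

// The two legacy shapes from the issue: plain numbers paired with `labels`,
// or { date, value } points for time series.
interface LegacyData {
  labels?: string[];
  datasets: { label: string; data: (number | { date: Date; value: number })[] }[];
}

type ChartInput = TabularRow[] | LegacyData;

// A top-level array is treated as already tabular and returned unchanged;
// otherwise the legacy shape is flattened into rows.
function normalizeData(input: ChartInput): TabularRow[] {
  if (Array.isArray(input)) {
    return input;
  }
  const labels = input.labels || [];
  const rows: TabularRow[] = [];
  input.datasets.forEach((dataset) => {
    dataset.data.forEach((point, i) => {
      if (typeof point === "number") {
        // Number arrays take their x from the label at the same index.
        rows.push({ group: dataset.label, x: labels[i], y: point });
      } else {
        rows.push({ group: dataset.label, x: point.date, y: point.value });
      }
    });
  });
  return rows;
}
```

Keeping the conversion in one function means the rest of the chart code would only ever handle the tabular shape, and dropping the old format later would touch a single place.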


Many details are still to be defined and we have plenty of ideas; we're curious whether anyone has thoughts!

(cc @lucafalasco @serenaG @ilariaventurini)

theiliad commented 4 years ago

I'll loop in a couple of people to comment. @zvonimirfras @cal-smith @tw15egan

My position is that supporting a tabular data format is a good idea, and I've been waiting to see what Accurat would propose; however, I don't see a significant advantage in the proposed data format.

I'm not a fan of the x, y, x2, y2 idea since it doesn't clearly define where the data would land (is y left and y2 right? what if we have an RTL chart?)

cal-smith commented 4 years ago

I definitely prefer the tabular format over the current format; to my mind it's significantly clearer.

I agree with

I'm not a fan of the x, y, x2, y2 idea since it doesn't clearly define where the data would land (is y left and y2 right? what if we have an RTL chart?)

However, we can sort that out with some mapping options/functions, something to the effect of:


const data = [
    { supplier: 'foo inc', y: 66 },
    { supplier: 'bar corp', y: 25 },
    // ...
];

// in the options config
axis: {
    bottom: {
        map: (data) => data.supplier,
        // ...
    },
    left: {
        map: 'y',
        // ...
    }
}

so the type would look like map: string | ((data) => any). Should we need grouping or other values to order by, we can use the same type signature.
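A minimal sketch of how a chart could resolve such an accessor: normalize both forms into a function once, so the drawing code never branches on the type. In TypeScript the function member of the union needs parentheses to parse. `Datum`, `Accessor` and `resolveAccessor` are hypothetical names, not Carbon Charts API.

```typescript
// A row with arbitrary columns.
type Datum = { [column: string]: unknown };

// Either a column name or a function that extracts a value from a row.
// The function type is parenthesized so the union parses correctly.
type Accessor = string | ((datum: Datum) => unknown);

// Turn either form into a function.
function resolveAccessor(map: Accessor): (datum: Datum) => unknown {
  if (typeof map === "function") {
    return map;
  }
  // Copy the narrowed string into a const so the closure keeps its type.
  const column: string = map;
  return (datum) => datum[column];
}

// Usage mirroring the options example: a function for bottom, a string for left.
const supplierData: Datum[] = [
  { supplier: "foo inc", y: 66 },
  { supplier: "bar corp", y: 25 },
];
const bottomAccessor = resolveAccessor((datum) => datum["supplier"]);
const leftAccessor = resolveAccessor("y");

console.log(supplierData.map((d) => bottomAccessor(d)).join(", ")); // foo inc, bar corp
console.log(supplierData.map((d) => leftAccessor(d)).join(", ")); // 66, 25
```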

It should be fairly easy to write a function that maps between the old format and a tabular format; worst case, it may also be feasible to support both.

caesarsol commented 4 years ago

@theiliad I had covered what you say under Problem 1; sorry if I wasn't clear! About the "no significant advantage" point, you may be right. However, let me add one more pro: how simple it becomes to add charts with a third dimension, such as the Heatmap or a continuous-color-coded Scatterplot. I think a nested data structure for those could get pretty complicated.
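To illustrate the heatmap point: in tabular form a third dimension is just one more column, and quantities such as the color-scale domain come from a single pass over the rows. A sketch with made-up sample rows; `HeatmapRow`, `valueExtent` and `distinctValues` are hypothetical helpers, not Carbon Charts code.

```typescript
// Hypothetical heatmap row: two categorical positions plus a numeric value
// that would drive a continuous color scale.
interface HeatmapRow {
  x: string;
  y: string;
  value: number;
}

// Made-up sample rows for illustration only.
const heatmapRows: HeatmapRow[] = [
  { x: "Mon", y: "Morning", value: 12 },
  { x: "Mon", y: "Evening", value: 40 },
  { x: "Tue", y: "Morning", value: 7 },
  { x: "Tue", y: "Evening", value: 25 },
];

// [min, max] of the value column; returns [Infinity, -Infinity] for no rows.
function valueExtent(rows: HeatmapRow[]): [number, number] {
  return rows.reduce<[number, number]>(
    (acc, row) => [Math.min(acc[0], row.value), Math.max(acc[1], row.value)],
    [Infinity, -Infinity]
  );
}

// Distinct categories for one axis, in first-seen order.
function distinctValues(rows: HeatmapRow[], key: "x" | "y"): string[] {
  const seen: string[] = [];
  rows.forEach((row) => {
    if (seen.indexOf(row[key]) === -1) {
      seen.push(row[key]);
    }
  });
  return seen;
}

console.log(valueExtent(heatmapRows).join(" to ")); // 7 to 40
console.log(distinctValues(heatmapRows, "y").join(", ")); // Morning, Evening
```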

@cal-smith thanks! I agree that we could also support function-based accessors; it's a very common pattern in lodash, so it definitely makes sense. The only advantage I can think of using strings over functions is that they are JSON-serializable, but I don't know if that's something of importance for you.

cal-smith commented 4 years ago

The only advantage I can think of using strings over functions is that they are JSON-serializable, but I don't know if that's something of importance for you.

Definitely want to keep the possibility of JSON serialization, but there's no reason we can't support both strings and functions 👍
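The serialization trade-off is easy to see: JSON.stringify omits object properties whose value is a function, so only string accessors survive a round trip. A small sketch using a hypothetical options object in the spirit of the axis.*.map option sketched earlier in the thread.

```typescript
// Hypothetical options object mixing both accessor forms.
const chartOptions = {
  axis: {
    bottom: { map: (datum: { supplier: string }) => datum.supplier },
    left: { map: "y" },
  },
};

// JSON.stringify drops function-valued properties, so bottom.map disappears
// while the string accessor on left is preserved.
const serialized = JSON.stringify(chartOptions);
console.log(serialized); // {"axis":{"bottom":{},"left":{"map":"y"}}}

// Parsing it back restores only the string accessor.
const restored = JSON.parse(serialized);
console.log(restored.axis.left.map); // y
```

Supporting both forms keeps JSON-driven configurations possible while still allowing functions when the options are built in code.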

theiliad commented 4 years ago

Done a long time ago