akvo / akvo-product-design

Products Design Documents
GNU Affero General Public License v3.0
12 stars 9 forks source link

Merging datasets #260

Closed nadiagorchakova closed 5 years ago

nadiagorchakova commented 7 years ago

When working with data coming from different sources, it's necessary to join several datasets into one.

The easiest case is when the two tables have a one-to-one relationship. For instance:

screen shot 2017-04-21 at 15 56 45

(The country column which is common for both datasets is present only once in each dataset.)

The more difficult case is when two tables have one-to-many relationship. This will be a common case when merging Flow Registration with Flow Monitoring forms. For instance:

screen shot 2017-04-21 at 15 59 45

A few examples how other tools handle one-to-many relationship in merged datasets:

1. Do an aggregation (sum, average, etc). This would result in the following merged dataset:

screen shot 2017-04-21 at 16 06 28

Downside of this method is that you cannot present and analyse trends over time as in time-series.

  1. Create repetition inside the merged table
screen shot 2017-04-21 at 16 20 16

The common merge steps that several other tools seem to use:

Merging would normally result in a new dataset (no overwriting of the merged datasets themselves)

I think we need to hold a workshop to understand the issue better together.

nadiagorchakova commented 7 years ago

Talked with Karolina about the usual practice of merging datasets. She suggested we could take a similar approach to the JOIN function in SQL:

screen shot 2017-06-09 at 16 28 07

Source: http://www.dofactory.com/sql/join

janagombitova commented 5 years ago

Implemented.