Merging datasets

nadiagorchakova commented 7 years ago

When working with data coming from different sources, it's necessary to join several datasets into one.

The easiest case is when the two tables have a one-to-one relationship. For instance:

(Each value in the country column, which is common to both datasets, appears only once in each dataset.)
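A minimal sketch of a one-to-one merge in plain Python. The rows, column names and numbers are made up for illustration, and `merge_one_to_one` is a hypothetical helper, not something from an existing tool:

```python
# Hypothetical sample data: every country appears exactly once per dataset.
population = [
    {"country": "Kenya", "population_m": 50},
    {"country": "Mali", "population_m": 19},
]
water_points = [
    {"country": "Kenya", "water_points": 120},
    {"country": "Mali", "water_points": 45},
]


def merge_one_to_one(left, right, key):
    """Join two lists of row dicts, assuming `key` is unique in both."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Build a new row; the input rows are left untouched.
            merged.append({**row, **match})
    return merged


merged = merge_one_to_one(population, water_points, "country")
```

Because the key is unique on both sides, the result has exactly one row per matching country.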

The more difficult case is when two tables have a one-to-many relationship. This will be a common case when merging Flow Registration with Flow Monitoring forms. For instance:
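As a rough illustration with made-up rows (not the original example), a registration table could hold one row per identifier while a monitoring table holds several. The hypothetical helper `relationship` below checks how often the key repeats on each side:

```python
from collections import Counter

# Hypothetical data: one registration row per point, several monitoring rows.
registrations = [
    {"id": "wp1", "village": "A"},
    {"id": "wp2", "village": "B"},
]
monitoring = [
    {"id": "wp1", "date": "2017-01-10", "litres": 100},
    {"id": "wp1", "date": "2017-02-10", "litres": 140},
    {"id": "wp2", "date": "2017-01-12", "litres": 80},
]


def relationship(left, right, key):
    """Classify the relationship between two tables on a shared key column."""
    left_unique = all(n == 1 for n in Counter(r[key] for r in left).values())
    right_unique = all(n == 1 for n in Counter(r[key] for r in right).values())
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


kind = relationship(registrations, monitoring, "id")
```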

A few examples of how other tools handle a one-to-many relationship in merged datasets:

1. Do an aggregation (sum, average, etc). This would result in the following merged dataset:

The downside of this method is that you cannot present and analyse trends over time, as you could with a time series.

2. Create repetition inside the merged table
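Both approaches could be sketched roughly as follows. The data and the helper names `merge_aggregated` and `merge_repeated` are assumptions made for illustration only:

```python
from collections import defaultdict

# Hypothetical data: one registration row per point, several monitoring rows.
registrations = [
    {"id": "wp1", "village": "A"},
    {"id": "wp2", "village": "B"},
]
monitoring = [
    {"id": "wp1", "date": "2017-01-10", "litres": 100},
    {"id": "wp1", "date": "2017-02-10", "litres": 140},
    {"id": "wp2", "date": "2017-01-12", "litres": 80},
]


def merge_aggregated(one_side, many_side, key, value, aggregate):
    """Collapse the 'many' rows to a single value per key, then join."""
    grouped = defaultdict(list)
    for row in many_side:
        grouped[row[key]].append(row[value])
    return [
        {**row, value: aggregate(grouped[row[key]])}
        for row in one_side
        if row[key] in grouped
    ]


def merge_repeated(one_side, many_side, key):
    """Repeat the 'one' row for every matching 'many' row."""
    one_by_key = {row[key]: row for row in one_side}
    return [
        {**one_by_key[row[key]], **row}
        for row in many_side
        if row[key] in one_by_key
    ]


aggregated = merge_aggregated(registrations, monitoring, "id", "litres", sum)
repeated = merge_repeated(registrations, monitoring, "id")
```

The aggregated result has one row per registration and loses the dates, while the repeated result keeps every monitoring row (and its date) at the cost of duplicating the registration columns.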

The common merge steps that several other tools seem to use:

1. Select the tables that should be merged.
2. Select the merge column or 'source of match' (for instance, an identifier, a country, or both; some tools have an option to merge on several matching columns).
3. Merge.

Merging would normally result in a new dataset (the source datasets themselves are not overwritten).
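The steps above might look something like this in code. It is only a sketch: `merge_datasets`, its `on` parameter and the sample rows are hypothetical, and it keeps only rows that match on every selected column:

```python
def merge_datasets(left, right, on):
    """Return a new list of merged rows; `left` and `right` are not modified.

    `on` is a list of column names that must all match (the 'source of match').
    """
    # Index the right-hand rows by the tuple of their key values.
    index = {}
    for row in right:
        index.setdefault(tuple(row[col] for col in on), []).append(row)

    merged = []
    for row in left:
        for match in index.get(tuple(row[col] for col in on), []):
            merged.append({**row, **match})  # new dict, inputs stay as they are
    return merged


# Step 1: select the tables (hypothetical sample data).
surveys = [
    {"country": "Kenya", "identifier": "a1", "households": 12},
    {"country": "Kenya", "identifier": "a2", "households": 7},
]
visits = [
    {"country": "Kenya", "identifier": "a1", "status": "functional"},
    {"country": "Mali", "identifier": "a1", "status": "broken"},
]

# Steps 2 and 3: select the merge columns, then merge into a new dataset.
merged = merge_datasets(surveys, visits, on=["country", "identifier"])
```

Matching on both columns keeps the Mali row with identifier `a1` from being joined to the Kenya survey that happens to share the same identifier.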

I think we need to hold a workshop to understand the issue better together.

nadiagorchakova commented 7 years ago

Talked with Karolina about the usual practice of merging datasets. She suggested we could take a similar approach to the JOIN function in SQL:

Source: http://www.dofactory.com/sql/join
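To make the SQL comparison concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, columns and rows are invented for illustration; only the `JOIN` and `LEFT JOIN` syntax is standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registration (id TEXT PRIMARY KEY, village TEXT);
    CREATE TABLE monitoring (id TEXT, visit_date TEXT, litres INTEGER);
    INSERT INTO registration VALUES ('wp1', 'A'), ('wp2', 'B'), ('wp3', 'C');
    INSERT INTO monitoring VALUES
        ('wp1', '2017-01-10', 100),
        ('wp1', '2017-02-10', 140),
        ('wp2', '2017-01-12', 80);
""")

# (INNER) JOIN: only registrations that have at least one monitoring row.
inner_rows = conn.execute("""
    SELECT r.id, r.village, m.visit_date, m.litres
    FROM registration AS r
    JOIN monitoring AS m ON m.id = r.id
    ORDER BY r.id, m.visit_date
""").fetchall()

# LEFT JOIN: every registration, with NULLs where no monitoring row exists.
left_rows = conn.execute("""
    SELECT r.id, r.village, m.visit_date, m.litres
    FROM registration AS r
    LEFT JOIN monitoring AS m ON m.id = r.id
    ORDER BY r.id, m.visit_date
""").fetchall()

conn.close()
```

In this sketch the inner join drops `wp3` because it was never monitored, while the left join keeps it with empty monitoring columns, so the choice of join type decides what happens to unmatched rows.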

janagombitova commented 5 years ago

Implemented.

akvo / akvo-product-design

Merging datasets #260