Add a script to automatically merge multiple .csv files and deal with duplicates

We need a dedicated tool to merge multiple .csv files while detecting and merging duplicates.

I've started to implement it through a new static method of DeviceCarbonFootprint:

    @staticmethod
    def merge(device1: 'DeviceCarbonFootprint', device2: 'DeviceCarbonFootprint',
              conflict: Literal['keep2nd','interactive'] = 'keep2nd', verbose: bool = False) -> 'DeviceCarbonFootprint':

and a standalone script, merge_csv.py file1 file2, written on top of the merge method above.

By default, priority is given to device2/file2.

Conflicts are detected only for attributes that are provided for both devices and whose values are clearly different. If the values are close enough, merge only prints a warning in verbose mode.
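
The detection rule above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the compare_attribute helper, the 5% relative tolerance, and the use of None for a missing attribute are all assumptions.

```python
import math
from typing import Any


def compare_attribute(v1: Any, v2: Any, rel_tol: float = 0.05) -> str:
    """Classify a pair of attribute values (hypothetical helper).

    Returns one of 'missing', 'equal', 'close' or 'conflict'.
    """
    # Assumption: a missing attribute is represented by None.
    if v1 is None or v2 is None:
        return 'missing'   # nothing to compare, so no conflict
    if v1 == v2:
        return 'equal'
    # Assumption: numeric values within rel_tol count as "close enough".
    if isinstance(v1, (int, float)) and isinstance(v2, (int, float)):
        if math.isclose(v1, v2, rel_tol=rel_tol):
            return 'close'  # would only trigger a warning in verbose mode
    return 'conflict'
```

With this sketch, only pairs classified as 'conflict' would need to go through one of the resolution modes.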

Then, there are two modes to resolve the conflicts:

Simply keep device2 (and print the differences in verbose mode)
Ask the user which version should be kept.
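
As a rough sketch of how the two modes might be dispatched for a single conflicting attribute (the resolve_conflict helper, its ask parameter, and the prompt wording are hypothetical and only mirror the conflict and verbose parameters of merge):

```python
from typing import Any, Callable, Literal


def resolve_conflict(name: str, v1: Any, v2: Any,
                     conflict: Literal['keep2nd', 'interactive'] = 'keep2nd',
                     verbose: bool = False,
                     ask: Callable[[str], str] = input) -> Any:
    """Pick the value to keep for one conflicting attribute (hypothetical helper)."""
    if conflict == 'keep2nd':
        # Default mode: priority goes to the second device/file.
        if verbose:
            print(f"{name}: keeping {v2!r} over {v1!r}")
        return v2
    if conflict == 'interactive':
        # Ask the user; any answer other than '1' keeps the second value.
        answer = ask(f"{name}: keep [1] {v1!r} or [2] {v2!r}? ").strip()
        return v1 if answer == '1' else v2
    raise ValueError(f"unknown conflict mode: {conflict!r}")
```

Passing the prompt function as a parameter is a design choice of this sketch: it keeps the interactive path testable without a terminal.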

TODO:

Add a non-regression mode that only checks that device2 is consistent with device1 and that device1 does not contain more information.
Clean up and unify some entries prior to fusion to avoid false negatives (e.g., CN versus China, issue #64)
Find a way to deal with PCF files that report the same model name even though they are not the same (in ecodiag I also extract the model name from the main HTML files)

PYTHONPATH=. python tools/merge_csv.py boavizta-data-us.csv dell.csv -o /dev/null
------------------------------------------------------------
| Summary report |
------------------------------------------------------------
Number of singletons: 1235, 26
Number of self duplicates: 174, 2
Number of clean fusions: 455
Number of mixed fusions: 42
Number of attributes gathered from the oldest data: 122
------------------------------------------------------------

