DiegoZoracKy / convert-excel-to-json

Convert Excel to JSON, mapping sheet columns to object keys.
MIT License

Apply functions to row header #21

Open ArcticTee opened 6 years ago

ArcticTee commented 6 years ago

Hi,

First of all, this is a great library, thank you for keeping it up! To enhance its functionality I would like to propose the following:

I just tried to import a sheet which has a header row with duplicates like this:

test | test | 1
1 | 2 | 2
2 | 3 | 2

This results in a JSON object like this:

[
  {"test": 2, "1": 2},
  {"test": 3, "1": 2}
]
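For reference, a minimal call along these lines reproduces it (the file name is a placeholder; the options are the automatic column-to-key mapping from the README):

const excelToJson = require('convert-excel-to-json');

const result = excelToJson({
    sourceFile: 'SOME-EXCEL-FILE.xlsx', // placeholder path
    header: { rows: 1 },                // skip the header row in the output
    columnToKey: {
        '*': '{{columnHeader}}'         // use each column's header text as the key
    }
});

// `result` is keyed by sheet name; because two columns share the header "test",
// the second "test" column overwrites the first in every row object.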

I would like to get the option to apply multiple functions to the header row to transform the column names when using the automatic columnToKey mapping, maybe something like this:

const result = excelToJson({
    sourceFile: 'SOME-EXCEL-FILE.xlsx',
    columnToKey: {
        '*': '{{columnHeader}}',
        'transformSingleKey': [
            removeSpecialCharacters,
            replaceNumberColumnNames,
        ],
        'transformAllKeys': [
            suffixDuplicates,
        ],
    }
});

Where the transformSingleKey array will map over the column names and apply all of its functions:

const cleanedUpCols = colnames.map(
  pipeFunctions(
    removeSpecialCharacters,
    replaceNumberColumnNames,
  ),
);

And the transformAllKeys array will operate on the whole column array:

suffixDuplicates(cleanedUpCols)

Helper functions:

const suffixDuplicates = (list) => {
  // Count how often each header name has been seen so far
  const count = {};
  let itemCount;
  return list.map((item) => {
    itemCount = count[item];
    itemCount = itemCount === undefined ? 1 : itemCount + 1;
    count[item] = itemCount;
    // duplicates get a numeric suffix, starting with 2 (e.g. `test_2`)
    return itemCount > 1 ? `${item}_${itemCount}` : item;
  });
};
const removeSpecialCharacters = key => key.replace(/[^a-zA-Z0-9]/g, '');

// prefix purely numeric column names with 'c' (e.g. '1' -> 'c1')
const replaceNumberColumnNames = key => (Number.isNaN(Number(key)) ? key : `c${key}`);

// compose the given functions left to right, threading the value through each one
const pipeFunctions = (...funcsToApply) => (currentValue, index, arrayToMap) =>
  funcsToApply.reduce(
    (accum, cFunc) => cFunc(accum, index, arrayToMap),
    currentValue,
  );
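To illustrate, running the example header row from above through these helpers would give something like this (just a sketch reusing the functions defined in this comment):

const headers = ['test', 'test', '1'];

const cleanedUpCols = headers.map(
  pipeFunctions(
    removeSpecialCharacters,
    replaceNumberColumnNames,
  ),
);
// -> ['test', 'test', 'c1']

const finalKeys = suffixDuplicates(cleanedUpCols);
// -> ['test', 'test_2', 'c1']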

Thank you

DiegoZoracKy commented 6 years ago

Hi @ArcticTee, thanks for bringing up this case, along with a solution that I believe is a very good one. It seems to match a suggestion I left open a few months ago. Take a look at this issue and tell me what you think of the approach:

https://github.com/DiegoZoracKy/convert-excel-to-json/issues/4

ArcticTee commented 6 years ago

Yeah, this option is fine as long as you know the names of your columns.

I am currently working on a general import of Excel files and I do not know the column format of those files while implementing it. The user of this import tool will provide the sheet name and whether the file contains a header row (and if so, where that row lies) via a GUI.

Therefore I would be happy if the enhancement to transform single keys and all keys of unknown columns were added 👍

DiegoZoracKy commented 6 years ago

I believe that both ideas can be joined. In fact, I can see two different issues here.

The first one is to never let data be discarded due to multiple column headers with the same name. By default it should apply suffixDuplicates as you suggested, and this behavior could be disabled via config.
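For the sheet from the first comment, that default would keep all three columns instead of silently dropping one, e.g. something like:

[
  { "test": 1, "test_2": 2, "1": 2 },
  { "test": 2, "test_2": 3, "1": 2 }
]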

The second issue is about the interface to apply a pipeline of data transformations.

I believe that the first one is what will solve your current problem, am I right?

ArcticTee commented 6 years ago

> I believe that the first one is what will solve your current problem, am I right?

Only partly, because it is just one of the transformations our tool needs to do. We are using MongoDB, which accepts JSON input.

E.g. suffixDuplicates will prevent data loss during the import, but it needs to look at all the keys to find duplicates.

Another practical problem the pipeline/transformation would solve is that invalid column names (which can be detected by looking at a single key in isolation) can be fixed. E.g. replaceNumberColumnNames will add a 'c' prefix to all column names that are just a number. For MongoDB this is necessary because it does not accept field names that are just a number.

So for my use case it would be great if pipelines for single keys and for all keys were supported. suffixDuplicates is just a function from our import tool that this project could benefit from as well.

ArcticTee commented 5 years ago

@DiegoZoracKy Any updates on this? Let me know if I can help you.

DiegoZoracKy commented 5 years ago

I have almost no free time this month. From next week onwards I'll probably be able to get back to this issue. The point is to define a good interface for configuring the transformations.

ArcticTee commented 5 years ago

Yeah. I am happy you are on it.

tijevlam commented 2 years ago

Hi there, since this has been open a long time: is there any news on this issue? In my case I would like to apply a camelCase function to the header row, but applying a function to {{columnHeader}} results in undefined keys.
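For reference, this is the kind of per-key transform I have in mind (a rough sketch; the helper name is illustrative only and it is not part of the current API):

// rough sketch of a camelCase transform for header cells; the name and the idea
// of plugging it into a future per-key pipeline are illustrative only
const toCamelCase = header =>
  String(header)
    .trim()
    .split(/[\s_-]+/)
    .map((word, i) => (i === 0
      ? word.toLowerCase()
      : word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()))
    .join('');

// toCamelCase('Order Number') -> 'orderNumber'
// toCamelCase('order_number') -> 'orderNumber'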