GlobalDataverseCommunityConsortium / dataverse-previewers

A collection of Datafile Previewers that can be configured to work with Dataverse
MIT License

Files without row header have shifted data #14

Closed qqmyers closed 5 years ago

qqmyers commented 5 years ago

In files without a row header, the previewed data gets shifted. From a quick look, column 1 appears many columns over and the subsequent columns are shifted as well.

For the following .tab file (extension changed to .txt to allow upload here), the display shows the journal name, which was the first column, far to the right (more columns not shown):

[Gherghina & Katsanidou (2013DATA) Data Availability in Political Science Journals_33.txt](https://github.com/QualitativeDataRepository/dataverse-previewers/files/3703728/Gherghina.Katsanidou.2013DATA.Data.Availability.in.Political.Science.Journals_33.txt)


I see rowHeaders: true in the cvs.js file. Is that needed? Should this and other parameters be user options?

FWIW: The attached file was derived from https://static-content.springer.com/esm/art%3A10.1057%2Feps.2013.8/MediaObjects/41304_2013_BFeps20138_MOESM1_ESM.xls which is public.

adam3smith commented 5 years ago

CC @anncie-pcss, it would be great if you could take a look.

annacieplicka commented 5 years ago

It is needed if the file has a header, so maybe it would be a good idea to let the user set this parameter?

qqmyers commented 5 years ago

That's probably needed in general, but, looking again, I'm not sure it's enough/fully solves the issue here.

The attached demo file has a header row, and I think the primary issue is that papaparse appears to be sorting the data by column name (with header=true), putting columns with numeric labels first and in increasing order, while not sorting the metadata itself (the list of column header labels). As a result, the table prints the column names in the same order as in the file but shows the data out of order, e.g. as though the columns had been sorted alphabetically. In the example file, the data for columns "1"-"22" appear before the data for "Journal", which is the first column name. Handsontable is just using the ordering in the data array rather than treating it as key/value pairs.

I think this would affect any file with non-alphabetical headers (for example, replace the first row at papaparse.com/demo with Column B,Column A,3,1 and check the data and metadata arrays in the console). It could be that there aren't many (or any) real files with numeric headers, but if there are, this would be worth fixing. I'm not sure if there's a setting in papaparse or handsontable to address this, but if not, the local JavaScript may have to get the order correct in header=true mode.
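
To illustrate one possible cause (not confirmed in this thread): plain JavaScript objects enumerate integer-like keys first, in ascending numeric order, and only then the remaining string keys in insertion order. The sketch below is standalone JavaScript that does not call papaparse or handsontable. The names `header`, `values`, `toKeyedRow`, and `toOrderedCells` are hypothetical. It assumes each row is held as an object keyed by column name, and shows how looking cells up by name from the header list puts them back in file order.

```javascript
// Hypothetical header and data row, modeled on the "Column B,Column A,3,1" example.
const header = ["Column B", "Column A", "3", "1"];
const values = ["b", "a", "three", "one"];

// Build a row object keyed by column name (assumed shape of a header-mode row).
function toKeyedRow(fields, cells) {
  const row = {};
  fields.forEach((name, i) => {
    row[name] = cells[i];
  });
  return row;
}

// Rebuild a positional array in file order by looking each cell up by its name.
function toOrderedCells(fields, row) {
  return fields.map((name) => row[name]);
}

const keyed = toKeyedRow(header, values);

// Enumeration order: integer-like keys ("1", "3") first, then the other keys.
const enumerated = Object.values(keyed);

// File order restored from the header list.
const restored = toOrderedCells(header, keyed);

console.log(enumerated); // [ 'one', 'three', 'b', 'a' ]
console.log(restored);   // [ 'b', 'a', 'three', 'one' ]
```

If this is what is happening, mapping every row through the header list before handing the data to the table would keep labels and data aligned, whatever the column names are.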

qqmyers commented 5 years ago

The fix looks good. Thanks!