KIKIRPA / Entropy

Analytical data repository for Cultural Heritage laboratories
GNU General Public License v3.0

implement loose comparison of (sub)field names during import #50

Open kobbejager opened 6 years ago

kobbejager commented 6 years ago

When uploading metadata in the form of a metadata CSV file, it is possible to create fields with slightly different names, e.g. "sample:CI number", "sample:cinumber", "sample:C.I. number". During the import process these are treated as different fields. No error checking is performed against this, and all variations are kept. In the case of "samplesource:0:sample identifier" and "sample source:0:sample source", data that should belong under the same top-level field is stored in different trees.

In later stages (creating the measurement.json file, viewing, or downloading/exporting the (meta)data), the getMeta() function performs a loose field comparison using sanitizeStr($field, "", "-+:^", 1).

The same loose comparison should be implemented when importing data. The most suitable place to implement this is probably the inflateArray() function. However, (sub)field names should not be stored in their simplified form: "cinumber" has lost all readability and formatting information, whereas "C.I. number" can be reused as such in the view and other modules. Fields with similar names should thus be merged, but (one of) the original names should be kept.
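To make the intended behaviour concrete, here is a minimal sketch of the merge logic, written in Python purely for illustration (the project itself is PHP). Everything in it is an assumption: `loose_key()` is a hypothetical stand-in for sanitizeStr() that lowercases a name and drops every non-alphanumeric character, `inflate_loose()` is a hypothetical counterpart of inflateArray(), and keeping the first spelling encountered (with the last value winning) is just one possible policy.

```python
import re


def loose_key(name):
    """Hypothetical stand-in for sanitizeStr(): lowercase, keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def inflate_loose(flat, sep=":"):
    """Turn {'a:b:c': value} into a nested dict, merging loosely equal names.

    At every level, a (sub)field whose loose key matches an existing one is
    merged into it; the first spelling seen is kept as the stored name.
    """
    tree = {}
    for path, value in flat.items():
        node = tree
        parts = path.split(sep)
        for depth, part in enumerate(parts):
            wanted = loose_key(part)
            # Reuse an existing key at this level if it matches loosely.
            name = next((k for k in node if loose_key(k) == wanted), part)
            if depth == len(parts) - 1:
                node[name] = value  # assumption: the last value wins
            else:
                child = node.setdefault(name, {})
                if not isinstance(child, dict):
                    raise ValueError(f"'{name}' is both a value and a parent field")
                node = child
    return tree


rows = {
    "samplesource:0:sample identifier": "S-001",
    "sample source:0:sample source": "panel painting",
}
print(inflate_loose(rows))
```

With the two column names from the example above, both values end up under a single top-level "samplesource" tree, while the subfield names keep their original, readable spelling.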

Remark: inflateArray() currently has no error checking and no reporting of autocorrections. This also needs to be implemented at the same time.
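As a rough idea of what autocorrection reporting could look like, the sketch below (again Python for illustration only, not the project's PHP code) resolves a list of field names to one canonical spelling each and collects a message for every name that was merged. The helper names `loose_key()` and `canonicalise()` and the message format are hypothetical, and the normalisation is an assumed approximation of sanitizeStr().

```python
import re


def loose_key(name):
    """Hypothetical stand-in for sanitizeStr(): lowercase, keep only a-z and 0-9."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def canonicalise(names):
    """Map each name to the first spelling seen for its loose key.

    Returns (resolved_names, report), where report lists every autocorrection
    so that it can be shown to the user after the import.
    """
    canonical = {}  # loose key -> spelling that is kept
    resolved = []
    report = []
    for name in names:
        kept = canonical.setdefault(loose_key(name), name)
        if kept != name:
            report.append(f'"{name}" merged into "{kept}"')
        resolved.append(kept)
    return resolved, report


resolved, report = canonicalise(["CI number", "cinumber", "C.I. number", "material"])
for line in report:
    print(line)
```

Which spelling is kept (first seen, most frequent, or one taken from a predefined field list) is an open design choice; the sketch simply keeps the first.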

Remark: Many changes will be required to the import module. It is probably advisable to work on this issue in conjunction with the import module rewrite discussed in issue #18.