Battery-Intelligence-Lab / galv

An open-source platform for automated storage of battery data with advanced metadata support
https://battery-intelligence-lab.github.io/galv/

Import .CSV files with custom cycler definition metadata file #97

Open BradyPlanden opened 1 year ago

BradyPlanden commented 1 year ago

Is your feature request related to a problem? Please describe. Currently, .CSV files from unsupported cyclers cannot be imported.

Describe the solution you'd like A method to import .CSV files with a corresponding JSON file that defines a custom cycler data standard. For example, a user with an Arbin exported .CSV file could provide a JSON file that defines the header names for import into Galv. This could also be used to import virtual data gained from predictive models, as long as the metadata file had the corresponding information.

Additional context This could integrate with #95: if the JSON exported as per #95 contained the information required to reimport the data into Galv, users could share data between Galv instances. There might be a better method for this, though.

BradyPlanden commented 1 year ago

If needed, I have Arbin .CSV files that can be used for testing.

martinjrobins commented 1 year ago

On the parser side, this would involve implementing a new parser that uses the JSON file to map columns in the CSV to our standard columns.

The (perhaps bigger) piece of work would be to allow users to supply this JSON file. Perhaps this would be a field in the harvester @mjaquiery? We'd also need to determine the format of this JSON, but it seems like it would just be a dictionary that maps the CSV's column names to our standard column names. In this case, we'd have to tell users that they need to provide a CSV whose first row contains the header names.
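To make the dictionary idea concrete, here is a minimal sketch of a mapping-driven reader. It is not Galv's parser: the function name `read_mapped_csv`, the source headers, and the standard column names (`time_s`, `voltage_v`, `current_a`) are all placeholders chosen for illustration.

```python
import csv
import io
import json

# Hypothetical mapping file contents: source header -> standard column name.
# Both the source headers and the standard names are illustrative only.
EXAMPLE_MAPPING_JSON = """
{
  "Test_Time(s)": "time_s",
  "Voltage(V)": "voltage_v",
  "Current(A)": "current_a"
}
"""


def read_mapped_csv(csv_text, mapping):
    """Read a CSV whose first row is a header and rename mapped columns.

    Columns that are not named in the mapping are dropped. Values are left
    as strings; type conversion is a separate concern.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in mapping if name not in header]
    if missing:
        raise ValueError(f"CSV is missing mapped columns: {missing}")
    return [{std: row[src] for src, std in mapping.items()} for row in reader]


if __name__ == "__main__":
    mapping = json.loads(EXAMPLE_MAPPING_JSON)
    sample = "Test_Time(s),Voltage(V),Current(A),Notes\n0.0,3.7,0.5,ok\n"
    print(read_mapped_csv(sample, mapping))
```

Failing early when a mapped column is absent is one possible design choice; silently skipping missing columns would be another.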

mjaquiery commented 1 year ago

I'd suggest we have two header rows, one with column names and one with data type. Is it easier for end users to provide csv files with a particular structure, or write mapping files? I guess the latter is more shareable between users. Another alternative is that we hack file extensions, so e.g. the Arbin files get converted from .csv to e.g. .arb, and we teach a harvester how to interpret those files itself...?
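As a rough sketch of the two-header-row option, the reader below treats the first row as column names and the second as data types. The type labels (`float`, `int`, `str`) and the function name `read_typed_csv` are assumptions for illustration, not an agreed format.

```python
import csv
import io

# Hypothetical type labels allowed in the second header row.
CASTERS = {"float": float, "int": int, "str": str}


def read_typed_csv(csv_text):
    """Read a CSV with two header rows: column names, then data types."""
    reader = csv.reader(io.StringIO(csv_text))
    names = next(reader)
    types = next(reader)
    if len(names) != len(types):
        raise ValueError("name row and type row have different lengths")
    unknown = [t for t in types if t not in CASTERS]
    if unknown:
        raise ValueError(f"unknown type labels: {unknown}")
    casters = [CASTERS[t] for t in types]
    # Convert each cell with the caster declared for its column.
    return [
        {name: cast(value) for name, cast, value in zip(names, casters, row)}
        for row in reader
    ]


if __name__ == "__main__":
    sample = "time_s,voltage_v,step\nfloat,float,int\n0.0,3.7,1\n"
    print(read_typed_csv(sample))
```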

BradyPlanden commented 1 year ago

Hmm, can we infer the data type from the data itself? We could then compare the inferred type to an approved list (or a single value) and throw an error on a mismatch. I think asking an average user to define data types is probably too much.
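A minimal sketch of that inference-and-check idea, assuming every cell arrives as a string. The function names and the `approved` structure are hypothetical, and the inference is deliberately naive: it only distinguishes integers, floats, and everything else.

```python
def infer_type(values):
    """Return "int", "float", or "str" for a column of string values."""
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
        except ValueError:
            continue  # at least one value failed; try the next type
        return name
    return "str"


def check_columns(columns, approved):
    """Raise TypeError if a column's inferred type is not approved.

    columns:  {column name: list of string values}
    approved: {column name: set of acceptable type names} (hypothetical)
    """
    for name, values in columns.items():
        inferred = infer_type(values)
        if inferred not in approved[name]:
            raise TypeError(
                f"column {name!r} looks like {inferred}, "
                f"expected one of {sorted(approved[name])}"
            )


if __name__ == "__main__":
    approved = {"voltage_v": {"int", "float"}}
    check_columns({"voltage_v": ["3.7", "3.8"]}, approved)  # passes
    print(infer_type(["3.7", "NA"]))  # a single "NA" makes the column "str"
```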

I think it's easiest for end users to provide CSV files with a corresponding cycler definition file. This could be in JSON format (we could provide some examples for users to copy).

mjaquiery commented 1 year ago

Perhaps a better alternative is to maintain internal structures ourselves, and let users upload a .csv and select its source from a list? We can allow advanced users to create new mappings (in JSON or something) where they provide an example .csv file and add metadata for the columns.

My concern is that data types aren't always simply parsable from data: datetime strings are difficult to automatically recognise, for example, and sometimes datasets use string values (e.g. "NA") to represent missing numerical data. Maybe I'm not quite seeing where these data are coming from and what they look like in their original form.
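Both concerns can be illustrated with a small sketch using only the standard library. The missing-value tokens and the candidate datetime formats below are assumptions for the example, not a list Galv uses.

```python
from datetime import datetime

# Hypothetical set of strings that stand for a missing numerical value.
NA_TOKENS = {"", "NA", "N/A", "NaN"}


def to_float_or_none(value, na_tokens=NA_TOKENS):
    """Convert a cell to float, treating known missing-value tokens as None."""
    stripped = value.strip()
    return None if stripped in na_tokens else float(stripped)


def matching_formats(text, formats):
    """Return every candidate format that successfully parses the text."""
    matches = []
    for fmt in formats:
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            continue
        matches.append(fmt)
    return matches


if __name__ == "__main__":
    candidates = ["%d/%m/%Y", "%m/%d/%Y"]
    # Parses under both formats, so the data alone cannot settle it.
    print(matching_formats("01/02/2023", candidates))
    # Only the day-first format fits, because there is no month 13.
    print(matching_formats("13/02/2023", candidates))
    print(to_float_or_none("NA"), to_float_or_none("3.7"))
```

A date such as 01/02/2023 is valid both day-first and month-first, which is why a declared format in the column metadata may be safer than inference.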

We might benefit from a real-time discussion about this.