
CSV parser that we're using for parsing a few food datasets

food-datasets-csv-parser


Stretch goals

scripts for testing a few of our parsers - the old one, the very old one, and the new one

Note: I didn't test them here (in a separate place). I also think the project should evolve so that csv_parser can be used correctly as a separate entity.

FoodComposition is the first dataset we actually parsed, back when this module was part of the sd module repository codebase. That code was working before, and it can serve as an example of how we call methods from the src folder. Once the data is parsed, it calls methods from another of our modules - the generator module.

You can see how we execute this script in package.json:

"csv:fc"    - FoodComposition,

USFA is the second, separate dataset that we need to parse.

Below is a list of scripts that execute the parser for the different CSV files we have.

Note: rename USFA to USDA.

"csv:usfa1" - USFA/Derivation_Code_Description
"csv:usfa2" - USFA/Nutrition
"csv:usfa3" - USFA/Product
"csv:usfa4" - USFA/ServingSize

FAO is the third dataset. I don't think we have started creating a parser file for it yet.


Quick Start

Several quick start options are available:

Parser commands

How to split a JSON file into single elements

To split a JSON file you need sd/generator/writeFile.js. Call the function splitObject() with the parameters path (as string), filename (as string), and a flag (0 or 1). flag=0 means the split elements are named after the name attribute; flag=1 means the elements are named by a number, lowercased and with whitespace removed to maintain uniformity. The split elements are stored at the given path/filename_elements.
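
A minimal usage sketch, assuming splitObject() is exported from sd/generator/writeFile.js with the signature described above (the import path and export shape are assumptions, so adjust them to the actual writeFile.js):

// Assumed import path and named export; adjust to the real writeFile.js exports.
const { splitObject } = require('./sd/generator/writeFile');

// flag = 0: split elements are named after each element's name attribute
splitObject('./src/FoodComposition', 'FoodComposition', 0);

// flag = 1: elements are named by number, lowercased, with whitespace removed;
// either way the results land in path/filename_elements
splitObject('./src/FoodComposition', 'FoodComposition', 1);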

How to parse CSV file(s) from a folder into JSON file(s)

Create a folder where you want the generated JSON file(s) to be placed, and create a parser.js file in that folder. In it, call parseCsv() (from csvParser.js) with the await keyword, since it's an asynchronous function, passing ${__dirname}/${filename} (the folder to read your CSV file(s) from) as a string. Then call csvToJson() with ${__dirname}/${filename} and the data returned from parseCsv().
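
A minimal parser.js sketch of that flow; the relative path to csvParser.js and the CSV file name are assumptions, so adjust them to your folder layout:

// Hypothetical parser.js; assumes parseCsv and csvToJson are exported from csvParser.js.
const { parseCsv, csvToJson } = require('../../csvParser');

const filename = 'MyDataset.csv'; // hypothetical CSV file in this folder

(async () => {
  // parseCsv() is asynchronous, so await the parsed rows
  const data = await parseCsv(`${__dirname}/${filename}`);

  // write the parsed data out as JSON file(s) in the same location
  await csvToJson(`${__dirname}/${filename}`, data);
})();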

parseCsv() requires the csv-parser module.

An asynchronous function that parses CSV files.

/**
 * parse csv files
 * @async
 * @param {string} path - The path of the csv file
 * @param {Object} [opts] - optional options object for the csv-parser package
 * @returns {Promise<string[]>} Promise
 */

csvToJson( dirPath, data, split = false )

Generates a JSON file from the data provided.

/**
 * @async
 * @param {string} dirPath - directory path
 * @param {Array} data - parsed CSV data
 * @param {boolean} [split=false] - split the data into several JSON files
 * @returns {Promise<void>} Promise
 */

assign( fileInfo, dataEntries )

The total number of entries in the CSV file divided by 1000 entries per JSON file gives the number of JSON files to generate; this is stored in fileCount. For each file, the start/stop indexes are calculated from the maximum entries per file (1000); for the last file, the stop index is the length of dataEntries - 1. A sliced array called jsonObjects is created from dataEntries[start] to dataEntries[stop]. The current file number (i), the fileName, and jsonObjects are passed to generateJsonFile to create the file.

/**
 *
 * @param {Array<string>} fileInfo
 * @param {Array} dataEntries
 * @param {number} size
 */
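
A sketch of that chunking logic, assuming the names used in the prose above (a 1000-entry limit per file and generateJsonFile); the real assign() may build fileInfo differently:

const MAX_ENTRIES = 1000; // entries per generated JSON file

function assign(fileInfo, dataEntries) {
  // number of JSON files needed to hold all entries
  const fileCount = Math.ceil(dataEntries.length / MAX_ENTRIES);

  for (let i = 0; i < fileCount; i++) {
    const start = i * MAX_ENTRIES;
    // the last file stops at the end of dataEntries
    const stop = Math.min(start + MAX_ENTRIES, dataEntries.length);

    // slice out this file's portion of the parsed data
    const jsonObjects = dataEntries.slice(start, stop);

    // hand the chunk and its file number to generateJsonFile
    generateJsonFile([...fileInfo, i], jsonObjects);
  }
}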

generateJsonFile( fileInfo, data ) - requires writeFile from sd/generator to work.

Writes the sliced array data to a JSON file named fileName-${i}.

/**
 *
 * @param {Array<string>} fileInfo
 * @param {Array} data
 */
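
A rough sketch of what that could look like, assuming writeFile from sd/generator accepts a target path and a string to write; the import path and the layout of fileInfo are assumptions:

// Assumed import path and export; the real writeFile helper may have a different signature.
const { writeFile } = require('./sd/generator/writeFile');

function generateJsonFile(fileInfo, data) {
  // assumed layout of fileInfo: [directory, file name, file number]
  const [dirPath, fileName, i] = fileInfo;

  // serialize this chunk and write it to dirPath/fileName-i.json
  writeFile(`${dirPath}/${fileName}-${i}.json`, JSON.stringify(data, null, 2));
}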

ES5 and ES6 simple differences reference

  1. https://engineering.carsguide.com.au/es5-vs-es6-syntax-6c8350fa6998

food-datasets-csv-parser/src directory structure

.
├── CCCSVParser.js
├── FoodComposition
│   ├── FoodComposition\ -\ Finland.json
│   ├── FoodComposition\ -\ France.json
│   ├── FoodComposition\ -\ Germany.json
│   ├── FoodComposition\ -\ Italy.json
│   ├── FoodComposition\ -\ Netherlands.json
│   ├── FoodComposition\ -\ Sweden.json
│   ├── FoodComposition\ -\ United\ Kingdom.json
│   ├── FoodComposition.json
│   ├── csv_parser.js
│   └── files.js
├── USFA
│   ├── Derivation_Code_Description
│   │   ├── Derivation_Code_Description1.json
│   │   └── parser.js
│   ├── Nutrition
│   │   ├── Nutrient01.json
│   │   ├── files.js
│   │   └── parser.js
│   ├── Product
│   │   ├── Products01.json
│   │   └── parser.js
│   ├── Readme.md
│   ├── Serving_Size
│   │   ├── Serving_Size1.json
│   │   └── parser.js
│   └── files.js
├── fileSystem.js
├── index.js
├── utils.js
└── writeFile.js

Methods from this module

generate

WriteInFile

assign

csvToJson

parseDirectoryFiles

parseFoodComposition

makeReadable

writeFile

fixPath

readData

saveFile

makeFolder

combineObject

splitObject

How to create a parser for the FAO dataset from scratch

It should be pretty similar to the work we've already done with the FoodComposition and USFA data; we just have a different dataset, with different headers and files, stored here: https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO

The logic is simple - it should have a similar structure to USFA and similar parser files.

The 1st generation of parser scripts is related to FoodComposition and is located in its folder.

An example of a 2nd gen parser script is here.

Where should I write the parser for FAO?

For now, use the same logic as in this repository: in the src folder you can see 3 folders that store the data and parsers for the different datasets. This is our old way of organizing files. Later we'll move all projects out of the src folder. I created projects3.0 - we'll move our code there later, once it works at least partially.

What should we do to create a parser for the FAO dataset from scratch?

Keep in mind that part of this was actually completed.

It looks like these .csv files have many headers. Whereas in the USFA version, you could easily hardcode the headers and pass them as the second argument to parseDirectoryFiles(), here I will need to dynamically obtain the headers from each file.

For this kind of problem we created a new method that should be tested and used. It's called getHeaders and is located here. We haven't battle-tested it, so if getHeaders requires changes, that's ok.
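
For reference, here is a minimal sketch of how headers could be obtained dynamically with the csv-parser package; it is not necessarily how getHeaders in this repository is implemented:

const fs = require('fs');
const csv = require('csv-parser');

// resolve with the header row of a CSV file without reading the whole file
function getHeaders(path) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(path);
    readStream
      .pipe(csv())
      // csv-parser emits a 'headers' event once the header row is parsed
      .on('headers', (headers) => {
        readStream.destroy(); // only the headers are needed
        resolve(headers);
      })
      .on('error', reject);
  });
}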