[ ] Travis CI
[ ] Finish good test coverage
[ ] Extend the Travis CI script: when a new module is released, it should automatically work well with our projects3.0
[ ] Check if we can use checkFilePath from generator
[ ] Find out which methods should be removed from utils.js because they duplicate methods from generator (partially done)
[ ] Enable ESLint for the projects2.0 and projects3.0 folders; it's currently disabled
[ ] Try Lerna, or move the projects out of this repository
[ ] Improve the fileSystem file. We don't actually use it, but it may be useful when we have a dataset and want to parse the whole dataset in one place
Note: I didn't test them here (in a separate place). I also think the projects should evolve so that csv_parser can be used correctly as a separate entity.
FoodComposition is the first dataset we actually parsed, back when this module was part of the sd module repository codebase. That code worked before, and it can serve as an example of how we call methods from the src folder. Once the data was parsed, it called methods from another of our modules, the generator module. You can find how we execute this script in package.json:

"csv:fc" - FoodComposition
USFA is the second, separate dataset that we should parse. Below is a list of the scripts that execute the parser for the different CSV files we have:
"csv:usfa1" - USFA/Derivation_Code_Description
"csv:usfa2" - USFA/Nutrition
"csv:usfa3" - USFA/Product
"csv:usfa4" - USFA/ServingSize
FAO is the third dataset. I don't think we've started creating a parser file for it yet.
Several quick start options are available:

- Clone the repo: git clone https://github.com/GroceriStar/food-datasets-csv-parser.git
- Install with npm: npm install @groceristar/food-datasets-csv-parser
- Install with yarn: yarn add @groceristar/food-datasets-csv-parser

Run npm run parseCsv or yarn parseCsv to parse from CSV to JSON.

Food Composition

To split a JSON file you will need sd/generator/writeFile.js.
Call the function splitObject() with the parameters path (a string), filename (a string), and a flag (0 or 1). flag=0 means the split elements are named after the name attribute; if flag=1, elements are named by a number, lowercased and with whitespace removed, to maintain uniformity. The split elements will be stored at the given path/filename_elements.
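A minimal usage sketch, assuming splitObject() is exported from sd/generator/writeFile.js with the signature described above (the require path and file names are illustrative):

```js
// Illustrative only: the export shape of sd/generator/writeFile.js may differ.
const { splitObject } = require('./sd/generator/writeFile');

// flag = 0: split elements are named after their "name" attribute
splitObject(`${__dirname}/FoodComposition`, 'FoodComposition.json', 0);

// flag = 1: split elements are named by number, lowercased, whitespace removed;
// results are stored at <path>/FoodComposition_elements
splitObject(`${__dirname}/FoodComposition`, 'FoodComposition.json', 1);
```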
Create the folder where you want the generated JSON file(s) to be, and create a parser.js file in that folder. In parser.js, call parseCsv() with the await keyword (it is an asynchronous function), passing ${__dirname}/${filename} (the folder to read your CSV file(s) from) as a string. Then call csvToJson() with the parameters ${__dirname}/${filename} and data, where data is the value returned from parseCsv().
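Putting those steps together, a minimal parser.js could look like the sketch below. It assumes parseCsv() and csvToJson() are exported from the package entry point; the dataset folder name is a placeholder.

```js
// parser.js - a sketch; "MyDataset" is a placeholder for your CSV folder.
const { parseCsv, csvToJson } = require('@groceristar/food-datasets-csv-parser');

const filename = 'MyDataset';

const run = async () => {
  // parseCsv() is asynchronous, so await its result
  const data = await parseCsv(`${__dirname}/${filename}`);

  // write the parsed data out as JSON; pass true as the third
  // argument to split the output into several JSON files
  await csvToJson(`${__dirname}/${filename}`, data);
};

run();
```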
parseCsv()

Requires the csv-parser module. An asynchronous function that parses CSV files.
/**
 * Parse CSV files.
 * @async
 * @param {string} path - The path of the CSV file
 * @param {Object} [opts] - Optional options object for the csv-parser package
 * @returns {Promise<string[]>} Promise
 */
csvToJson( dirPath, data, split = false )

Generates a JSON file from the data provided.
/**
 * Generate a JSON file from the parsed data.
 * @async
 * @param {string} dirPath - Directory path
 * @param {Array} data - Parsed CSV data
 * @param {boolean} [split=false] - Split the data into several JSON files
 * @returns {Promise<void>} Promise
 */
assign( fileInfo, dataEntries )

Divides the total number of entries in the CSV file by 1000 (entries per JSON file) to get the number of JSON files to generate, stored in fileCount. For each file, it calculates start/stop indexes based on the maximum number of entries per file (1000); for the last file, the stop index is the length of dataEntries - 1. It creates a sliced array called jsonObjects from dataEntries[start] to dataEntries[stop]. The current file number (i), the fileName, and jsonObjects are passed to generateJsonFile to create the file.
/**
 * Split data entries into chunks and generate a JSON file for each chunk.
 * @param {Array<string>} fileInfo
 * @param {Array} dataEntries
 * @param {number} size
 */
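A reconstruction of that chunking logic under the names used above; this is a sketch, not the repository's exact code.

```js
// Sketch of the chunking logic; ENTRIES_PER_FILE mirrors the 1000-entry
// limit described above, and generateJsonFile is called per chunk.
const ENTRIES_PER_FILE = 1000;

const assign = (fileInfo, dataEntries) => {
  // number of JSON files to generate
  const fileCount = Math.ceil(dataEntries.length / ENTRIES_PER_FILE);

  for (let i = 0; i < fileCount; i++) {
    const start = i * ENTRIES_PER_FILE;
    // the last file stops at the end of dataEntries
    const stop = Math.min(start + ENTRIES_PER_FILE, dataEntries.length);
    const jsonObjects = dataEntries.slice(start, stop);

    // pass the file number, file info, and chunk on for writing
    generateJsonFile([...fileInfo, i], jsonObjects);
  }
};
```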
generateJsonFile( fileInfo, data )

Requires writeFile from sd/generator to work. Writes the sliced array data to a JSON file named fileName-${i}.
/**
 * Write a chunk of data to a JSON file.
 * @param {Array<string>} fileInfo
 * @param {Array} data
 */
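A hedged sketch of that step, assuming a writeFile(path, contents) helper from sd/generator and one possible layout of fileInfo:

```js
// Sketch only: assumes writeFile(path, contents) exists in sd/generator
// and that fileInfo carries [dirPath, fileName, i] in that order.
const { writeFile } = require('./sd/generator');

const generateJsonFile = (fileInfo, data) => {
  const [dirPath, fileName, i] = fileInfo; // assumed layout
  writeFile(`${dirPath}/${fileName}-${i}.json`, JSON.stringify(data, null, 2));
};
```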
The food-datasets-csv-parser/src directory structure:
├── CCCSVParser.js
├── FoodComposition
│   ├── FoodComposition - Finland.json
│   ├── FoodComposition - France.json
│   ├── FoodComposition - Germany.json
│   ├── FoodComposition - Italy.json
│   ├── FoodComposition - Netherlands.json
│   ├── FoodComposition - Sweden.json
│   ├── FoodComposition - United Kingdom.json
│ ├── FoodComposition.json
│ ├── csv_parser.js
│ └── files.js
├── USFA
│ ├── Derivation_Code_Description
│ │ ├── Derivation_Code_Description1.json
│ │ └── parser.js
│ ├── Nutrition
│ │ ├── Nutrient01.json
│ │ ├── files.js
│ │ └── parser.js
│ ├── Product
│ │ ├── Products01.json
│ │ └── parser.js
│ ├── Readme.md
│ ├── Serving_Size
│ │ ├── Serving_Size1.json
│ │ └── parser.js
│ └── files.js
├── fileSystem.js
├── index.js
├── utils.js
└── writeFile.js
It should be pretty similar to the work we've done with the FoodComposition and USFA data; we just have a different dataset, with different headers and files, stored here: https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO
The logic is simple: it should have a similar structure to USFA and similar parser files.
The 1st generation of parser scripts is related to FoodComposition and is located in its folder.
An example of a 2nd generation parser script is here.
Where should I write the parser for FAO?

For now, use the same logic as in this repository: in the src folder you can currently see 3 folders that store the data and parsers for the different datasets. This is our old logic for locating files; later we'll move all projects out of the src folder. I created projects3.0 - we'll move our code there once it works at least partially.
What should we do in order to create a parser for the FAO dataset from scratch?

Keep in mind that part of this was actually completed.

It looks like these .csv files have many headers. Whereas in the USFA version you could easily hardcode the headers and pass them as the second argument to parseDirectoryFiles(), here I will need to obtain the headers dynamically from each file.
For this kind of problem we created a new method that should be tested and used. It's called getHeaders and is located here. We haven't battle-tested it, so if getHeaders requires changes, that's fine.
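For reference, dynamically reading headers from the first line of each CSV could look roughly like the sketch below; the actual getHeaders in this repository may be implemented differently. The resulting array could then be passed to parseDirectoryFiles() the same way the hardcoded USFA headers were.

```js
const fs = require('fs');
const readline = require('readline');

// Sketch only: reads the first line of a CSV and splits it into header
// names. The repository's getHeaders may work differently.
const getHeaders = async (filePath) => {
  const stream = fs.createReadStream(filePath);
  const rl = readline.createInterface({ input: stream });
  for await (const line of rl) {
    rl.close();
    stream.destroy();
    return line.split(',').map((header) => header.trim());
  }
  return []; // empty file
};
```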