Food-Static-Data / food-datasets-csv-parser

csv parser that we're using for parcing few food datasets
0 stars 10 forks source link

FAO parser from scratch #27

Closed atherdon closed 5 years ago

atherdon commented 5 years ago

it should be a pretty similar work that we've made with FoodComposition data and USFA data as well. we just have a different dataset, with different headers and files, stored here: https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO

atherdon commented 5 years ago

logic is simple - it should have a similar structure as USFA has and similar parser files logic is simple - it should have a similar structure as USFA has and similar parser files

atherdon commented 5 years ago

example of 2nd gen parser script is here https://github.com/GroceriStar/food-datasets-csv-parser/tree/master/src/USFA

atherdon commented 5 years ago

1st gen is Food composition

atherdon commented 5 years ago

I also add you to our organization. please follow https://github.com/GroceriStar in order to accept an invite

jacob-nooz commented 5 years ago

Where should I write this parser? In the root of https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO ?

atherdon commented 5 years ago

nope. for now, use the same logic as we have at this repository. go to src/projects and you'll see there other parsers

jacob-nooz commented 5 years ago

I don't see a projects directory. Are you referring to parsers like this https://github.com/GroceriStar/food-datasets-csv-parser/blob/master/src/csvToJson.js ?

On Tue, Jul 16, 2019 at 12:53 PM Arthur Tkachenko notifications@github.com wrote:

nope. for now, use the same logic as we have at this repository. go to src/projects and you'll see there other parsers

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/GroceriStar/food-datasets-csv-parser/issues/27?email_source=notifications&email_token=AH3USSVQY2OGG2UVMCENRTTP7YDK7A5CNFSM4ICUQLBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2BURDA#issuecomment-511920268, or mute the thread https://github.com/notifications/unsubscribe-auth/AH3USSULRZ4PDFYYIIQ6RYTP7YDK7ANCNFSM4ICUQLBA .

atherdon commented 5 years ago

got it. check https://github.com/GroceriStar/food-datasets-csv-parser/tree/master/src/USFA for parsers

atherdon commented 5 years ago

and first gen parser is located here: https://github.com/GroceriStar/food-datasets-csv-parser/tree/master/src/FoodComposition but it's outdated

jacob-nooz commented 5 years ago

Sorry, but I'm new to working in this way. I could spend a lot of time figuring out exactly what you want me to do and where, so if you're not too busy, could you please clarify this for me? Exact links of where you want me to be working would be very helpful.

atherdon commented 5 years ago

1) create a folder with name FAO 2) copy any parser.js from USFA project 3) upload to that folder csv files from https://github.com/ChickenKyiv/awesome-food-db-strucutures/tree/master/FAO 4) for each "table" - each file with exported data are actually a table from database. For each "table" you should create separated folders, like we did in USFA case. 5) for each "table" you should have a separated file parser.js that will be a script that we call for parsing each table 6) you can add a line at package.json so you'll be able to call your script(right now it wouldn't work, but soon, we'll finish other changes that will help to make this parsers work)

@AndrewSerra maybe you should jump into this task as well - but please together with Jan please decide what each of you will do

@jan-strum tell me if something is not clear

jacob-nooz commented 5 years ago

Thank you, @atherdon. I'll get on this now.

@AndrewSerra I could do what @atherdon just described and tag you for review. I'm also open to working however else you see fit.

jacob-nooz commented 5 years ago

@atherdon It looks like these .csv files have many headers. Whereas in the USFA version, you could easily hardcode the headers and pass them as the second argument to parseDirectoryFiles(), here I will need to dynamically obtain the headers from each file.

I know that you want this script to be from scratch, but I noticed that you're using the csv-parser module. Could I use this or another module to dynamically obtain the headers from each .csv file?

atherdon commented 5 years ago

It looks like these .csv files have many headers.

yeah, at FoodComposition with a lot of headers as well.

here I will need to dynamically obtain the headers from each file

it's a very smart way - i like it. you can create additional method that we also can use for other parsers as well!

to be from scratch

Here I mean that I don't give you any small tasks around parsers, so you get used to it. Code should looks similar to parsers that we have at USFA. And please use csv-parser module, because there no needs to invent the wheel. I personally hate to code something from 0 when i know that i can save time and do a better code with modules from npm. this is why we always trying different things and see what will work for our case.

Yep, use csv-parser for headers. Hope it will also work for other datasets as well. because we have a spaghetti code at FoodComposition project

https://www.youtube.com/watch?v=SW-BU6keEUw

AndrewSerra commented 5 years ago

@jan-strum so which part are you working on? How could I help with the process?

jacob-nooz commented 5 years ago

@AndrewSerra Thanks for offering your help. I reorganized the file structure, and I was about to find a way to dynamically obtain headers from each file.

After that, assuming the parser works for this dataset, there shouldn't be much to do. I'd love it if you could review my code when it's ready, or help me debug if I get stuck along the way, but for now, I think I'm okay to work on this on my own.

AndrewSerra commented 5 years ago

Sure tag me when you are ready and I can check the PR. And whenever you need something just tag me to a message here :)

jacob-nooz commented 5 years ago

@atherdon @AndrewSerra Is there anything I need to do to configure ESLint and Prettier for this project? After running npm install I kept getting Failed to load config errors for whatever is in my root .eslintrc.json file. I changed that file to just extend prettier, but now I'm getting errors like "const is a reserved keyword."

atherdon commented 5 years ago

you can fix it, but it not our main point for this task. you can create a separated task, explain details and work on it later

atherdon commented 5 years ago

i think it's your global ESLint maybe throwing a warnings. at this repo we don't have raw ESLint. we using Standard - which is just a simple eslint config

atherdon commented 5 years ago

and usually - show me a code via PR, because i don't have compilator in my mind ;)

jacob-nooz commented 5 years ago

@atherdon Of course. Thank you.

@AndrewSerra I've run into something you may be able to help me with:

I wrote a helper function to dynamically obtain headers, and it works, but I'm having trouble getting that helper to return those headers so that I can use them elsewhere.

While I can log the headers successfully on line 14, when I return them and log the return value elsewhere, it logs undefined. This is because, in getHeaders on line 17, headers is returned before being assigned its value.

I'm not sure what the workaround for this is, but I'm looking for solutions. Any suggestions you have would be greatly appreciated.

Thanks!

atherdon commented 5 years ago

@jan-strum please push that changes you've mentioned into PR

atherdon commented 5 years ago

Pull requests is a better place to discuss things like we have here

jacob-nooz commented 5 years ago

@atherdon @AndrewSerra I've just created a PR for these changes.

atherdon commented 5 years ago

1) curious to see if we can use this method(getHeaders) for FoodComposition 2) for now it's ok to have a file with ES5 style when we debugging it, but later we'll update it to ES6. we need to keep note on it. 3) we can go and create copy-paste for parsers, but maybe we need to update some methods, that will make parser.js less duplicated. want to hear thoughts about it

atherdon commented 5 years ago

Please make some small changes and open another PR so we can focus on problem that you describe at https://github.com/GroceriStar/food-datasets-csv-parser/issues/27#issuecomment-512308018

jacob-nooz commented 5 years ago
  1. curious to see if we can use this method(getHeaders) for FoodComposition

Definitely.

  1. for now it's ok to have a file with ES5 style when we debugging it, but later we'll update it to ES6. we need to keep note on it.

Are you referring to imports/exports? If so, yes, I had some issues using ES6 in this codebase.

  1. we can go and create copy-paste for parsers, but maybe we need to update some methods, that will make parser.js less duplicated. want to hear thoughts about it

I was thinking about how much duplication this will involve. If you're going to create a script to parse each csv file, perhaps that script can take an argument, which is the path to each file. This way, we could just have one parser.js in the root of FOA. I'll have to look into how to write a script that takes an argument, though. What do you think?

jacob-nooz commented 5 years ago

Please make some small changes and open another PR so we can focus on problem that you describe at #27 (comment)

This problem has been resolved by my last PR.

atherdon commented 5 years ago

Are you referring to imports/exports? If so, yes, I had some issues using ES6 in this codebase.

We'll deal with this later. It's just unfinished process for now.

I'll have to look into how to write a script that takes an argument, though.

Please remember that we're working together here. So please share some work with Andrew, because with this approach we can move forward more quickly. I prefer this way: 1) it looks similar to USFA, but each time you find something, worth improving, you file a new issue at this repository where we can chat together about how to do it.

2) we make most pressing work done and then slowly upgrade our code

Because if we'll start to do anything at the same time - it will only give us troubles and confusion. So please do smaller steps, more frequent commits and communication

atherdon commented 5 years ago

And as we have FAO database, please rename a foldername

jacob-nooz commented 5 years ago

And as we have FAO database, please rename a foldername

Rename what folder to what?

atherdon commented 5 years ago

@jan-strum do you need help with this parser task? feel free and ask for help - just want to move this part forward

jacob-nooz commented 5 years ago

@atherdon I'm going to be pretty busy for the indefinite future, but I'd like to get back to committing code to GroceriStar as soon as I can. Please assign this task to someone else in the meantime, and I will let you know of my next availability.

atherdon commented 5 years ago

got it - thanks for letting me know, I think we'll handle it with Andrew