PaulMcInnis / JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.
MIT License
1.88k stars 217 forks

[Feature Request] Export as JSON #86

Closed Zenahr closed 4 years ago

Zenahr commented 4 years ago

I guess at some point the dicts get converted to .csv, so maybe we could have a --json flag which exports the jobs in JSON format instead of CSV.

thebigG commented 4 years ago

This would be a pretty neat feature. But as you can probably tell from #85, we are restructuring a lot of stuff right now, so I suspect we won't be able to even start considering this until that restructuring is done. I'm also not 100% sure we should implement a feature like this, given that, as far as I can tell, it is not essential to JobFunnel. But I could be wrong. Maybe other people can pitch in and say what they think? In the meantime, maybe something like https://github.com/Keyang/node-csvtojson could help with whatever it is that you are trying to do.
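For anyone who would rather stay in Python than reach for a Node tool, the same CSV-to-JSON conversion can be sketched with the standard library alone. This is only a rough illustration: `csv_to_json` is a made-up helper name, and it assumes nothing about the CSV beyond it having a header row.

```python
import csv
import json


def csv_to_json(csv_path, json_path):
    """Convert a CSV file with a header row into a JSON array of objects.

    Hypothetical helper, not part of JobFunnel. Every value is kept as a
    string, exactly as it appears in the CSV.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        # DictReader uses the first row as the keys for every later row.
        rows = list(csv.DictReader(src))
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(rows, dst, indent=2, ensure_ascii=False)
    return rows
```

Calling something like `csv_to_json("master_list.csv", "master_list.json")` (file names are placeholders) would leave the CSV untouched and write the JSON next to it.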

PaulMcInnis commented 4 years ago

@Zenahr I like this feature, pretty easy to add as well.

One thing is that JSON isn't very user-interactive and the JobFunnel tool is intended to be used interactively and not as a pure scraping tool.

After #85 goes in it will also be very straightforward to use JobFunnel via API, where all the data is already in memory once scraped, and could be used for other purposes.

Perhaps for now we should stick to the CSV, but I think we should always make sure the data is easily exportable elsewhere.

Zenahr commented 4 years ago

I feel like this should be kept open for now since it's open for discussion or is this not the case?

@thebigG sure it's not essential but I proposed an option for the user to get the data out as JSON by appending a flag --json to JobFunnel. I don't see why this wouldn't be good since JSON is a portable way of storing data and it doesn't get in the way of the existing functionality. It's just a small add-on.

@PaulMcInnis I was suggesting an export function, not to model JobFunnel into something it isn't intended to be. How does adding the option of exporting the data as JSON make JobFunnel less "interactive"?

PaulMcInnis commented 4 years ago

Opening for discussion.

RE: interactivity: The intended use for this tool is as a job search aid, not as a raw job scraping tool, so the output format should be user-editable, which JSON is not intended to be.

It's a small addon, I agree.

One condition I have for adding this feature is that the CLI does not allow reading from the JSON, as we would otherwise be storing data in multiple places at once (i.e. you'd export to JSON and it would create the JSON at the same time that it creates/updates the CSV).
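To make the write-only idea concrete, here is a rough sketch of how such a flag could behave. It is not JobFunnel's actual CLI: `parse_args`, `export_jobs`, the `--csv` option, and the choice to have `--json` take a path are all assumptions made for illustration.

```python
import argparse
import csv
import json


def parse_args(argv=None):
    # Hypothetical parser; JobFunnel's real options are not reproduced here.
    parser = argparse.ArgumentParser(description="write-only JSON export sketch")
    parser.add_argument("--csv", required=True, help="CSV path (the only file read back)")
    parser.add_argument("--json", dest="json_path", default=None,
                        help="optional JSON export path (written, never read)")
    return parser.parse_args(argv)


def export_jobs(jobs, csv_path, json_path=None):
    """Write the jobs to CSV and, if requested, to JSON in the same pass."""
    # Collect every key so jobs with missing fields still fit the header.
    fieldnames = sorted({key for job in jobs for key in job})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(jobs)
    if json_path is not None:
        with open(json_path, "w", encoding="utf-8") as f:
            # default=str stringifies values JSON cannot encode (e.g. dates).
            json.dump(jobs, f, indent=2, default=str)
```

There is deliberately no code path that loads the JSON, so the CSV stays the single place user edits are read from.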

Zenahr commented 4 years ago

@PaulMcInnis I agree on it strictly being an option for output, not input.

thebigG commented 4 years ago

Another thing that I think I wasn't clear enough on: yes, this is a small add-on. But it is an additional feature nonetheless. In the future, JobFunnel WILL get far more complex. I'm referring to adding more scrapers, and that is assuming that whatever scrapers we add in the future play "nice". Just look at the code for GlassDoor and you'll get what I mean. I suspect one of the main reasons, or probably the main reason, that Paul is restructuring so much in #85 is all the issues we had in the past: internationalization, dynamic vs static scrapers (GlassDoor issues), lack of encapsulation, etc. Now this JSON feature will mean more tests to write and will make JobFunnel just a tad more painful to maintain. Trust me, it would be a cool feature to have. But I really think we should think about how much more cost this would add to our maintenance. Is it worth adding this feature if one could very easily send the CSV to an external tool to output it as JSON? Is the feature worth the cost of making the CLI parser (which is already somewhat complex) even more complex?

I think if the community REALLY wants the JSON export option, then cool, I think we should add it!

But I think we should really evaluate whether adding this feature, whose job can be accomplished using an external JSON tool, is worth the cost.

I hope all that makes sense :smiley:

PaulMcInnis commented 4 years ago

Actually, one more alternative is to simply read in the pickle files we create, as these are already Python dicts. Note, however, that these are saved before filtering for duplicates and some other things.
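As a rough sketch of that route, the snippet below loads a pickle file and dumps its contents as JSON. The helper name `pickle_to_json` is made up, and it assumes the pickle holds a plain dict with string keys; the actual layout and location of JobFunnel's cache files is not shown here and may differ.

```python
import json
import pickle


def pickle_to_json(pickle_path, json_path):
    """Dump the contents of a pickle file to JSON.

    Hypothetical helper. Assumes the pickle contains a dict with string
    keys; only unpickle files you created yourself, since loading a
    pickle can execute arbitrary code.
    """
    with open(pickle_path, "rb") as src:
        jobs = pickle.load(src)
    with open(json_path, "w", encoding="utf-8") as dst:
        # default=str stringifies values JSON cannot encode (e.g. dates).
        json.dump(jobs, dst, indent=2, default=str)
    return jobs
```

Because the cached data is unfiltered, the resulting JSON may contain duplicates that the CSV would not.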

PaulMcInnis commented 4 years ago

Closing this for now, since I think it is best to use the cache files or the CSV file. If you are reading this and feel strongly that it should be included, feel free to comment below on why we should implement this and I can re-open.

Ultimately I think the best solution for JobFunnel is one where we are using a DB with a simple reviewing GUI, but this may detract from the ease of use a CSV provides.