Setup and scrape tasks which define a set of scrapers and relevant scraping options

esakal commented 6 years ago

Overview

Motivation

This PR was created following #28.

A task structure

A task defines a set of scrapers and relevant scraping options. A user can define multiple tasks that serve different budgets (for example, separate work and home budgets).

An example of a real task:

{
  "scrapers": [
    {
      "id": "leumi",
      "credentials": {
        "username": "",
        "password": ""
      }
    },
    {
      "id": "leumiCard",
      "credentials": {
        "username": "",
        "password": ""
      }
    }
  ],
  "options": {
    "combineInstallments": false,
    "dateDiffByMonth": 3
  },
  "output": {
    "saveLocation": "/Users/eransakal/Downloads/Transactions",
    "combineReport": true
  }
}
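
As a rough illustration of the structure above, the following sketch checks that a task object has the expected shape. The field names are taken from the example; the `validateTask` helper and its specific checks are hypothetical and are not part of this PR.

```javascript
// Hypothetical helper (not from the PR): returns a list of problems found in a task object.
// Field names follow the example task; the rules themselves are assumptions.
function validateTask(task) {
  const errors = [];

  if (!Array.isArray(task.scrapers) || task.scrapers.length === 0) {
    errors.push('scrapers must be a non-empty array');
  } else {
    task.scrapers.forEach((scraper, index) => {
      if (typeof scraper.id !== 'string' || scraper.id === '') {
        errors.push(`scrapers[${index}].id must be a non-empty string`);
      }
      if (typeof scraper.credentials !== 'object' || scraper.credentials === null) {
        errors.push(`scrapers[${index}].credentials must be an object`);
      }
    });
  }

  const options = task.options || {};
  if (!Number.isInteger(options.dateDiffByMonth) || options.dateDiffByMonth < 0) {
    errors.push('options.dateDiffByMonth must be a non-negative integer');
  }

  const output = task.output || {};
  if (typeof output.saveLocation !== 'string' || output.saveLocation === '') {
    errors.push('output.saveLocation must be a non-empty string');
  }

  return errors;
}

const exampleTask = {
  scrapers: [
    { id: 'leumi', credentials: { username: '', password: '' } },
    { id: 'leumiCard', credentials: { username: '', password: '' } },
  ],
  options: { combineInstallments: false, dateDiffByMonth: 3 },
  output: { saveLocation: '/tmp/Transactions', combineReport: true },
};

console.log(validateTask(exampleTask));
```

An empty array means the task passed every check in this sketch.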

Set up a task

A user can manage tasks using the existing setup command, npm run setup. I worked on making the setup process as friendly as possible.

You can see the task setup menu structure here.

Changes in the repository

New dependencies

The colors library was added to color the feedback shown to the user while setting up a task, which improves the user experience.

Linting adjustments

Ignoring no-underscore-dangle: Since the setup process is quite complex, I split the code into multiple classes so that it is easier to read and maintain. The out-of-the-box airbnb rules prevent using the _ prefix convention to indicate that a method/property should not be used as public API. I don't want to argue whether that is good or not (airbnb/javascript#1089 and airbnb/javascript#1024), but until ECMAScript classes provide decent support for private methods/properties, I do think this is better than not distinguishing between private and public API.
Ignoring class-methods-use-this: This one is quite annoying because it prevents a class from having a public API method that doesn't use other inner properties/methods of the class. It forces you to make that function static, which is very intrusive and limits the programmer's free will.
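
For reference, disabling these two rules could look roughly like the sketch below. The rule names are the real ESLint rule names mentioned above; the file name `.eslintrc.js` and the `extends: 'airbnb-base'` entry are assumptions about how the repository is configured, not something stated in this PR.

```javascript
// Sketch of an .eslintrc.js (assumed file name and base config).
const eslintConfig = {
  // Assumption: the project extends one of the airbnb shareable configs.
  extends: 'airbnb-base',
  rules: {
    // Allow the _ prefix used here to mark non-public methods/properties.
    'no-underscore-dangle': 'off',
    // Allow instance methods that do not reference `this`.
    'class-methods-use-this': 'off',
  },
};

module.exports = eslintConfig;
```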

Splitting individual scrape code

A task is basically a group of individual scrapers with predefined scraping options, so executing a task means running the individual scraper multiple times. To follow the DRY principle, I split the original scrape-individual.js file into two files. The first file handles the actual scraping and is used by both the individual scraper and the task scraper. The second file is the entry point when the user wants to run an individual scraper and is used to get the relevant parameters from the user.
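
The split described above can be pictured with the following minimal sketch. All names here (`scrapeAccount`, `scrapeTask`, `fakeRunScraper`) are hypothetical, and the result shape (`success`, `accounts`, `errorType`) is an assumption made for the illustration; the actual files in this PR may be organized differently.

```javascript
// Shared scraping logic (hypothetical): used by both entry points.
// `runScraper` stands in for the real call into the scraping library and is
// injected so that this sketch stays self-contained.
async function scrapeAccount(runScraper, scraperId, credentials, options) {
  const result = await runScraper(scraperId, credentials, options);
  if (!result.success) {
    throw new Error(`${scraperId} failed: ${result.errorType}`);
  }
  return { scraperId, accounts: result.accounts };
}

// Task entry point (hypothetical): runs the shared function once per scraper.
async function scrapeTask(runScraper, task) {
  const results = [];
  for (const scraper of task.scrapers) {
    // Sequential on purpose in this sketch; running in parallel is also possible.
    results.push(await scrapeAccount(runScraper, scraper.id, scraper.credentials, task.options));
  }
  return results;
}

// Stand-in scraper used only to exercise the sketch.
async function fakeRunScraper(scraperId) {
  return { success: true, accounts: [{ accountNumber: `${scraperId}-1`, txns: [] }] };
}

const demoTask = {
  scrapers: [
    { id: 'leumi', credentials: {} },
    { id: 'leumiCard', credentials: {} },
  ],
  options: { combineInstallments: false, dateDiffByMonth: 3 },
};

scrapeTask(fakeRunScraper, demoTask).then((results) => {
  console.log(results.map(result => result.scraperId));
});
```

The individual-scraper entry point would collect its parameters from the user and then call the same `scrapeAccount` function once.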

Extracting the report generation

A task scraper supports two types of output: a single report that combines all the scraped accounts together and, for backward compatibility, multiple reports, one per account. For the same DRY reason explained above, I extracted the report generation into a separate file.

That's it. All the remaining files are there mostly to set up and scrape a task.
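
The two output modes can be sketched as follows. The `buildReports` function and the account shape (`accountNumber`, `txns`) are assumptions made for this illustration; only the `combineReport` flag comes from the task example in this PR.

```javascript
// Hypothetical helper: groups scraped transactions into report descriptors.
// With combineReport = true there is one report for all accounts;
// otherwise there is one report per account.
function buildReports(accounts, combineReport) {
  // Tag every transaction with the account it came from.
  const toRows = account => account.txns.map(txn => ({ account: account.accountNumber, ...txn }));

  if (combineReport) {
    const rows = accounts.reduce((all, account) => all.concat(toRows(account)), []);
    return [{ name: 'combined', rows }];
  }
  return accounts.map(account => ({ name: account.accountNumber, rows: toRows(account) }));
}

const scrapedAccounts = [
  { accountNumber: '111', txns: [{ amount: -50 }, { amount: -20 }] },
  { accountNumber: '222', txns: [{ amount: -10 }] },
];

console.log(buildReports(scrapedAccounts, true).length);
console.log(buildReports(scrapedAccounts, false).length);
```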

@eshaham Please review. Thanks!

Closes #28


esakal commented 6 years ago

@eshaham spoiler alert: this is going to be a rather long comment, as I tried to cover everything. I will work on the inline comments soon.

I must say I can’t wait to be able to start using what you implemented here 👍

Thanks! It feels very powerful to scrape all the accounts at the same time.

The underscore notation of private fields makes me tick… 😄 . I would rather see functions extracted outside of the class if they are not to be shared publicly. At some point I gave up and stopped adding comments about it, but I would love to see it changed.

When I created the PR I was really hoping that you and I were on the same side of this debate :) I don't understand how private properties/methods were left out of the specification. It is very hard for me to remove the _ prefix, mostly because not having a way to let other developers distinguish between private and public API is like not having an API at all. I can live with moving private functions out of the class, even if it breaks the class isolation/encapsulation and makes the code less readable, but not having private properties is... well... But I respect your opinion and will make the relevant adjustments to ditch the _ prefix.

This was a massive PR, and I really had a hard time reviewing all of it at once. I would prefer seeing smaller incremental PRs, but anyway… Given this comment, I never tend to re-review PRs, but in this case I reserve the right to review again 😄

We are on the same page: smaller incremental PRs are easier to follow. This is the reason I created #29 and #30 to begin with. But I don't think there is an intuitive way to split the rest of the code. This PR deals with managing and scraping tasks, and splitting those two concerns apart would not be meaningful. As a side note, I would like to recommend an awesome reviewing tool called reviewable.io, which I integrated into my company's development process and which was essential to the success of our project. It is marked as beta, but I think that is only because they forgot to remove the label. I love this tool and I think it has everything you could dream of in a reviewing service. Obviously you don't need to use it with this project unless you want to give it a try, but I owe it to the developers of this service to talk about it whenever I can.

Would be great to see some reflection of this in the README file

Sure, will do

@esakal you might want to close this issue using one of the relevant keywords in the PR description.

Done

Thanks so much for all the effort you put into this!

It is the least I can do. I really appreciate your work on those libraries and your contribution to the open banking concept in Israel.

eshaham commented 6 years ago

As a side note, I would like to recommend an awesome reviewing tool that I integrated into my company development process which was essential to the success of our project and is called reviewable.io

Could you TL;DR about what's so great about reviewable.io? Took a peek at their website, but their explanation isn't that clear. I'm not sure what I would gain from their tool, which doesn't already exist on GitHub.

esakal commented 6 years ago

Could you TL;DR about what's so great about reviewable.io?

I think the following article nicely lays out the reasons why reviewable.io goes beyond GitHub's review support. I think it comes down to whether you believe that code reviews are an essential part of the development process. If you do, then once you start using it you will not look back.

esakal commented 6 years ago

@eshaham I pushed some improvements/fixes to this PR:

The task start date calculation was wrong: the value was reduced by one, so scraping was off by a month.
The single-report CSV file included all the properties of the transactions due to invalid usage of the json2csv API.
I improved the code in the generate-report.js file as you recommended.
I added an option to ignore/exclude future transactions as part of a task.
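
To make the first and last items concrete, here is a minimal sketch of a month-based start date and a future-transaction filter. It does not reproduce the actual fix in this PR. The function names, the choice to start on the first day of the month, and the `date` field on transactions are all assumptions made for illustration.

```javascript
// Hypothetical: first day of the month that lies `dateDiffByMonth` months before `now`.
// The Date constructor rolls negative month indexes back into the previous year.
function taskStartDate(now, dateDiffByMonth) {
  return new Date(now.getFullYear(), now.getMonth() - dateDiffByMonth, 1);
}

// Hypothetical: drops transactions dated after `now`.
// Assumes each transaction carries an ISO 8601 `date` string.
function excludeFutureTransactions(txns, now) {
  return txns.filter(txn => new Date(txn.date).getTime() <= now.getTime());
}

const start = taskStartDate(new Date(2018, 1, 15), 3);
console.log(start.getFullYear(), start.getMonth() + 1, start.getDate());

const sampleTxns = [
  { date: '2018-02-10T00:00:00.000Z', amount: -50 },
  { date: '2018-03-10T00:00:00.000Z', amount: -20 },
];
const kept = excludeFutureTransactions(sampleTxns, new Date(Date.UTC(2018, 1, 15)));
console.log(kept.length);
```

An off-by-one in the month offset passed to a function like `taskStartDate` shifts the whole scraping window by a month, which is the kind of error described in the first item.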

I know this PR is already big, so I tried to make as few adjustments as possible. Hopefully you are experimenting with reviewable.io on this branch; if so, it should be much simpler to follow the adjustments.

Have a great weekend

esakal commented 6 years ago

@eshaham if you still find it difficult to review, I can split this one into two separate branches/PRs:

The code that deals with scraping a task.
The code that deals with setting up a task.

There are some complications with this separation:

You will lose the connection to the comments you already wrote
You will have a blocking dependency between them because there is no point adding one without the other.
There are some shared additions/changes that will be duplicated between them and will cause merge conflicts.

I know it is hard to follow big reviews when using the GitHub PR dashboard, so if you think it is worth the effort we can do it and tackle the complications as we go.

Please lmk if you want to continue with the PR or with the suggested alternative

eshaham commented 6 years ago

@esakal let's keep this PR, as we've already made some progress with it. I've looked at reviewable.io, but didn't like the fact that mixing reviews in GH and reviewable will cause line comments to become top level comments, see here. I think I will stick to GH for now. Will go over your comments and changes and provide feedback soon.

esakal commented 6 years ago

Thanks, I will wait for your feedback. Regarding reviewable.io, I almost ruled it out as well. But since my work project relies heavily on reviews, we learned over time that once you start working with reviewable you don't feel the need to use the GitHub dashboard anymore. This is my personal experience, but I understand your concerns about the comments becoming top-level comments instead of inline comments.

esakal commented 6 years ago

@eshaham I handled all the comments we currently have in this PR.

I renamed the file tasks-manager.js to tasks.js and separated the summary creation from the task manager. I also exported the class instead of a singleton.
I'm sure you are familiar with this option, but just in case: since we already have 17 commits, you can always squash them into a single commit if you want.

Night!

eshaham commented 6 years ago

@esakal I don't believe in squashing commits, I like to see the evolution of the PR in the commit list.
