CCI-MOC / process_csv_report

Some scripts to help process our billing reports
0 stars 3 forks source link

Refactor billing by implementing a Processor class #95

Open QuanMPhm opened 2 weeks ago

QuanMPhm commented 2 weeks ago

Based on previous discussion about billing and applying discounts, we agreed that the Invoice class should be refactored by completely separating the filtering/exporting and the processing functionality.

With the current code, the "Balance" field has a different meaning in different invoices, depending on which discount has been applied. The balance field may represent the original cost, or the original cost minus a certain combination of discounts. Additionally, because of the specifications of institute-specific discounts, such as the BU subsidy, a distinction between what the PI pays (PI Balance), and what their host institute pays (i.e BU Balance), should be made.

With the current implementation of the Invoice class, invoice processing cannot be made independent of exporting. This currently blocks of us from easily implementing the prepayment system and future billing systems.

The main features of the refactoring proposal are:

QuanMPhm commented 1 week ago

@knikolla @naved001 It was agreed that the processors should only process the data, and the invoice objects should only perform basic filtering and export them without changing data, but this leads to a problem with the BU Subsidy invoice.

For the BU Internal invoice, we want each invoice row to represent the costs for a project, not allocation. This means, for projects with multiple allocations (which will then have multiple rows in the raw invoices), they need to be summed up before the BU Internal invoice is exported.

My proposed solution to this would be that the BUSubsidyProcessor applies the subsidy, but does not squash the rows. The BUInternalInvoice invoice class will instead perform the squashing before exporting the invoice. I can see that this violates our desire that the invoice classes should not change data in anyway.

What are your opinions on this?

QuanMPhm commented 1 week ago

@knikolla @naved001 Another question I'd like to ask.

For the step of adding institution names and removing non-billable projects, which I view as non-discount-related processing, can these be placed into a Processor subclass?

knikolla commented 1 week ago

@knikolla @naved001 Another question I'd like to ask.

For the step of adding institution names and removing non-billable projects, which I view as non-discount-related processing, can these be placed into a Processor subclass?

Yes

QuanMPhm commented 1 week ago

Each invoice class must declare a list of columns they are exporting. Every Invoice object that call export() (the exporting invoices) must never change the dataframe itself.

@knikolla Should a test case be added for each invoice class to check that the correct columns and columns names are being exported? I thought about it a little bit and figured that if each invoice class is explicitly declaring a dict to show the exported columns, it doesn't make sense to test for something that doesn't require any processing, the function for filtering columns would be 2-3 lines.

knikolla commented 1 week ago

@knikolla @naved001 It was agreed that the processors should only process the data, and the invoice objects should only perform basic filtering and export them without changing data, but this leads to a problem with the BU Subsidy invoice.

For the BU Internal invoice, we want each invoice row to represent the costs for a project, not allocation. This means, for projects with multiple allocations (which will then have multiple rows in the raw invoices), they need to be summed up before the BU Internal invoice is exported.

My proposed solution to this would be that the BUSubsidyProcessor applies the subsidy, but does not squash the rows. The BUInternalInvoice invoice class will instead perform the squashing before exporting the invoice. I can see that this violates our desire that the invoice classes should not change data in anyway.

What are your opinions on this?

It's fine because here the Invoice class is not changing the data but just the representation before output (squashing).

knikolla commented 1 week ago

Each invoice class must declare a list of columns they are exporting. Every Invoice object that call export() (the exporting invoices) must never change the dataframe itself.

@knikolla Should a test case be added for each invoice class to check that the correct columns and columns names are being exported? I thought about it a little bit and figured that if each invoice class is explicitly declaring a dict to show the exported columns, it doesn't make sense to test for something that doesn't require any processing, the function for filtering columns would be 2-3 lines.

If you provide some kind of abstraction in the base Invoice class, you'll only need to test the implementation of the base invoice.