Refactor billing by implementing a Processor class

QuanMPhm commented 2 weeks ago

Based on previous discussion about billing and applying discounts, we agreed that the Invoice class should be refactored by completely separating the filtering/exporting and the processing functionality.

With the current code, the "Balance" field has a different meaning in different invoices, depending on which discount has been applied. The balance field may represent the original cost, or the original cost minus a certain combination of discounts. Additionally, because of the specifications of institute-specific discounts, such as the BU subsidy, a distinction between what the PI pays (PI Balance), and what their host institute pays (i.e BU Balance), should be made.

With the current implementation of the Invoice class, invoice processing cannot be made independent of exporting. This currently blocks of us from easily implementing the prepayment system and future billing systems.

The main features of the refactoring proposal are:

Implementation of a Processor class, which subclasses from Invoice, containing only the logic to process, or manipulate invoice data. This class would be used to implement our current and future discounts.
Split out the new PI credit logic into a subclass of Processor, NewPICreditProcessor. Rename the DiscountInvoice base class to DiscountProcessor and have it subclass Processor.
Split out the BU subsidy processing method from the BU invoice class and implement it as a subclass of DiscountProcessor, BUSubsidyProcessor.
In the implementation of DiscountProcessor introduce two new class variables called is_discount_by_nerc and discount_value_column.
If discount_by_nerc is False, only User Balance is decreased and not Balance. As this discount is not by the NERC, the amount needs to be owed by someone. Therefore we decrease PI Balance, but not Balance. The amount that was discounted is stored in a new column, to keep track of who owes it and by how much. This is the case for the BU Subsidy, which is a discount by BU, not by the NERC. If discount_by_nerc is True, this discount is offered by the NERC the total balance decreases alongside the PI Balance. This is the case for New PI Credits, which are offered by the NERC itself.
Both NewPICreditProcessor and BUSubsidyProcessor are applied before exporting the NERC invoice and all successive invoices.
Each invoice class must declare a list of columns they are exporting. Every Invoice object that call export() (the exporting invoices) must never change the dataframe itself.

QuanMPhm commented 1 week ago

@knikolla @naved001 It was agreed that the processors should only process the data, and the invoice objects should only perform basic filtering and export them without changing data, but this leads to a problem with the BU Subsidy invoice.

For the BU Internal invoice, we want each invoice row to represent the costs for a project, not allocation. This means, for projects with multiple allocations (which will then have multiple rows in the raw invoices), they need to be summed up before the BU Internal invoice is exported.

My proposed solution to this would be that the BUSubsidyProcessor applies the subsidy, but does not squash the rows. The BUInternalInvoice invoice class will instead perform the squashing before exporting the invoice. I can see that this violates our desire that the invoice classes should not change data in anyway.

What are your opinions on this?

QuanMPhm commented 1 week ago

@knikolla @naved001 Another question I'd like to ask.

For the step of adding institution names and removing non-billable projects, which I view as non-discount-related processing, can these be placed into a Processor subclass?

knikolla commented 1 week ago

@knikolla @naved001 Another question I'd like to ask.

For the step of adding institution names and removing non-billable projects, which I view as non-discount-related processing, can these be placed into a Processor subclass?

Yes

QuanMPhm commented 1 week ago

Each invoice class must declare a list of columns they are exporting. Every Invoice object that call export() (the exporting invoices) must never change the dataframe itself.

@knikolla Should a test case be added for each invoice class to check that the correct columns and columns names are being exported? I thought about it a little bit and figured that if each invoice class is explicitly declaring a dict to show the exported columns, it doesn't make sense to test for something that doesn't require any processing, the function for filtering columns would be 2-3 lines.

knikolla commented 1 week ago

@knikolla @naved001 It was agreed that the processors should only process the data, and the invoice objects should only perform basic filtering and export them without changing data, but this leads to a problem with the BU Subsidy invoice.

For the BU Internal invoice, we want each invoice row to represent the costs for a project, not allocation. This means, for projects with multiple allocations (which will then have multiple rows in the raw invoices), they need to be summed up before the BU Internal invoice is exported.

My proposed solution to this would be that the BUSubsidyProcessor applies the subsidy, but does not squash the rows. The BUInternalInvoice invoice class will instead perform the squashing before exporting the invoice. I can see that this violates our desire that the invoice classes should not change data in anyway.

What are your opinions on this?

It's fine because here the Invoice class is not changing the data but just the representation before output (squashing).

knikolla commented 1 week ago

Each invoice class must declare a list of columns they are exporting. Every Invoice object that call export() (the exporting invoices) must never change the dataframe itself.

@knikolla Should a test case be added for each invoice class to check that the correct columns and columns names are being exported? I thought about it a little bit and figured that if each invoice class is explicitly declaring a dict to show the exported columns, it doesn't make sense to test for something that doesn't require any processing, the function for filtering columns would be 2-3 lines.

If you provide some kind of abstraction in the base Invoice class, you'll only need to test the implementation of the base invoice.

CCI-MOC / process_csv_report

Refactor billing by implementing a Processor class #95