CCI-MOC / process_csv_report

Some scripts to help process our billing reports
0 stars 3 forks source link

Automate More of the Invoice post processing steps #6

Closed joachimweyl closed 6 months ago

joachimweyl commented 9 months ago

Motivation

After the OpenShift, OpenStack, and storage invoice data is pulled it is then combined together. Then the next step is that it is processed in a few ways in Google Sheets. We would like to automate this after processing as much of the last step as we can easily. We are not trying to automate all of the steps, just the steps leading up to what we send to MGHPCC. The other steps such as updating PDF's, Rollups, Orran course charges, and Utilized billed and non billed can wait as @joachimweyl has automated them mostly in Google Sheets and they are not on the same time crunch.

Completion Criteria

Provide details of the steps of the Invoice after-process

Description

  1. Add 3 new columns, Credit, Credit Code, and Balance to billable CSV

    1. Add Credits for new PI
      1. Compare a list of all PIs that have used OpenStack and OpenShift in the past, find out if there are any new PIs this month, and if there are add the current Cost value (up to $1k across the whole project) to the Credit column of the billable CSV
      2. this means that if there are 4 projects and they sum up to 1.5k we only add up to 1k in the Credit column and the rest will be invoiced directly
    2. Add 0002 to the Credit code field for all rows that are given New PI Credits
    3. Set the Balance column for all rows to be Cost minus Credit
      1. this means for non-credit rows balance will equal cost and for credit rows it will be the difference.
  2. Update the Institution column

    1. You can use the following excel code to convert, knowing that this can change we might want this to be a table in the invoicing repo. Keep in mind the code below is order-specific because of Harvard affiliates.
      =IF(REGEXMATCH(D2,"northeastern.edu"),"Northeastern University",IF(REGEXMATCH(D2,"bu.edu"),"Boston University",IF(REGEXMATCH(D2,"bentley.edu"),"Bentley",IF(REGEXMATCH(D2,"uri.edu"),"University of Rhode Island",IF(REGEXMATCH(D2,"redhat.com"),"Red Hat",IF(REGEXMATCH(D2,"childrens.harvard.edu"),"Boston Childrens Hospital",IF(REGEXMATCH(D2,"mclean.harvard.edu"),"McLean Hospital",IF(REGEXMATCH(D2,"meei.harvard.edu"),"Massachusetts Eye & Ear",IF(REGEXMATCH(D2,"dfci.harvard.edu"),"Dana-Farber Cancer Institute",IF(REGEXMATCH(D2,"bwh.harvard.edu"),"Brigham and Women's Hospital",IF(REGEXMATCH(D2,"bidmc.harvard.edu"),"Beth Israel Deaconess Medical Center",IF(REGEXMATCH(D2,"harvard.edu"),"Harvard University",IF(REGEXMATCH(D2,"wpi.edu"),"Worcester Polytechnic Institute",IF(REGEXMATCH(D2,"mit.edu"),"Massachusetts Institute of Technology",IF(REGEXMATCH(D2,"umass.edu"),"University of Massachusetts Amherst",IF(REGEXMATCH(D2,"uml.edu"),"University of Massachusetts Lowell",IF(REGEXMATCH(D2,"codeforboston.org"),"Code For Boston",IF(REGEXMATCH(D2,"mmsh"),"Harvard University",IF(REGEXMATCH(D2,"gstuart"),"University of Massachusetts Amherst",IF(REGEXMATCH(D2,"rudolph"),"Boston Childrens Hospital",IF(REGEXMATCH(D2,"robbaron"),"Boston University",IF(REGEXMATCH(D2,"kmdalton"),"Harvard University",IF(REGEXMATCH(D2,"mzink"),"University of Massachusetts Amherst",IF(REGEXMATCH(D2,"@yale.edu"),"Yale University",IF(REGEXMATCH(D2,"francesco.pontiggia"),"Harvard University","Please add new Institution to function")))))))))))))))))))))))))
  3. Create a CSV of only the HU rows named NERC-YYYY-MM-Harvard-Invoice.csv where YYYY is the year and MM is the month

  4. create a CSV of HU and BU only named NERC Invoice - HU&BU.csv where is the name of the month of the data processed.

  5. Save 2 CSVs above to the CSV folder or to a shared drive that @joachimweyl has access to

Reference

  1. February data

Completion dates

Desired - 2024-01-25 Required - TBD

QuanMPhm commented 6 months ago

@joachimweyl @knikolla Can I take on this issue?

joachimweyl commented 6 months ago

@QuanMPhm how comfortable are you with how the current invoicing scripts are working?

QuanMPhm commented 6 months ago

@joachimweyl I don't have any problems with it so far, such questions for context and how it all works.

joachimweyl commented 6 months ago

Sounds good, take this issue as yours.

QuanMPhm commented 6 months ago

@joachimweyl To address each of the things you have listed so far in the issue description:

  1. I'll use the data on new and pre-existing PIs you've shared with me to add these columns, which should be a straightforward procedure. I assume at some point we would want to keep a single file or some database keeping track of all NERC PIs in the past. I'll leave the design of that file or database to @knikolla
  2. We either have the option of adding hard coded rules in the invoice processing script to determine the institution of each PI, or to obtain each PI's institution through some other, preferably pre-existing, source of data. I'll ask @knikolla if such a data source exists
  3. Fairly simple
  4. Fairly simple
  5. I'll look only possible Google Drive integrations in Python.
joachimweyl commented 6 months ago

@QuanMPhm please note the update to the Institution filtering, of course, if you can find a way to gather Institution we don't have to do any of the filtering email addresses.

knikolla commented 6 months ago

@joachimweyl I created https://github.com/nerc-project/coldfront-plugin-api/issues/39 to have ColdFront return the email in all cases, so as to remove the username exceptions from your logic. But until we have a case where deducing the institution from the email address doesn't work, using the email address is the easiest path forward.

@QuanMPhm Additionally, I want to add that don't be constrained by Google Drive implementations and uploading the results to S3 storage is fine.