BIDS-collaborative / purchasing

Working with Andrew Clark on optimizing purchasing with data
ISC License
2 stars 1 forks source link

Time to completion analysis #8

Closed choldgraf closed 8 years ago

choldgraf commented 9 years ago

This project aims to analyze the time-to-completion of POs, and determine if this is significantly different across people, departments, times of year, etc.

Project lead is @kaiweitan with assistance from @juanshishido

Edit:

Time to completion

Spend vs. quantity x unit price - spend is the amount paid so far, quantity * unit price is the total committed amount Need some way to scale by size of department, # of personnel, etc. Use this to find "Blanket POs" - get blanket PO details from Andrew Come up with a list of blanket POs and their % complete Keep brainstorming efficiency metrics, and then cluster departments together based off of this Deliverable - Use the "time-to-completion" as a means of finding Blanket POs. Include a notebook that explains the analysis run so far, as well as the reasoning behind the particular analysis.

testchange commented 9 years ago

1) How to code for time to complete per unit cost 2) Mastering github commit push pull (I have uploaded into my test repository in the meanwhile)

I will follow up with @juanshishido for the above 2 cases.

juanshishido commented 9 years ago

This involves dealing with datetimes.

Suggested to convert each column to number of days since epoch (type Int). Then, subtraction should result in desired value.

Also, @kaiweitan, remember to add 1 to the subtraction if the dates are inclusive. For example, if you want to count both dates, then March 20, 2015 - March 19, 2015 should equal 2, not 1.

choldgraf commented 9 years ago

And don't hesitate to ping github or slack if you run into any hangups!

On Fri, Mar 20, 2015 at 12:10 AM, Juan Shishido notifications@github.com wrote:

This involves dealing with datetimes.

Suggested to convert each column to number of days since epoch (type Int). Then, subtraction should result in desired value.

Also, @kaiweitan https://github.com/kaiweitan, remember to add 1 to the subtraction if the dates are inclusive. For example, if you want to count both dates, then March 20, 2015 - March 19, 2015 should equal 2, not 1.

— Reply to this email directly or view it on GitHub https://github.com/berkeley-dsc/purchasing/issues/8#issuecomment-83941444 .

choldgraf commented 9 years ago

@kaiweitan I see that you and @juanshishido made some commits last Friday. How are you doing on a time-to-completion analysis? Are questions regarding github cleared up?

testchange commented 9 years ago

yep i have no issues with github! I am now trying to find ways to convert the day to integer so that i can find the cost per day. Do you have any suggestions?

choldgraf commented 9 years ago

Since you're converting things to datetime objects, you should have access to the "days" attribute of any given one. Moreover, you can subtract two datetime objects in order to get the difference in days between them, that should give you what you want. You could do something like this:

data['days_to_close'] = data['po_closed_date'] - data['creation_date']
data['days_to_close'] = data['days_to_close'].astype('timedelta64[D]')

When you subtract two datetimes, you get an object called a "timedelta". By using .astype(timedelta64[D]'), you're telling it to represent the difference between times (aka delta) by days. This should give you the integer amount you want.

agcinsf commented 9 years ago

I cleaned up our Blanket PO analysis workbook in the ntbk/andrew folder. Take a look at it as I believe that "Time to PO completion" is related to whether the PO is a "Blanket PO" or not. In that analysis, I explain what a blanket PO is and how one might find most of them in our Invoice data. Note that it uses a different data set than the data for this analysis.

choldgraf commented 9 years ago

That's interesting! Could you clarify what you mean by "blanket PO"? It seems like it's related to POs that have "lots of vouchers per PO". Does this mean a single PO that covers requests from multiple people and/or groups?

This may also be related to some analyses I was doing on the seasonality of purchases. It seems that some organizations are relatively quiet in their purchases throughout the year, and then there is one "mega purchase" that happens rarely. Perhaps another feature that would help classify a "blanket po" would be the ratio of PO cost to the average PO cost for that purchaser?

dariusmehri commented 9 years ago

hey guys, i am back online to work on the project, sorry again for the absence, my objective is by the end of the week to get the group some nice network graphs and some hard data, nick is away all week (he is out of the country), but i will be in touch with him on and off

here are some of the plans:

  1. i am going to try to trim out the nodes that have only one tie, this should bring down the clutter by a lot, if not then i will figure out a way to reduce it more
  2. did you guys get the department codes yet? the reason i am asking is that in addition to the time dependent analysis, i think it will be useful to compare the structure of transactions by field/ and or department, i.e. compare the hard sciences, to engineering to the social sciences, and so on, i expect we can uncover some structural differences that will be very interesting.
  3. get some more hard numbers (centrality, coherence, etc) and graph over time

darius

On Tue, Mar 24, 2015 at 10:39 AM, Chris Holdgraf notifications@github.com wrote:

That's interesting! Could you clarify what you mean by "blanket PO"? It seems like it's related to POs that have "lots of vouchers per PO". Does this mean a single PO that covers requests from multiple people and/or groups?

This may also be related to some analyses I was doing on the seasonality of purchases. It seems that some organizations are relatively quiet in their purchases throughout the year, and then there is one "mega purchase" that happens rarely. Perhaps another feature that would help classify a "blanket po" would be the ratio of PO cost to the average PO cost for that purchaser?

— Reply to this email directly or view it on GitHub https://github.com/berkeley-dsc/purchasing/issues/8#issuecomment-85613181 .

Darius Mehri Ph.D. Candidate, Sociology University of California, Berkeley

agcinsf commented 9 years ago

Blanket PO's are simply a lump of money the UC has authorized for use with a specific supplier. The money can be used only with the designated supplier, but the PO owner has the discretion over what they buy. Typically, these lumps of money are used over a designated period of time, which can be 1 full year! So they will always have more than 1 voucher/invoice associated with them.

In most cases, there is only 1 PO owner however we do see situations where departments will pool their money into a blanket PO. Stockrooms also like to use blanket POs, though they really dont have a reason to use them, just preference. You might be able to see one such example in the data you have. Look for the largest PO to VWR. It should be around $1M and is for the goods purchased by the Life Science Stockroom in Stanley Hall.

choldgraf commented 9 years ago

Ah I see - so this PO might persist over a long period of time and be attached to multiple purchases, which leads to a larger "time to completion". It seems like this basically removes all information about when each individual purchase is made within that PO, correct?

agcinsf commented 9 years ago

Yes. The PO could exist for up-to 1 year with multiple invoices citing the PO. In a way, yes, the blanket PO masks the buy. However, most buys are not done via blanket PO. From my limited analysis, it seems like there are a subset of departments using blankets and a subset of suppliers.

testchange commented 9 years ago

Thanks for the help @choldgraf! I understand that there are other considerations for example the blanket PO as discussed. However, for now and simplicity, I will look into the cost per day of procurement. I have just committed my latest coding. Currently, I am having difficulty coming out with a box diagram that shows the individual department highest cost per day, lowest cost per day, their median and average. @juanshishido Any thoughts on what is the reason behind the coding issue?

I am now out on vacation so my reply may take longer than usual.

choldgraf commented 9 years ago

Hey @kaiweitan, it looks like you're moving in the right direction. It sounds like you need to tweak some things but the code you've got is generally what you'd need for a boxplot.

A couple thoughts:

  1. Please put your notebook file in the "ntbk" folder, and rename it to something more meaningful (e.g., "time_to_completion")
  2. If you want to split a string you can use its "split" method. e.g.:

    mystr = 'hey_there dude'
    print mystr.split('_')
     ['hey', 'there dude']
    print mystr.split(' ')
     ['hey_there' 'dude']
  3. I think that we should define the total cost of a PO as price_per_item * quantity. It seems like the spend field is not super reliable, though if someone else has figured it out please correct me.