BIDS-collaborative / purchasing

Working with Andrew Clark on optimizing purchasing with data
ISC License

Meeting - 3/13/2015 #10

Closed choldgraf closed 9 years ago

choldgraf commented 9 years ago

This will be a short update meeting to discuss progress on the following projects:

7 Network analysis

8 Time to completion analysis

9 Text classification analysis

Please post your progress here, as well as challenges that you've faced and things that need to be done next. As a group we can discuss these tomorrow.

@juanshishido @nlin3330 @dariusmehri @kaiweitan @anthonysuen

Chris

dariusmehri commented 9 years ago

I will be out of town from this evening until next Wednesday. Our objective is to have one graph done by next week for the presentation. We are still figuring out how to input and graph data in NetworkX.

nlin3330 commented 9 years ago

I just finished making a preliminary graph for the network analysis, but it is extremely cluttered, which I guess is to be expected (over 20,000 supplier nodes). So the next step is to find a way to create a legible graph.

[image: figure_1]

choldgraf commented 9 years ago

That's pretty cool - can you commit some code that you used to create this? Perhaps you can give an explanation of how it works.

choldgraf commented 9 years ago

Once we hear from @juanshishido and @kaiweitan, I'll put together a plan for the meeting

testchange commented 9 years ago

I have installed Anaconda on my MacBook. The challenge for me right now is to learn the appropriate tools/code to produce data visualizations. I will be down for the meeting today.

testchange commented 9 years ago

So here are the few questions I am working on. I have been searching the net for the appropriate code for each of them:

How do I create a new column for Time of procurement, i.e. the length of time between the two dates (Po_closed_date - Creation date)?

How do I graph Time of procurement against Buyer_Last_Name?

How do I graph Time of procurement against Department name?
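For anyone picking up these questions, here is a minimal pandas sketch of one way to approach them. The sample rows and the exact column spellings (`Creation_Date`, `Department_Name`) are assumptions for illustration, and the real purchasing data may name things differently. It computes the date difference in days and then averages it per buyer and per department.

```python
import pandas as pd

# Hypothetical sample rows; the real column names in the purchasing data may differ.
orders = pd.DataFrame({
    "Creation_Date": ["2014-01-02", "2014-01-10", "2014-02-01", "2014-02-15"],
    "Po_closed_date": ["2014-01-12", "2014-01-15", "2014-03-03", "2014-02-20"],
    "Buyer_Last_Name": ["Smith", "Lee", "Smith", "Lee"],
    "Department_Name": ["Physics", "Physics", "History", "History"],
})


def add_time_of_procurement(df, start_col="Creation_Date", end_col="Po_closed_date"):
    """Return a copy of df with a 'Time_of_procurement' column in whole days."""
    out = df.copy()
    start = pd.to_datetime(out[start_col])
    end = pd.to_datetime(out[end_col])
    out["Time_of_procurement"] = (end - start).dt.days
    return out


def mean_time_by(df, group_col):
    """Average time of procurement for each value of group_col."""
    return df.groupby(group_col)["Time_of_procurement"].mean()


orders = add_time_of_procurement(orders)
by_buyer = mean_time_by(orders, "Buyer_Last_Name")
by_department = mean_time_by(orders, "Department_Name")

# With matplotlib installed, each summary can be drawn as a bar chart:
# by_buyer.plot(kind="bar")
```

Averaging per group is only one choice; a box plot per buyer or department would show the spread as well as the mean.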

dariusmehri commented 9 years ago

it is cool but it doesn't show much, it should be good enough for the meeting on the 15th though

i will send out more guidelines on what i think needs to be done to improve the graph(s) later this evening

darius

Darius Mehri Ph.D. Candidate, Sociology University of California, Berkeley

choldgraf commented 9 years ago

That would be perfect, thanks Darius

dariusmehri commented 9 years ago

here is something really quick:

graph only three months of one year, e.g. jan, feb, march. you will need to go back to the original data (before you used the drop_duplicates() function), subset the data to those months, and then drop duplicates. if it is still too cluttered, then just do one month

if you can, create dept nodes as shaded circles and supplier nodes as circles without shade but with shaded edges

display the graph with the nodes sized according to their centrality measure, those with high centrality large and those with low centrality small (there should be a way in networkx to do this automatically, this is standard in network packages). this way we can visualize who the central actors are (although we can't put names to them yet, it will be nice to know if there are a handful of actors who are centrally located)
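A rough sketch of the first and third points, assuming the raw data has `date`, `department` and `supplier` columns (hypothetical names, as are the sample rows). It subsets to the chosen months before dropping duplicates, builds the graph, and derives a marker size per node from plain degree centrality. The sample rows all fall in one year, so only the month is filtered here. The shaded versus unshaded styling from the second point is left out.

```python
import networkx as nx
import pandas as pd

# Hypothetical rows; "date", "department" and "supplier" are assumed column names.
purchases = pd.DataFrame({
    "date": ["2014-01-05", "2014-01-20", "2014-02-11", "2014-03-02", "2014-06-30"],
    "department": ["D1", "D1", "D1", "D2", "D2"],
    "supplier": ["S1", "S1", "S2", "S1", "S3"],
})


def build_graph(df, months=(1, 2, 3)):
    """Subset the raw rows to the given months, then drop duplicate pairs."""
    in_months = pd.to_datetime(df["date"]).dt.month.isin(months)
    pairs = df.loc[in_months, ["department", "supplier"]].drop_duplicates()
    graph = nx.Graph()
    graph.add_nodes_from(pairs["department"].unique(), kind="dept")
    graph.add_nodes_from(pairs["supplier"].unique(), kind="supplier")
    graph.add_edges_from(zip(pairs["department"], pairs["supplier"]))
    return graph


def node_sizes(graph, smallest=100, scale=2000):
    """Map each node to a marker size that grows with its degree centrality."""
    centrality = nx.degree_centrality(graph)
    return {node: smallest + scale * value for node, value in centrality.items()}


graph = build_graph(purchases)
sizes = node_sizes(graph)

# Drawing (needs matplotlib), with node sizes taken from the dict above:
# pos = nx.spring_layout(graph)
# nodes = list(graph.nodes())
# nx.draw_networkx_nodes(graph, pos, nodelist=nodes, node_size=[sizes[n] for n in nodes])
# nx.draw_networkx_edges(graph, pos)
```

The `smallest` and `scale` values are arbitrary and would need tuning for a graph with thousands of nodes.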

i am out of the state for a week and am very busy w/ interviews, i can perhaps do some of this but i probably can't do that much until spring break

darius

dariusmehri commented 9 years ago

btw, the centrality measure will probably be one meant for unimodal (one-mode) graphs, whereas we have a two-mode graph. don't worry about this, we can figure out later how to do it correctly, it will still give nice visuals

darius

dariusmehri commented 9 years ago

one more point, i think two-mode is the same as bipartite; two-mode is the more common usage in sociology. here is the article i am referencing:

http://www.analytictech.com/borgatti/papers/2modeconcepts.pdf
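To make the one-mode versus two-mode difference concrete, here is a small sketch on a made-up department and supplier graph. It compares NetworkX's ordinary `degree_centrality` with the version in its `bipartite` module, which normalizes by the size of the opposite node set. This is only one possible way to handle the two-mode case, not necessarily the approach the article recommends.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical two-mode graph: departments on one side, suppliers on the other.
departments = ["D1", "D2"]
suppliers = ["S1", "S2", "S3"]

graph = nx.Graph()
graph.add_nodes_from(departments, bipartite=0)
graph.add_nodes_from(suppliers, bipartite=1)
graph.add_edges_from([("D1", "S1"), ("D1", "S2"), ("D2", "S1"), ("D2", "S3")])

# One-mode degree centrality divides each degree by (n - 1) over all nodes.
one_mode = nx.degree_centrality(graph)

# Bipartite degree centrality divides by the size of the opposite node set.
two_mode = bipartite.degree_centrality(graph, departments)
```

In this toy graph supplier S1 is linked to both departments, so its two-mode score is the maximum possible, while its one-mode score is diluted by the other suppliers it could never connect to.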

choldgraf commented 9 years ago

Thanks for the feedback Darius - we'll talk about it at the meeting and can get thoughts back to you.

@juanshishido, are you planning to video-chat into this meeting? Chris

choldgraf commented 9 years ago

OK, here's a brief plan for today. We'll be meeting at BIDS at 1:30pm (in 9 minutes so hopefully you already knew this haha).

  1. Admin stuff (5 min)
  2. Update from Juan/Kai meeting + brainstorm page (5 min)
  3. BIDS update tomorrow (10 min) - I'm giving a short presentation tomorrow, let's talk about some challenges we've faced, things worth mentioning
  4. Project updates / challenges (20 min) - Talk about what we've tried so far, and what challenges we're facing / what we need help with.

Total meeting time should hopefully be about 30-40 minutes. See you guys soon.

anthonysuen commented 9 years ago

Yeah, the BIDS presentation tomorrow and the BIDS social teas the Monday after Spring Break are both great venues to get feedback and ask for help from experts in the field. If there are any real roadblocks, it might be worth sending a message out on the BIDS Slack.

Cheers,

Anthony

Anthony Suen

dariusmehri commented 9 years ago

please mention that since we have time-series data, we can potentially do some really cool stuff (eventually) with network analysis, i.e. see how the structure and measures change over time, by month, week and even day (if we get hardcore)

darius

choldgraf commented 9 years ago

Hey guys - @nlin3330, @kaiweitan and I met today to talk about updates on the project and address some questions that people had. Here are some general updates:

  1. I'm going to speak tomorrow at InfoCamp. I'll talk about our general data situation, a few projects we're working on, and some challenges that we've had.
  2. @dariusmehri and @nlin3330 have got some preliminary graph viz stuff created. It sounds like the main thing to do is find ways to prune the graph so that it's more readable. Here are some next steps:
    1. Only plot groups that have > N transactions
    2. Only plot edges with > N co-occurrences, or plot weighted edges in the graphs
    3. Split up plots by week / month / year / season / etc to see if patterns change over time
  3. @kaiweitan has been getting up to speed with pandas and GitHub. We covered some useful bits about both. Next step is to create a "time to completion" column, and then create some plots.
  4. @juanshishido uploaded some code to GitHub, which is a good start. Not sure about next steps, though; maybe Juan can update us on that.
  5. This week, keep working on projects. I'll update you guys on how the InfoCamp presentation goes. Don't forget to create issues for new questions / comments, or comment on current issues with updates and such.
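The pruning ideas in item 2 could be sketched roughly as follows. The `department` and `supplier` column names, the sample rows, and the threshold parameters are all assumptions for illustration. Edge weights count co-occurrences, and a node's transaction count is taken as the sum of its edge weights.

```python
import networkx as nx
import pandas as pd

# Hypothetical transactions; one row per purchase, column names are assumed.
transactions = pd.DataFrame({
    "department": ["D1", "D1", "D1", "D2", "D2", "D3"],
    "supplier": ["S1", "S1", "S2", "S1", "S1", "S3"],
})


def weighted_graph(df):
    """Build a graph whose edge weight counts department-supplier co-occurrences."""
    counts = df.groupby(["department", "supplier"]).size()
    graph = nx.Graph()
    for (dept, supplier), weight in counts.items():
        graph.add_edge(dept, supplier, weight=int(weight))
    return graph


def prune(graph, n_edge=0, n_node=0):
    """Keep nodes with more than n_node transactions and edges with weight above n_edge."""
    # Weighted degree = total number of transactions touching the node.
    strength = dict(graph.degree(weight="weight"))
    keep = {node for node, total in strength.items() if total > n_node}
    pruned = nx.Graph()
    for u, v, weight in graph.edges(data="weight"):
        if weight > n_edge and u in keep and v in keep:
            pruned.add_edge(u, v, weight=weight)
    return pruned


graph = weighted_graph(transactions)
pruned = prune(graph, n_edge=1, n_node=1)
```

Splitting by week or month would amount to filtering the rows by date before calling `weighted_graph`, then pruning each resulting graph separately.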

Talk to you guys soon, Chris

juanshishido commented 9 years ago

Apologies for not commenting earlier. I had a big project to finish by 4pm today. For some reason, I didn't have today's meeting on my calendar. Maybe someone can send me that?

The notebook I updated needs a lot more. My plan is to:

choldgraf commented 9 years ago

Thanks all for your thoughts. Depending on how #11 goes, we'll meet on Thursday or Friday of this week. BIDS checkpoint is coming up, so we should put together preliminary analyses and make them look pretty / interesting, even if they're not finalized. Then we can use this in our presentation.