2023-12-12 -- Advising Meeting Agenda

evamaxfield commented 8 months ago

Agenda

General Exam
- The most up-to-date information from iSchool advisors here: https://drive.google.com/file/d/18dNwo42wkAHHB3OmNB5FS0PFqEWedvAh/view
  - I don't see anywhere in that presentation or in https://canvas.uw.edu/courses/1694344/pages/committees-and-exams#timelines which states I need to submit these documents to the dept
- [x] Schedule a meeting with Bill + Jevin to discuss prospectus + reading list and scheduling of oral exam?
  - [x] Oral exam week of Feb 12 (or friday feb 9 9am - 11am)?
- [x] Add Lucy to committee after exam? -- take what I wrote for the exam to her, explain it, and make a clear case for why I would like her help and where. -- the other possibility is someone external from the dept / university.
Quarter Wrap Up and Planning
- [x] EAGER-Survey
  - Distributed + have a plan for sending out emails to people who never responded come January
  - Have a decent amount of data already. Need to parse and dissect it all better but so far so good
  - In general, I think it went okay. My biggest frustration was how long it took to get the application and survey out. Development took longer than planned, and I think it's partially my fault for not just going with React at the start, but I remember talking to you about it and I thought we had said Flask might be better. Regardless, it took too much time and that was frustrating, but I still think it ultimately turned out okay.
  - Planning for Next Quarter
  - [x] Have some clear questions to ask?
    - [x] Maintenance and Development: what is the timeline for development as compared to the grant funding window
    - [x] Model Evaluation and Difference from User Reports: we expect bias in the model, but we also know that our definition / original annotation criteria are different from user reported software results. This task is a mix of retraining the model and annotating a sample of the response data to estimate how different our annotation criteria are from user responses
    - [x] Adoption and Use: many respondents reported field wide adoption and use of their software. To me this indicates a biased sample of respondents, but I think it requires at least a bit of investigation into the real adoption / usage:
      - are these repositories containerized / packaged
      - are they starred / forked
      - are they installable / how many downloads
      - are they updated
      - are they contributed to
    - FIELD DIFFERENCE: rate of production, type of product, size of development team, collaboration patterns, etc. with grants we can assign dollar amounts so we can look at an economic angle
  - [x] Interviews? Coming up with a method for interviewee selection / things we can't get from quant analysis alone
  - [x] Paper Conf / Journal target? -- code disposability angle (CSCW), otherwise we go QSS or scientometrics, JCDL maybe?
- [x] RS-Graph
  - I really want to focus on this and EAGER next quarter. Both work well together and build off of each other. RS-Graph is also I think just incredibly valuable if we do it well and quickly
  - Really great progress on it imo. For how little time I gave to it, I think it has a clear plan with clear questions (may need some more operationalization but in general good)
  - Jevin liked the strictest metrics:
    - d_d (dev direct index): "the N number of packages that a developer has led contribution towards which have at least N number of DIRECT dependents"
    - d_i (dev indirect index): "the N number of packages that a developer has led contribution towards which have at least N number of INDIRECT dependents"
    - d_m (dev mixed index): "the N number of packages that a developer has led contribution towards which have at least N number of DIRECT or INDIRECT dependents"
    - problem: threshold of "contribution"
    - problem: package versioning
    - facet: dev must be a matched author vs dev doesn't have to be a matched author (i.e. someone in the acknowledgement or not in papers at all)
  - Planning for Next Quarter
    - [x] Annotation and IRR of person matching (GitHub user to Author) entity matching + fine-tuning of model
      - Combine into a paper with active learning training of Paper + Repo entity matching model building off of Treude et al. work?
    - [x] Questions:
      - Authorship vs Dev Contribution index
      - Contributions to repositories (timeline -- pre, +- n months around pub period, post)
      - Repo contributor position within research paper
    - [x] Once pipeline for analysis is complete, scaling up of data sources (I believe we have 8 or 9 which haven't been added that would make the dataset very very large)
    - [x] Plan for publishing? QSS seems like a good target
    - I would sort of love to do a nice release and include a website for lookup / exploration of the graph but I think that comes while paper is in review, etc.
    - Contributions of work:
      - Largest dataset of linked paper + repo
      - Graph processing to get all bi-directional-dependencies of repos
      - new questions answered
    - [x] Lots of follow up / continuation:
      - Modeling algorithm, analysis script, tool, and infra from graph properties
      - long tail of bad datasets
- [x] JOSE
  - Still in progress. I continue to feel bad about how long this is taking but I do feel like I made a lot of progress on it, especially considering these later chapters are where the bulk of the work lies (lots of computation and annotation etc.)
  - Happy to switch towards making the last chapter basically the work for local interest groups
    - there will need to be additional analysis and evaluation of some of the methods for the local interest groups paper but for the chapter I think many are fine
  - some things still left to do
    - LLM prompting chapter
    - exercises (and reference) for each chapter
    - move from bullet point text to full text
- [x] Local Interest Groups
  - JOSE took up much of my focus (outside of the survey work)
    - I am blown away that I was able to get a semi-working (naive and okay) version of a public comment period extraction method out in a few days of work
  - I have a rough draft of the paper that honestly I feel like is really close to being done. There is definitely a pass of edits and suggestions needed but if everything goes to plan like it currently is, I will need some annotation help to construct the appropriate datasets
  - Honestly, writing this draft has me excited. I really hope these models work and we can both finish this paper (esp now that there is a clear plan) as I think it will have an impact both in political science and will be directly useful for our friends :)
- [x] ARDC
  - yay! it's finished and submitted!
  - thank you for all the work you did on the weekend, I should have written some of that material up myself
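
Rough sketch for the RS-Graph indices above (d_d, d_i, d_m), assuming each one is read like an h-index: the largest N such that N led packages each have at least N dependents. The function names, the `(package, dependency)` edge format, and the toy package names are all hypothetical. Which packages count as "led" is passed in as an input because the contribution threshold is still an open problem, and package versioning is ignored. A package that is both a direct and an indirect dependent is counted as direct only, which is also an assumption.

```python
from collections import defaultdict, deque


def h_style_index(counts):
    """Largest N such that at least N of the values in `counts` are >= N."""
    n = 0
    for rank, value in enumerate(sorted(counts, reverse=True), start=1):
        if value < rank:
            break
        n = rank
    return n


def dependents(edges):
    """Map each package to a (direct, indirect) pair of dependent sets.

    `edges` holds (package, dependency) pairs: `package` depends on `dependency`.
    Indirect dependents are reached by walking the reversed graph (BFS).
    """
    reverse = defaultdict(set)
    for package, dependency in edges:
        reverse[dependency].add(package)

    packages = {name for edge in edges for name in edge}
    result = {}
    for package in packages:
        reachable = set()
        queue = deque(reverse.get(package, ()))
        while queue:
            current = queue.popleft()
            # Skip already-visited packages and cycles back to the start.
            if current in reachable or current == package:
                continue
            reachable.add(current)
            queue.extend(reverse.get(current, ()))
        direct = set(reverse.get(package, ())) - {package}
        result[package] = (direct, reachable - direct)
    return result


def dev_indices(led_packages, deps):
    """Compute d_d, d_i, d_m for one developer's led packages (assumed input)."""
    empty = (set(), set())
    direct = [len(deps.get(p, empty)[0]) for p in led_packages]
    indirect = [len(deps.get(p, empty)[1]) for p in led_packages]
    mixed = [d + i for d, i in zip(direct, indirect)]
    return {
        "d_d": h_style_index(direct),
        "d_i": h_style_index(indirect),
        "d_m": h_style_index(mixed),
    }


# Toy dependency edges, made up for illustration only.
toy_edges = [
    ("app1", "core"),
    ("app2", "core"),
    ("plugin", "app1"),
    ("app1", "utils"),
    ("tool", "utils"),
]
toy_deps = dependents(toy_edges)
print(dev_indices(["core", "utils", "plugin"], toy_deps))
```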
General Check In
- Winter school lecture material, would love a review of my materials when you have a chance (it will be next week or in two weeks)
- How are you? Just general anything else we want to chat about.
- Generally how do you think I did this quarter?

evamaxfield commented 8 months ago

good thesis to look at: https://github.com/kequach/Thesis-Mapping-RS/tree/main

set of authors -> set of github identities

nniiicc commented 8 months ago

Do a study that looks at Bits and Bytes config - peft

Hugging Face released peft for fine-tuning, and that included how to quantize model code...
Repositories over the last year that had a code state before and after the release of peft

evamaxfield commented 8 months ago

For RS-Graph question on position of devs in paper, use the authorship roles taxonomy: https://credit.niso.org/

evamaxfield commented 8 months ago

On Friday, we need to submit our billing filing with CS&S for ARDC
