evamaxfield / phd

Docs and resources from planning my PhD application onward

2023-12-12 -- Advising Meeting Agenda #2

evamaxfield closed this 8 months ago

evamaxfield commented 8 months ago

Agenda

  1. General Exam
  2. Quarter Wrap Up and Planning
    • [x] EAGER-Survey
      • Distributed, with a plan to send follow-up emails in January to people who never responded
      • We already have a decent amount of data. It still needs better parsing and analysis, but so far so good
      • In general, I think it went okay. My biggest frustration was how long it took to get the application and survey out. Development took longer than planned, and I think that's partially my fault for not just going with React from the start, although I remember we talked about it and I thought we had agreed Flask might be better. Regardless, it took too much time and was frustrating, but I still think it ultimately turned out okay.
      • Planning for Next Quarter
      • [x] Have some clear questions to ask?
        • [x] Maintenance and Development: what is the timeline for development as compared to the grant funding window
        • [x] Model Evaluation and Difference from User Reports: we expect bias in the model, but we also know that our definition and original annotation criteria differ from what users report as software results. This task mixes retraining the model with annotating a sample of the response data to estimate how much our annotation criteria differ from user responses
        • [x] Adoption and Use: many respondents reported field-wide adoption and use of their software. To me this indicates a biased sample of respondents, but I think it warrants at least some investigation into real adoption / usage:
          • are these repositories containerized / packaged
          • are they starred / forked
          • are they installable / how many downloads
          • are they updated
          • are they contributed to
        • FIELD DIFFERENCES: rate of production, type of product, size of development team, collaboration patterns, etc. With grants we can assign dollar amounts, so we can also look at an economic angle
      • [x] Interviews? Come up with a method for selecting interviewees and identify what we can't get from quantitative analysis alone
      • [x] Paper conference / journal target? -- CSCW for the code disposability angle, otherwise QSS or Scientometrics, maybe JCDL?
    • [x] RS-Graph
      • I really want to focus on this and EAGER next quarter; the two work well together and build on each other. I also think RS-Graph will be incredibly valuable if we do it well and quickly
      • Really great progress on it, imo. For how little time I gave it, I think it has a clear plan with clear questions (they may need more operationalization, but in general they're good)
      • Jevin liked the strictest metrics:
        • d_d (dev direct index): "the largest N such that a developer has led contributions to N packages that each have at least N DIRECT dependents"
        • d_i (dev indirect index): "the largest N such that a developer has led contributions to N packages that each have at least N INDIRECT dependents"
        • d_m (dev mixed index): "the largest N such that a developer has led contributions to N packages that each have at least N DIRECT or INDIRECT dependents"
        • problem: threshold of "contribution"
        • problem: package versioning
        • facet: dev must be a matched author vs. dev doesn't have to be a matched author (e.g., someone in the acknowledgements or not in the papers at all)
      • Planning for Next Quarter
        • [x] Annotation and inter-rater reliability (IRR) for person entity matching (GitHub user to author), plus fine-tuning of the model
          • Combine into a paper with active-learning training of the paper + repo entity matching model, building on Truede et al.'s work?
        • [x] Questions:
          • Authorship vs Dev Contribution index
          • Contributions to repositories over time (pre-publication, ±n months around publication, post-publication)
          • Repo contributors' author positions within the research paper
        • [x] Once the analysis pipeline is complete, scale up data sources (I believe there are 8 or 9 not yet added, which would make the dataset very large)
        • [x] Plan for publishing? QSS seems like a good target
        • I would love to do a nice release with a website for looking up / exploring the graph, but I think that comes while the paper is in review, etc.
        • Contributions of work:
          • Largest dataset of linked paper + repo
          • Graph processing to get all bidirectional dependencies of repos
          • New questions answered
        • [x] Lots of follow up / continuation:
          • Modeling algorithm, analysis script, tool, and infra from graph properties
          • long tail of bad datasets
    • [x] JOSE
      • Still in progress. I continue to feel bad about how long this is taking, but I feel I made a lot of progress on it, especially considering these later chapters are where the bulk of the work lies (lots of computation, annotation, etc.)
      • Happy to switch to making the last chapter essentially the Local Interest Groups work
        • the Local Interest Groups paper will need additional analysis and evaluation of some of the methods, but for the chapter I think most are fine
      • some things still left to do
        • LLM prompting chapter
        • exercises (and reference) for each chapter
        • move from bullet point text to full text
    • [x] Local Interest Groups
      • JOSE took up much of my focus (outside of the survey work)
        • I am blown away that I was able to get a semi-working (naive but okay) version of a public comment period extraction method out in a few days of work
      • I have a rough draft of the paper that honestly feels really close to done. It definitely needs a pass of edits and suggestions, and if everything continues to go to plan, I will need some annotation help to construct the appropriate datasets
      • Honestly, writing this draft has me excited. I really hope these models work and that we can finish this paper (especially now that there is a clear plan), as I think it will both have an impact in political science and be directly useful for our friends :)
    • [x] ARDC
      • Yay! It's finished and submitted!
      • Thank you for all the work you did over the weekend; I should have written some of that material up myself
  3. General Check In
    • Winter school lecture material: I would love a review of my materials when you have a chance (the lecture is in one to two weeks)
    • How are you? Anything else in general we want to chat about?
    • Generally, how do you think I did this quarter?
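The d_d / d_i / d_m definitions under RS-Graph apply the h-index rule to packages instead of papers. What follows is a minimal Python sketch. It assumes two things are already settled. First, the set of packages a developer has "led contribution towards" is already decided; the contribution threshold is still an open problem. Second, each package's direct and indirect dependents are available as sets. `h_style_index` and `dev_indices` are hypothetical helper names, not an existing library API.

```python
from typing import Iterable


def h_style_index(dependent_counts: Iterable[int]) -> int:
    """Largest N such that at least N packages each have >= N dependents."""
    # Rank packages from most to fewest dependents.
    counts = sorted(dependent_counts, reverse=True)
    n = 0
    # Rank i (1-based) qualifies while the i-th package has at least i dependents.
    for i, count in enumerate(counts, start=1):
        if count >= i:
            n = i
        else:
            break
    return n


def dev_indices(
    led_packages: set[str],
    direct: dict[str, set[str]],
    indirect: dict[str, set[str]],
) -> dict[str, int]:
    """Compute d_d, d_i, and d_m for one developer (hypothetical helper).

    led_packages: packages the developer led contributions to (assumed input).
    direct / indirect: package -> set of dependent packages (assumed input).
    """
    d_d = h_style_index(len(direct.get(p, set())) for p in led_packages)
    d_i = h_style_index(len(indirect.get(p, set())) for p in led_packages)
    # Mixed index: take the union so a dependent is never counted twice.
    d_m = h_style_index(
        len(direct.get(p, set()) | indirect.get(p, set())) for p in led_packages
    )
    return {"d_d": d_d, "d_i": d_i, "d_m": d_m}
```

Consider a developer whose led packages have 10, 8, 5, 4, and 3 dependents. That developer gets an index of 4: four packages have at least four dependents, but there are not five packages with at least five. For d_m, the set union keeps a dependent from being counted twice if it appears in both the direct and indirect sets.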
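Separating DIRECT from INDIRECT dependents, as the RS-Graph indices and the "graph processing" contribution require, is a reachability problem on the reversed dependency graph. Below is a minimal sketch. It assumes the graph is a plain `package -> set of dependencies` dictionary with no version information, since package versioning is still listed as an open problem. It also defines "indirect" as dependents that reach the package only through other packages. The function names are hypothetical.

```python
from collections import defaultdict, deque


def reverse_graph(depends_on: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert 'package -> its dependencies' into 'package -> its dependents'."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(pkg)
    return dependents


def split_dependents(
    depends_on: dict[str, set[str]], package: str
) -> tuple[set[str], set[str]]:
    """Return (direct, indirect) dependents of `package` (hypothetical helper).

    Direct: packages that list `package` as a dependency.
    Indirect: all other packages that transitively depend on `package`.
    """
    dependents = reverse_graph(depends_on)
    direct = set(dependents.get(package, set()))
    seen: set[str] = {package}
    queue = deque([package])
    # Breadth-first search over dependent edges; `seen` guards against cycles.
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    all_dependents = seen - {package}
    return direct, all_dependents - direct
```

Because breadth-first search tracks a `seen` set, it also terminates on circular dependencies, which real package ecosystems occasionally contain.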
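The RS-Graph question about contributions to repositories before, around, and after publication can be approached by bucketing each contribution date relative to the paper's publication date. This is a minimal sketch. It assumes the ±n month window is approximated with 30-day months and defaults to 6 months. Both choices, and the function names, are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def contribution_period(commit: date, published: date, window_months: int = 6) -> str:
    """Label a contribution as 'pre', 'around', or 'post' publication."""
    # Assumption: one month is approximated as 30 days.
    window = timedelta(days=30 * window_months)
    if commit < published - window:
        return "pre"
    if commit > published + window:
        return "post"
    return "around"


def count_periods(commits: list[date], published: date, window_months: int = 6) -> Counter:
    """Count how many contributions fall into each period."""
    return Counter(contribution_period(c, published, window_months) for c in commits)
```

Calendar-exact month arithmetic would need a third-party library such as `dateutil`. The 30-day approximation keeps the sketch standard-library only.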
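The IRR step for annotating person matches (GitHub user to author) could use Cohen's kappa, one common agreement statistic when exactly two annotators label the same candidate pairs. Choosing kappa over alternatives (e.g., Krippendorff's alpha for more annotators) is an assumption. `cohens_kappa` is a hypothetical helper:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators' labels on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    # Observed agreement: share of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined, so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for chance. Suppose both annotators say "match" for half the pairs. Then 50% raw agreement is exactly what chance predicts, so kappa is 0.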
evamaxfield commented 8 months ago

good thesis to look at: https://github.com/kequach/Thesis-Mapping-RS/tree/main

Set of authors -> set of GitHub identities
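A naive baseline for mapping a set of authors to a set of GitHub identities is fuzzy name similarity followed by a greedy one-to-one assignment. This sketch is only for comparison; it is not the fine-tuned entity matching model planned for RS-Graph. Several details are hypothetical: the 0.8 threshold, the normalization rules, and the assumption that each GitHub user comes as a login plus an optional display name. It uses only Python's standard library (`difflib`, `unicodedata`):

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def match_authors(
    authors: list[str],
    github_users: dict[str, str],
    threshold: float = 0.8,
) -> dict[str, str]:
    """Greedy one-to-one match of author names to GitHub logins (hypothetical).

    github_users maps login -> display name (the display name may be empty).
    """
    scored = []
    for author in authors:
        a = normalize(author)
        for login, display in github_users.items():
            # Score against whichever of login / display name is closer.
            login_score = SequenceMatcher(None, a, normalize(login)).ratio()
            display_score = (
                SequenceMatcher(None, a, normalize(display)).ratio() if display else 0.0
            )
            score = max(login_score, display_score)
            if score >= threshold:
                scored.append((score, author, login))
    matches: dict[str, str] = {}
    used_logins: set[str] = set()
    # Take the highest-scoring pairs first; each author and login is used once.
    for score, author, login in sorted(scored, reverse=True):
        if author not in matches and login not in used_logins:
            matches[author] = login
            used_logins.add(login)
    return matches
```

A greedy assignment is simple, but it can lock in an early wrong pair. An optimal assignment (e.g., the Hungarian algorithm) is the usual alternative.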

nniiicc commented 8 months ago

Do a study that looks at the bitsandbytes config with PEFT

evamaxfield commented 8 months ago

For the RS-Graph question on the position of devs in the paper, use the Contributor Roles Taxonomy (CRediT): https://credit.niso.org/

evamaxfield commented 8 months ago

On Friday, we need to submit our billing filing with CS&S for ARDC