Watts-Lab / surveyGPT

Run surveys on GPTs
MIT License

A vision for surveyGPT #1

Open markwhiting opened 1 year ago

markwhiting commented 1 year ago

We would like a system that can answer arbitrary human completable surveys using a ML model, e.g., GPT. We would like this to be rather straightforward for a user, e.g., provide a URL to the survey site — such as a Qualtrics or Surveyor link — and the existing system gets the responses as though a human had completed it. We might also provide an option to share the results with the caller of the system, e.g., in the case that we don't own the survey but still want to know how it answered. We would also likely want an option to adjust temperature of the model (assuming a GPT model).

There are examples of people using similar approaches, e.g., in market research and economics. surveyGPT would make these kinds of activities much easier, and would allow quick, general reproduction of a wide range of survey-based research. Further, having such a system might let us pursue a systematic response to the concept of machine behavior.

Roadmap

We don't need to solve all the problems at once, but at least the following major issues need to be resolved, and this is an attempt at ordering them appropriately:

Phase 1

Phase 2

Technical design

The function specification might look something like:

const surveyGPT = async (url: string, temperature = 1): Promise<responseObject> => {}
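The shape of the response object is not pinned down above; here is one hypothetical sketch, with `ResponseObject` standing in for the `responseObject` type in the spec. All field names are assumptions, not decisions:

```typescript
// Hypothetical per-question answer record -- field names are
// assumptions, not part of the spec above.
interface SurveyAnswer {
  questionId: string;   // identifier scraped from the survey page
  questionText: string; // the prompt shown to the model
  answer: string;       // the model's selected or free-text response
}

// Hypothetical top-level result for one completed survey run.
interface ResponseObject {
  url: string;         // the survey that was completed
  temperature: number; // model temperature used for this run
  completedAt: string; // ISO timestamp
  answers: SurveyAnswer[];
}

// A stub illustrating the intended call signature; a real
// implementation would fetch the page and query the model.
const surveyGPT = async (
  url: string,
  temperature = 1
): Promise<ResponseObject> => ({
  url,
  temperature,
  completedAt: new Date().toISOString(),
  answers: [],
});
```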

Considerations

  1. I am open-minded about the language, but distributing it as either an npm package or a Python library would be rather convenient for us and others.
  2. The system should rely as little as possible on external libraries or infrastructure, so that it is easier to maintain.
  3. The system should allow people to use their own OpenAI (or other platform) keys, either externally (set up your OpenAI client and pass it to surveyGPT) or internally (providing a way for users to safely pass those credentials to us).
  4. We should use automated testing to check that survey responses remain stable and to keep an up-to-date calibration of temperature settings.
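Consideration 3 (externally integrating an API client) could work via plain dependency injection; a minimal sketch, where `CompletionClient`, `answerQuestion`, and the stub are illustrative names, not a real OpenAI binding:

```typescript
// Minimal interface surveyGPT would depend on, rather than a concrete
// OpenAI client -- callers wrap whatever SDK and key they use behind it,
// so their credentials never have to be handed to surveyGPT itself.
interface CompletionClient {
  complete(prompt: string, temperature: number): Promise<string>;
}

// Hypothetical core function that takes the caller-supplied client.
const answerQuestion = async (
  client: CompletionClient,
  question: string,
  temperature = 1
): Promise<string> => client.complete(question, temperature);

// A stub client standing in for a real OpenAI-backed one; useful for
// the automated tests mentioned in consideration 4.
const stubClient: CompletionClient = {
  complete: async (prompt, _temperature) => `stub answer to: ${prompt}`,
};
```

This keeps the key-handling problem on the caller's side by default, while still leaving room for an internal key-passing option later.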
markwhiting commented 1 year ago

@JamesPHoughton — I'd be interested to get feedback on this if you have a chance. Any thought or suggestions would be helpful!

JamesPHoughton commented 1 year ago

In all honesty, it sounds inevitable, even if we don't do it, and it sounds like a death knell for online survey research... basically, a supercharged form-filling robot? I think it might be worth exploring the idea, even if only to figure out what you'd need to do as a survey researcher to guard against this sort of thing being used by survey respondents. However, I could imagine that a researcher who built it might be a little persona non grata for a while?

Sorry to be down on the idea. Normally, I have no qualms about plucking feathers and pureeing cartons of eggs...

markwhiting commented 1 year ago

Yes, I appreciate that perspective and I agree. Perhaps we shouldn't make it too easy for others to use too quickly?

But I think there are also various types of upside, e.g., build ways to automatically test and validate surveys, check if existing survey research appears to lead to robust results or not (i.e., run replications on existing survey research and see where the noise is)...

JamesPHoughton commented 1 year ago

My guess is that a lot of the work is in getting the LLM to read and interact with the webpage, and that there are probably lots of people working on that right now. Maybe the thing to do is think through a strategy for using the LLM to support research, and then prototype with data from existing surveys that we've done? I.e., ignore the webpage interaction part of it for now, just use a text file with questions and response categories, and work out how to make the LLM part of the system provide the science value?
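The "text file with questions and response categories" idea could start from a tiny parser; a sketch assuming a simple one-question-per-line format (the format and function names here are hypothetical, not an agreed spec):

```typescript
// One parsed survey question; options is empty for free-text questions.
interface Question {
  text: string;
  options: string[];
}

// Parse lines like "How satisfied are you? | Very | Somewhat | Not at all".
// Lines without "|" are treated as free-text questions; blank lines are skipped.
const parseSurveyFile = (contents: string): Question[] =>
  contents
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [text, ...options] = line.split("|").map((s) => s.trim());
      return { text, options };
    });

// Turn a parsed question into a prompt for the model.
const toPrompt = (q: Question): string =>
  q.options.length > 0
    ? `${q.text}\nChoose one: ${q.options.join(", ")}`
    : q.text;
```

Starting here defers the browser-automation problem entirely and lets the LLM-querying part be evaluated against surveys we already have data for.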

markwhiting commented 1 year ago

Sure, we're already doing that a bit. Something I specifically want to do beyond that is measure what kind of person a model is like, or can embody, by querying it with many standard instruments. But I do agree that skipping the "interaction architecture" as I outlined it above might make achieving some goals around this much faster initially.