Neat idea! Keen to see more results as you play with it.
We tried ChatGPT and a few other text AI tools to try to automate some of the fields where we summarize elements of a study, and we've had little success. It's been (a) inaccurate (!), (b) oddly inconsistent in formatting, and (c) prone to including irrelevant stuff that we don't want (e.g., details about a treatment when we really just want text focused on the outcome variable). I think it would be neat to incorporate some AI fields -- mostly to SAVE TIME in the coding process, if not to boost reliability. But these issues have frustrated our team to date. If you find a better solution that actually works in terms of accuracy, reliability, and precision, please LMK! The RA on my team who has worked most on it is Anushka Bhansali, in case you want to chat with her at all.
Sounds very interesting. There is a bunch of literature (called prompt engineering) that deals with how to get the desired output. I could see using those best practices and methods helping mitigate the issues of inaccuracy, inconsistent formatting, and the addition of irrelevant stuff.
I think generally, using "few-shot prompting", where you give example input and output (basically like training data), really helps with formatting and accuracy. Also, using very clear formats that rely on delimiter symbols rather than prose (e.g., Context: {} Answer: {}) can help. Lastly, there is this interesting idea called chain of thought, where you ask the model to give intermediate reasoning steps; I think this could potentially also be helpful. I'm by no means an expert at this, but I have read a bit of the literature and would be happy to chat about it if you wanna play around with it and see if we can get it to work better.
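For concreteness, here is a minimal sketch of what a few-shot prompt with a clear Context/Answer template could look like. This is not from our pipeline: it assumes the pre-1.0 `openai` Python client, and the model name, the 1-5 scale, and the example tasks are placeholders I made up.

```python
# Minimal few-shot prompting sketch (assumes the pre-1.0 `openai` Python client
# and an API key in the OPENAI_API_KEY environment variable).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Few-shot examples: a fixed template with clear delimiters, so the model
# learns the expected output format from the examples themselves.
FEW_SHOT_PROMPT = """\
You rate how physical vs. mental a task is, on a 1-5 scale. Answer with only the number.

Context: Move boxes from the loading dock to the storage room.
Answer: 1

Context: Write a literature review on team decision making.
Answer: 5

Context: {task}
Answer:"""

def rate_task(task_description: str) -> str:
    """Return the model's rating for a single task description."""
    response = openai.Completion.create(
        model="text-davinci-003",   # placeholder; any GPT-3 completion model would work
        prompt=FEW_SHOT_PROMPT.format(task=task_description),
        max_tokens=3,
        temperature=0,              # low temperature helps keep outputs consistent
    )
    return response["choices"][0]["text"].strip()

print(rate_task("Assemble flat-pack furniture from written instructions."))
```

The same structure extends to chain of thought by adding a "Reasoning:" line to each example before the final "Answer:" line.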
We do (have API access to GPT-3)! I can add you.
We have been using it for Common sense, and I played a little with it for mapping but found it a bit too unreliable: running the same request gets different responses. This is happening less with GPT-4, but it doesn't appear entirely robust.
Yes please!! @markwhiting, can you please add me? I'd love to play with it!
Closing as this is about a previous version of the project and no longer related to the toolkit
This week, @amaatouq reached out with an interesting idea: we can potentially build a pipeline that uses GPT to rate tasks (and even test whether GPT can replicate our raters' mapping of task features). Mohammed Alsobay has done some amazingly rapid iteration, showing promising initial results from using GPT on the Task Mapping questions.
Here are some of the results Mohammed found, using just 2 questions (Q1 - physical/mental and Q22 - conflicting tradeoffs).
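One way to quantify whether GPT can replicate the raters is to score its answers against the human labels with standard agreement metrics. A rough sketch, assuming pandas and scikit-learn; the file name and column names are hypothetical:

```python
# Hedged sketch: comparing GPT's answers to human raters' labels on the same
# Task Mapping questions. The file path and column names are made up for illustration.
import pandas as pd
from sklearn.metrics import accuracy_score, cohen_kappa_score

ratings = pd.read_csv("task_mapping_ratings.csv")  # hypothetical file

for question in ["Q1_physical_mental", "Q22_conflicting_tradeoffs"]:  # hypothetical columns
    human = ratings[f"{question}_human"]
    gpt = ratings[f"{question}_gpt"]
    print(
        question,
        "accuracy:", round(accuracy_score(human, gpt), 3),
        "kappa:", round(cohen_kappa_score(human, gpt), 3),
    )
```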
We're now interested in playing around with pre-training and fine-tuning the model, to see if it can do better.
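If we go the fine-tuning route, a rough sketch of the workflow with the pre-1.0 `openai` client might look like the following; the JSONL file name and base model are placeholders, not decisions we've made.

```python
# Hedged sketch of fine-tuning a GPT-3 base model on our rated tasks
# (pre-1.0 `openai` client). Training data is a JSONL file with one
# {"prompt": ..., "completion": ...} pair per rated task.
import openai

# Upload the prepared JSONL file of (task description, rater answer) pairs.
training_file = openai.File.create(
    file=open("task_ratings.jsonl", "rb"),  # hypothetical file
    purpose="fine-tune",
)

# Kick off a fine-tune job against a GPT-3 base model.
job = openai.FineTune.create(
    training_file=training_file["id"],
    model="davinci",  # placeholder base model
)
print(job["id"])
```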
Given all this, I'm wondering whether...
Thoughts?