gtsueng closed this 1 month ago
Moved here to keep measurementTechniques separate from topicCategories: https://docs.google.com/spreadsheets/d/1jkhidFmsp0f_yL8S5wpZ-oBA-eLhQQmESQEq4Lrhx3M/edit#gid=1969630319
@ZubairQazi, it looks like the thresholds may vary by repo, but I'm not 100% sure, since OMICS-DI records outnumber everything else. For the following repositories:
can you run GPT measurementTechnique extraction on the following:
If a repo doesn't have at least 5 records that meet the requirements above, just pull however many there are (if any).
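The per-repository pull described above can be sketched in Python as follows. This is a hypothetical helper, not existing NDE code: it assumes each record is a dict with a `repository` field (an assumed name) and takes the selection requirements as a caller-supplied predicate, since the specific criteria are not spelled out in this thread. Repositories with fewer than 5 qualifying records return everything available, and those with none return an empty list.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional

Record = dict


def sample_per_repo(
    records: Iterable[Record],
    repos: Iterable[str],
    meets_requirements: Callable[[Record], bool],
    k: int = 5,
    seed: Optional[int] = None,
    repo_key: str = "repository",  # hypothetical field name for the source repository
) -> Dict[str, List[Record]]:
    """Draw up to k qualifying records for each requested repository.

    If a repository has fewer than k qualifying records, all of them are
    returned; if it has none, its entry is an empty list.
    """
    rng = random.Random(seed)  # seeded so a sample sheet can be regenerated
    pools: Dict[str, List[Record]] = {repo: [] for repo in repos}
    for rec in records:
        repo = rec.get(repo_key)
        # Only keep records from requested repositories that pass the criteria.
        if repo in pools and meets_requirements(rec):
            pools[repo].append(rec)
    # min(k, len(pool)) implements "pull however many there are (if any)".
    return {repo: rng.sample(pool, min(k, len(pool))) for repo, pool in pools.items()}


if __name__ == "__main__":
    demo = (
        [{"repository": "Zenodo", "id": i, "ok": True} for i in range(7)]
        + [{"repository": "Mendeley", "id": i, "ok": True} for i in range(2)]
        + [{"repository": "Mendeley", "id": 99, "ok": False}]
    )
    picked = sample_per_repo(demo, ["Zenodo", "Mendeley", "Dataverse"], lambda r: r["ok"], seed=0)
    print({repo: len(recs) for repo, recs in picked.items()})
    # prints {'Zenodo': 5, 'Mendeley': 2, 'Dataverse': 0}
```

Passing the criteria as a predicate keeps the sampling logic unchanged if the per-repository thresholds turn out to differ; raising `k` to 20 would produce a sheet like the one below.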
Sample sheet for Dataverse, Mendeley, and Zenodo (20 records each): https://docs.google.com/spreadsheets/d/1jkhidFmsp0f_yL8S5wpZ-oBA-eLhQQmESQEq4Lrhx3M/edit#gid=258837660
Results of the length check: https://docs.google.com/spreadsheets/d/1crfLDl5_c7jZ47JefhOCf6tx_cM-u8AkK6rsXttM6s8/edit#gid=1639648736
This issue has been marked as pending close-out and will be closed after a week if there are no additional comments.
Issue Name
Identify length limitations for ChatGPT inference
Issue Description
ChatGPT is more likely to hallucinate results for measurementTechnique extraction when the combined 'name' + 'description' text is short. Zubair's initial observations suggest that, while this is highly dependent on the actual text, 15 words seems to be the minimum.
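A minimal sketch of the length check described above, in Python. It assumes each record is a dict with `name` and `description` keys (either may be missing or `None`) and counts words by splitting on whitespace; the 15-word threshold comes from Zubair's observations, while the function names are hypothetical and not part of any existing pipeline.

```python
from typing import Iterable, List, Tuple

MIN_WORDS = 15  # observed minimum from initial testing; may need tuning per repository


def combined_word_count(record: dict) -> int:
    """Count whitespace-separated words in the record's name + description."""
    name = record.get("name") or ""
    description = record.get("description") or ""
    return len(f"{name} {description}".split())


def split_by_length(
    records: Iterable[dict], min_words: int = MIN_WORDS
) -> Tuple[List[dict], List[dict]]:
    """Separate records long enough for GPT extraction from those below the threshold."""
    long_enough: List[dict] = []
    too_short: List[dict] = []  # higher risk of hallucinated measurementTechnique values
    for rec in records:
        if combined_word_count(rec) >= min_words:
            long_enough.append(rec)
        else:
            too_short.append(rec)
    return long_enough, too_short


if __name__ == "__main__":
    demo = [
        {
            "name": "RNA-seq of mouse liver",
            "description": "Bulk RNA-seq of liver tissue from wild-type and knockout "
            "mice at three time points after infection.",
        },
        {"name": "Dataset 1", "description": None},
    ]
    kept, flagged = split_by_length(demo)
    print(len(kept), len(flagged))  # prints 1 1
```

Whitespace splitting treats hyphenated terms such as "RNA-seq" as one word; a tokenizer-based count would give different numbers, so the threshold should be validated against whichever counting method is actually used.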
To validate or otherwise improve this estimate:
Issue Discussion
A 50/50 version of this approach was discussed at the internal meeting on 2024.04.24.
Please select the type of metadata improvement
Meta URL
No response
Related WBS task
https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/13
For internal use only. Assignee, please select the status of this issue
Status Description
No response
Request status check list