CDLUC3 / data-curation

Exploratory project to catalog and evaluate datasets. We will determine ways to evaluate data files against the indicators above and offer solutions for increasing their quality. We aim to translate best practices into workflows that help with everyday use cases.

See about testing on AWS bedrock #36

Open sfisher opened 3 days ago

sfisher commented 3 days ago

See what I need to do to get accounts:

  1. What do I need to do to enable using a service?
  2. How do I manage the security for the requests (API keys or what?)

Maybe talk with Martin.

marisastrong commented 3 days ago

Do you know which FM we want to use?

https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api.html To use Bedrock, you must request access to the specific foundation models (FMs) you want. To do so, you need the correct IAM permissions: https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html . Once access is granted, the models are available to all users of the account. If you don't request access to models now, you can still browse Bedrock and return to the Model access page to manage model access later.
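For a sense of what that IAM setup looks like, here's a minimal identity-based policy sketch along the lines of the linked examples (the `bedrock:InvokeModel*` actions and the foundation-model ARN pattern come from the Bedrock docs; you'd likely want to scope `Resource` down to the specific models we enable):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInvokingBedrockFoundationModels",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```

This also answers part of the API-key question in the issue: Bedrock requests are signed with normal AWS credentials (SigV4) under a policy like this, rather than a separate per-service API key.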

I've been looking at questions to help evaluate new tools and services: https://docs.google.com/document/d/1wuEkNvXUVZU9Wy-a9eca1WcV9lfWE2EhGy9STZVgoas/edit?usp=sharing . It helps us make informed decisions on the directions we want to take. Steve is familiar with this idea/discussion. This is not to dissuade you from trialing/demoing a new service, but to raise questions worth considering early in the process as we move forward with new tools in our environment.


From: Scott F. Sent: Friday, September 27, 2024 2:23 PM. To: CDLUC3/data-curation. Subject: [CDLUC3/data-curation] See about testing on AWS bedrock (Issue #36)


sfisher commented 23 hours ago

This isn't really a new service, though it runs as a private demo open only to us. This is an expansion of Steve's evaluation and demo looking at different LLMs and understanding how to help people improve their data and curate items, especially tabular data.

Testing Llama is fairly important for stakeholder concerns going forward, since it's an open-weights model and doesn't use submitted data for training when we run it ourselves or through a service. This isn't so much a problem for testing, afaict, since the data used for evaluation right now is all public from Dryad/Zenodo. But Llama is one of the models that others in the space he's talked to have generally settled on, and it's important to evaluate.

Martin mentioned that they have evaluated models using Bedrock and the cost was minimal (which has been our experience so far with ChatGPT and Gemini as well: roughly $10 of usage from ChatGPT over months, and less than that from Gemini). For minimal costs like that, he felt it was a feasible service to test and demo with minimally, rather than finding a way to run the model elsewhere.

Most likely we're looking at either the Llama 3.2 11B or 90B parameter models for evaluation. The 1B and 3B parameter models are sized more for running on local devices and aren't quite as capable. These larger Llama models are also "multimodal," meaning they can evaluate images in addition to text (not the largest target of our evaluation, but Steve likes to kick the tires and see what they can do in the process of evaluation).
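For the record, a minimal test call through boto3's Bedrock runtime client could look something like the sketch below, once model access is granted and AWS credentials are configured. The model ID is an assumption based on Bedrock's usual `meta.llama3-2-*` naming; check the Model access page for the exact IDs available to our account.

```python
def build_converse_request(prompt, model_id="meta.llama3-2-11b-instruct-v1:0"):
    """Build kwargs for the Bedrock Converse API.

    The default model_id is an assumed name for the Llama 3.2 11B model;
    verify it against the account's Model access page.
    """
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }


def ask_llama(prompt, region="us-west-2"):
    """Send one prompt to Llama on Bedrock; needs boto3 and AWS credentials."""
    import boto3  # imported here so build_converse_request stays usable offline

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**build_converse_request(prompt))
    return response["output"]["message"]["content"][0]["text"]


# Example (requires model access in the account):
# print(ask_llama("Suggest three quality checks for a CSV of field data."))
```

The Converse API is handy here because the same request shape works across Bedrock FMs, so swapping between the 11B and 90B models for comparison is just a change of `model_id`.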