clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark

revamp mechanism for mapping model names to backends (and further information) #26

Closed. davidschlangen closed this issue 7 months ago.

davidschlangen commented 10 months ago

The problem: At the moment, a lot of information related to models is hard coded in the backends.

Think about whether a more general mechanism could help. Like having a single .json file that links between model names (to be used when calling via cli.py) and other information, like:

What kinds of changes to the current system would that require?

(This could even be useful for [semi-] automatically compiling the list of tested models on the website.)

(Tagging @phisad , @sherzod-hakimov , & @Gnurro for the discussion.)

Gnurro commented 10 months ago

Currently, a backend can simply be added as a file/module and it works (as long as it conforms to the standards set), in a 'plug-and-play' way. Having to add the information about its supported models to a single JSON seems to me like just more potentially incompatible hardcoded information.

The backends - at least the HF transformers-based ones I work on - have the information they need to work hardcoded, and I don't see how taking it out and exposing it for calls would improve anything. To me, changing this would make things more complicated and spread out over more code. The purpose of having different backends in code is to handle specific models in proper ways; a single JSON or similar would make sense if its contents could be fed into multiple backends and work - but this is not the case, and that's why we have different backends in the first place.

If the main purpose of this revamp is to make backend/model information more accessible, I'd rather have enhancements to the existing backend foundation (in https://github.com/clp-research/clembench/blob/main/backends/__init__.py ) than add extra steps and complications to the process of adding backends. If the central JSON is needed, generating it from the backends present seems like a better way to go. "Which backend is responsible for handling this model", for example, is what this function already returns: https://github.com/clp-research/clembench/blob/8d823fa2221d38001b1819828893e26b659b607e/backends/__init__.py#L108-L116

davidschlangen commented 10 months ago

Thanks for the comments! @sherzod-hakimov , @phisad , anything to add?

davidschlangen commented 8 months ago

Some quick notes on my current thinking on this:

Comments, @phisad , @Gnurro ?

davidschlangen commented 8 months ago

One problem that this does not address is that a situation might arise in which we want to be able to access the same model via different routes; e.g., llama-chat-70b both via local huggingface inference if we're on an A100, or via hosted inference if we're not.

But maybe a simple solution to that could be to make the lookup itself handle this: we could (optionally?) pass to lookup_via_model_name() a JSON object that determines what is loaded. That is, this object might specify the backend as well, not just model_name, which would force the lookup to return that backend (even if going by the model name alone would have turned up a different backend).

davidschlangen commented 8 months ago

Adding to the above: maybe the backend doesn't even need access to the whole registry; it just needs to be passed the record that the lookup method used to identify the backend.

It seems to me at the moment that this (that is, introducing the registry) would require only relatively small changes to __init__.py. But it would require quite a lot of editing in the backends, because all the information that is now stored in constants in the backends (supported models) needs to be converted into records in a registry. (This mostly affects the huggingface backend, because it supports a very large number of models compared to all the others.)

davidschlangen commented 8 months ago

Just for the record (@Gnurro and I talked about this in the train): I think the most elegant way of implementing this would be to do the lookup of the record in the registry via unification with the structure that is provided in the request (in the call to lookup_model_by_name()). That is, out of the list of dictionaries, the first is returned that does not contradict the information in the model description.

If the model description is {"model_name": "llama-chat-70b"}, this will be the first dictionary in the registry where that is the model name; maybe that's something like {"model_name": "llama-chat-70b", "backend": "hf_local"}. If the description is {"model_name": "llama-chat-70b", "backend": "fastchat"}, however, that record (the one with hf_local) would contradict this specification and some other one must be returned.

This solves the problem of potentially having more than one entry for the same model_name. If only the name is given, we still have a defined behaviour (the entry is returned that comes first in the list), but it is possible to be more precise if that is desired.

If we make it so that the lookup (from the registry) never fails, this would even make it possible to specify a model that doesn't have an entry. (What is returned would simply be the record from the request.) E.g., the lookup structure could be something like this: {"model_name": "llama-chat-70b", "backend": "fastchat", "url": "https://localhost:8080"}, and if there's nothing in the registry that doesn't contradict this (because maybe we do have an entry for that name and backend, but it points to our server, and hence contradicts the url field), returning the query record would still provide enough information so that the backend can be instantiated.

This would make it possible to experiment with models, access modes, and model variations (e.g., quantisation) that we may not want to officially add to the benchmark (i.e., that should not be included in runs), and for which we do not want to make an entry in a shared, version-controlled file (the registry), but that could be useful when playing around with stuff. (This is actually where this is coming from: I see a use in being able to use a quantized model that is being served locally by llama.cpp on my Mac, via its openAI-compatible server, but I wouldn't necessarily want to have to create an entry for that in the registry.)
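
To make the intended behaviour concrete, here is a minimal sketch of such a never-failing, non-contradiction lookup (purely illustrative; the names and the exact merge behaviour are assumptions, not the actual implementation):

def lookup_model(query: dict, registry: list) -> dict:
    """Return (the merge with) the first registry entry that does not contradict the query.

    Two records contradict each other if they share a key with different values.
    If no entry is compatible, the query itself is returned, so an ad-hoc spec
    (e.g. pointing at a local llama.cpp server) can still instantiate a backend.
    """
    for entry in registry:
        if all(entry.get(key, value) == value for key, value in query.items()):
            return {**query, **entry}  # compatible: enrich the query with the entry
    return query

# Illustrative registry with two routes to the same model:
registry = [
    {"model_name": "llama-chat-70b", "backend": "hf_local"},
    {"model_name": "llama-chat-70b", "backend": "fastchat"},
]
lookup_model({"model_name": "llama-chat-70b"}, registry)                         # first (hf_local) entry
lookup_model({"model_name": "llama-chat-70b", "backend": "fastchat"}, registry)  # fastchat entry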

phisad commented 8 months ago

So as far as I understand, there are two issues here:

(a) clem always loads all available backends (which consumes time and can hurt the user experience, e.g. when the huggingface backend is loaded without its requirements being installed)
(b) users may want to add a backend for testing purposes without sharing it publicly

However, we can assume that the user knows which backend they want to use when running a model.

A simple solution for this might be to let the user specify the backend at invocation time (we only needed the "all backends" loading mechanism because we were sure that we wanted to test all of them for the first benchmark runs). For example, openllm lets you specify the model, a specific model version, and a compute backend (here pt = pytorch):

openllm start llama --model-id meta-llama/Llama-2-7b-chat-hf --use-llama2-prompt --api-workers 8 --backend pt --quantize int4 --device 0

First the user might want to check which model-ids are available:

openllm models  -> List all supported models.

We could allow something similar:

python3 scripts/cli.py models

This would go through all backends and list them plus their supported models. Here the user can already see and choose whether the backend should be hf_local/llama or fastchat/llama. The backend will become part of the model string.

Then we allow the user to perform a run as follows:

python3 scripts/cli.py -m hf_local/llama--fastchat/llama run taboo

which would pair the two llama models, but from different backends. Clem will know which backend to call via the first part of the model id. The backend string should match the file name prefix under backends/ (that is, <backend>_api.py). If users want to test their own models, they could simply put their file there and call them likewise with local/llama (having a file local_api.py) or philipp/llama (having a file philipp_api.py).
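
A rough sketch of how such a combined model string could be parsed (assuming '--' separates the two players and '/' separates the backend prefix from the model name; note that model names themselves contain many dashes, so this format is somewhat fragile):

def parse_model_arg(model_arg: str):
    """Split e.g. 'hf_local/llama--fastchat/llama' into (backend, model_name) pairs."""
    pairs = []
    for part in model_arg.split("--"):
        if "/" in part:
            backend, model_name = part.split("/", maxsplit=1)
        else:
            backend, model_name = None, part  # no prefix: fall back to name-only lookup
        pairs.append((backend, model_name))
    return pairs

parse_model_arg("hf_local/llama--fastchat/llama")
# -> [('hf_local', 'llama'), ('fastchat', 'llama')]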

I will have a look if this works on a feature branch. We can later expand on this approach by adding more options to the cli call.

davidschlangen commented 8 months ago

Yeah, I thought about overloading the model name as one possible option as well. But that addresses only some of the problems, and in a less flexible way, I think.

But first, let me list the problems that I think should be addressed, not all of which you list above:

A) All backends are always instantiated when importing backend. That leads to errors (though not fatal ones) if requirements are not met on the local installation.
B) The backends, especially those that handle a large number of models, contain a lot of information that is not code. They could be much more elegant (= maintainable, extendable, etc.) if that information were kept elsewhere.
C) We may want variable degrees of control over which model is being addressed, from just going via name to specifying the backend, to calling variants (quantized versions, different chat template?, whatever).

Among those, A is the smallest problem. Using a URL-type solution to C, as you propose, requires us to specify this format now (because it needs to be parsed), and may exclude use cases that we're not thinking about now. Calling via a structured object (a JSON record) basically is like calling via a URL, except that it comes pre-parsed, and it's between the caller and the backend to specify something that will have an effect; the middleman doesn't need to know anything. The URL/name solution would also leave B unaddressed.

I also have a use case in mind that you may not be aware of, which is to use the backend mechanism as an abstraction layer when prototyping games (or in general, when programming with LLMs), as shown in my howto_prototype_games.ipynb. There it makes sense to easily be able to access different backends programmatically, and it's not a problem to pass a JSON object when loading the backend.

davidschlangen commented 8 months ago

That said, I think the approaches are compatible / complementary. We can define a URL-type format as a shorthand option for denoting a model and delivery method (e.g., localhost:8080::openAIcompatible::llama-2-7b-chat), and still have a registry where further information is looked up (by parsing that into {"url": "localhost:8080", "backend": "openAIcompatible", "model_name": "llama-2-7b-chat"}).

(Though good luck formulating a clean easy grammar for such a scheme that would make all things apart from the model name optional.)

Gnurro commented 8 months ago

I have to look into most of the points brought up more thoroughly before I can properly comment on them, but the current way of using a '--'-separated string that needs to be parsed has been a constant error risk when running the benchmark. Model names contain many dashes, and working with this string format in the terminal forces me to always triple-check my shell scripts. I'd much prefer it if the arguments to cli.py and the like were more explicit, passed via proper cli arguments like run --game privateshared --model OpenChat3.5 --model_2 OpenChat3.5 instead of run privateshared OpenChat3.5--OpenChat3.5. This is of course not directly related to a model registry, but since establishing one would go along with refactoring the model selection, it might be good to make this more convenient as well.

phisad commented 8 months ago

> I have to look into most of the points brought up more thoroughly before I can properly comment on them, but the current way of using a '--'-separated string that needs to be parsed has been a constant error risk when running the benchmark. Model names contain many dashes, and working with this string format in the terminal forces me to always triple-check my shell scripts. I'd much prefer it if the arguments to cli.py and the like were more explicit, passed via proper cli arguments like run --game privateshared --model OpenChat3.5 --model_2 OpenChat3.5 instead of run privateshared OpenChat3.5--OpenChat3.5. This is of course not directly related to a model registry, but since establishing one would go along with refactoring the model selection, it might be good to make this more convenient as well.

That's indeed cumbersome, and it might be simpler and more straightforward to assume a self-play modelA-modelA mode by default and treat the modelA-modelB combination as a rarer side case that is handled differently (like forcing or overwriting a second model)?

davidschlangen commented 8 months ago

That sounds good. We can assume by default that it's the same model, and only if model_B is specified is it a different one.

And we can assume that MODEL in --model MODEL is either a string, in which case it is the model name, or it is a JSON that may specify whatever else we want to specify on the command line (as per the above).
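
A minimal sketch of how cli.py could accept either form (argument names are illustrative, not the actual interface): if the value parses as a JSON object it is taken as a model spec, otherwise it is treated as a plain model name.

import argparse
import json

def parse_model_spec(value: str) -> dict:
    """Accept either a plain model name or a JSON object on the command line."""
    try:
        spec = json.loads(value)
        if isinstance(spec, dict):
            return spec
    except json.JSONDecodeError:
        pass
    return {"model_name": value}

parser = argparse.ArgumentParser()
parser.add_argument("--model", type=parse_model_spec, required=True)
parser.add_argument("--model_2", type=parse_model_spec, default=None,
                    help="second player; defaults to the same model as --model")

args = parser.parse_args(["--model", '{"model_name": "llama-chat-70b", "backend": "fastchat"}'])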

davidschlangen commented 8 months ago

@Gnurro , do you want to try your hand at this? If I remember correctly, your proposal on the slides (which aren't on mattermost?) was going in this direction already anyway.

Gnurro commented 8 months ago

I've started on setting things up for the HF backend here: https://github.com/Gnurro/clembench/tree/hf_backend_refactor And the JSON I've shown is this one: https://github.com/Gnurro/clembench/blob/hf_backend_refactor/backends/hf_local_models.json Do you want me to work on this from the (HF) backend side, or go into the CLI part of the framework as well? I might take a bit to really get familiar with the CLI/running calls part of the framework code, but I wanted to do that sometime. If the latter is the case, we will also have to decide whether I work on this or on multimodal model support.

phisad commented 8 months ago

Btw. another (but similar) approach for a model registry is taken here: https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_registry.py

davidschlangen commented 8 months ago

> Btw. another (but similar) approach for a model registry is taken here: https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_registry.py

Yup, but they again make the assumption that the semantics of the structure is fixed. We can do better!

davidschlangen commented 8 months ago

Since @phisad asked for an example, it could look something like this:

[
    {
        "model_name": "gpt-4-preview",
        "backend": "openai"
    },
    {
        "model_name": "SUS-Chat-34B",
        "backend": "hf_local",
        ...
    },
    {
        "model_name": "vicuna-7b-v1.5",
        "backend": "open_ai_compat",
        "base_url": "https://OUR-POSTDAM-FASTCHAT-URL-REMOVED"
    },
    {
        "model_name": "lh-vicuna-7b-v1.5",
        "call_name": "vicuna-7b-v1.5",
        "backend": "open_ai_compat",
        "base_url": "http://127.0.0.1:8000/v1"
    },
    ...
]

Where the only condition we put on a record is that it minimally contains model_name and backend (because that's what lookup_model_by_name() needs to do its job). There can be other entries, which may be meaningful to the backend itself. (See the attached image with @Gnurro 's proposal, which however was missing the backend field.)

[Except that of course I immediately had to make things more complicated... We could make a distinction between the name that is used to identify the model (model_name) in the lookup, and the name that the backend uses (e.g., in the entry for lh-vicuna-7b-v1.5, in a setting where we may have a vicuna that is served by our server, but also may want to quickly specify a version where the call goes inside the house).]
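
To illustrate the call_name idea (a sketch, not actual backend code): the lookup only cares about model_name, while a backend that talks to an OpenAI-compatible server could use call_name, if present, as the name it sends in the API request.

# Registry entry for a locally hosted variant of a model that also has a 'regular' entry:
entry = {
    "model_name": "lh-vicuna-7b-v1.5",  # name used for the registry lookup
    "call_name": "vicuna-7b-v1.5",      # name the serving endpoint actually knows
    "backend": "open_ai_compat",
    "base_url": "http://127.0.0.1:8000/v1",
}

# Inside the backend, the name sent to the API falls back to model_name:
api_model_name = entry.get("call_name", entry["model_name"])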

Does that make sense, @phisad ?

[Screenshot (2024-01-11): @Gnurro's registry proposal referenced above]
davidschlangen commented 8 months ago

> I've started on setting things up for the HF backend here: https://github.com/Gnurro/clembench/tree/hf_backend_refactor And the JSON I've shown is this one: https://github.com/Gnurro/clembench/blob/hf_backend_refactor/backends/hf_local_models.json Do you want me to work on this from the (HF) backend side, or go into the CLI part of the framework as well? I might take a bit to really get familiar with the CLI/running calls part of the framework code, but I wanted to do that sometime. If the latter is the case, we will also have to decide whether I work on this or on multimodal model support.

Ah, I hadn't seen this, you've already started.

The way I'm thinking about this, it should be transparent from the cli side. But it's also not something that only concerns one backend. What I would do is change lookup_model_by_name() (and the whole __init__.py).

(We can make the registry lookup (where the function may get a JSON record itself) more complicated (unification!) later.)

Please coordinate with @phisad , who was also interested in this issue.

Gnurro commented 8 months ago

> Ah, I hadn't seen this, you've already started.

I've started doing this for the HF backend improvement alone - the question is whether I should start a level above that and set this up for all backends. Overhauling the CLI and calls is another step beyond that. I have a meeting with @sherzod-hakimov tomorrow, and will talk about moving from setting up multimodal backend(s) to doing this, as I can't do both that and the model loading/calling overhaul at once.

davidschlangen commented 8 months ago

Not sure I’m following. In any case, the answer to

> Do you want me to work on this from the (HF) backend side, or go into the CLI part of the framework as well?

is “neither”. This is larger than the HF backend (as it requires a complete overhaul of how backends are loaded), but it is also independent of the CLI (as the interface from the outside, via lookup_by_model_name(), remains unchanged).

I would also recommend using two separate branches for the different features, this one and the multimodal model interface.

Gnurro commented 8 months ago

> I would also recommend using two separate branches for the different features, this one and the multimodal model interface.

Sounds good. I'll work on the model registry and loading backends/models based on it, then, as @phisad is more acquainted with the CLI/run calls code (see https://github.com/clp-research/clembench/issues/34 ).

phisad commented 8 months ago

> Since @phisad asked for an example, it could look something like this:
>
> [
>     {
>         "model_name": "gpt-4-preview",
>         "backend": "openai"
>     },
> ...
>
> Does that make sense, @phisad ?

Just to be clear on this: where exactly does this file live?

And an entry could theoretically also be given as an option to the run command? Meaning that we provide a base registry (in clem, as a template?) and users could either expand that file in the project (key.json would be a sibling) or point to an arbitrary file on the system?

davidschlangen commented 8 months ago

I would make this a file that lives in backends. Officially added models get an entry there (and this change is checked in and available for all). Ad hoc experiments with other models or variants of models can be done by passing a record on the command line / when requesting a backend, without having to enter that into the official registry file.

(If you scroll up, you see that as a computational linguist, I cannot not see this as unification of feature structures. If you specify {"model_name": "horst"}, the first entry in the registry that matches will be used. If you specify {"model_name": "horst", "backend": "hf_local"}, the first entry that matches this will be used, which may not be the same as for the less specific call. If you specify something that does not match any entry (but does specify a backend), then that record will be passed to that backend.)

davidschlangen commented 8 months ago

My hope actually would be that the first code point that needs to react to this change is lookup_model_by_name(). All other scripts just keep on passing a string, except that now this string is not just a model name, it's a JSON record. (I haven't looked at the code of run.py, however, so I don't know whether this currently does more with the model name.)

... Ah, no, this can't quite be the case. Something somewhere has to create the output directory name. We wouldn't want that to be a JSON string... 😁
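
One way around that (a sketch, assuming the spec always contains model_name): derive the directory name from the spec's model_name only, so the full JSON record never ends up in a path.

import re

def results_dir_name(model_spec: dict) -> str:
    """Build a filesystem-friendly directory name from a model spec."""
    # Keep only characters that are safe in directory names.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", model_spec["model_name"])

results_dir_name({"model_name": "llama-chat-70b", "backend": "fastchat"})
# -> 'llama-chat-70b'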

Gnurro commented 8 months ago

I have a prototype for the more general system in my WIP refactor of the huggingface backend: https://github.com/Gnurro/clembench/blob/f01a305beba75a4c898fa87952164d83f059a786/backends/huggingface_local_api.py#L40-L45 Model-specific settings are loaded from the model registry JSON (which only covers huggingface for now) and used as the model_settings attribute.

phisad commented 8 months ago

> I have a prototype for the more general system in my WIP refactor of the huggingface backend: https://github.com/Gnurro/clembench/blob/f01a305beba75a4c898fa87952164d83f059a786/backends/huggingface_local_api.py#L40-L45 Model-specific settings are loaded from the model registry JSON (which only covers huggingface for now) and used as the model_settings attribute.

I think that is not yet how we want to handle this in the end (but rather do the dispatching already in lookup_model(), so that e.g. the actual backend does not have to check again that it is the one to be addressed). I will also look into this and make a suggestion on a branch (regarding all backends). Maybe also try to provide a *-model-registry.json for each backend instead.

davidschlangen commented 8 months ago

This looks like a great start! I like how it has simplified the backend code quite a lot.

The next step now is to lift this one layer of abstraction higher. As @phisad notes, the reading in of this file should not happen at the individual backend, but rather should be done by the general lookup, and the json object then be passed to the backend that it calls. (Meaning also that the SUPPORTED_MODELS mechanism in the backend is obsolete.) [This is just to be explicit; I think you are aware of this and it made sense to first test the general format only within the huggingface backend.]

So, please coordinate, @phisad and @Gnurro , on who does which next steps (and try to avoid doing double work).

One PS to @phisad : I think there should be a single model_registry.json file for all models that clemgame is supposed to know, not separate ones for each backend. (That would kind of defeat the purpose of first doing the lookup via model_name (plus potentially more features), and through that finding out which backend is responsible.)

Gnurro commented 8 months ago

I'm establishing the groundwork for the general model registry while adding the long-planned enhancements of the huggingface backend - I will move on to making the general registry and loading framework for all backends after I'm done with these enhancements. See the age of https://github.com/clp-research/clembench/issues/28 and https://github.com/clp-research/clembench/issues/22 .

davidschlangen commented 8 months ago

Alright. So then @phisad , if you have the time at the moment, you could give it a try? (Potentially doing some GitHub magic that gives you access to the code in JJ’s branch.)

phisad commented 8 months ago

I started with https://github.com/clp-research/clembench/issues/34 because this is a prerequisite for providing dicts/json on cli invocation (and not "--" strings).

phisad commented 8 months ago

OK, merged the prerequisites. Now I'll implement this one.

phisad commented 8 months ago

Just to continue here on "compatible" ModelSpecs:

> I'd quite like to see a mechanism where the first compatible spec from the registry is returned. Meaning that you can have several entries for the same model name (which of course would differ in some other aspects, maybe the responsible backend, or some other parameters).

I think this term was never properly defined before, and it would be easy to include that (along with having a list instead of a dict as the model registry). What would a compatible spec be, if not defined by the model_name? If there are multiple specs with the same model_name, which one would be selected as the compatible one? For backwards compatibility (and to reduce confusion) I simply went with how it works right now.

Maybe what is meant here is what I already started: if a ModelSpec is given via the command line that has no backend defined, and an entry with that model_name also exists in the model registry, then the information from the model registry will be attached to the given spec (automatically delegating to that backend). But it will not overwrite any information given by the user-provided ModelSpec.

We can make this matcher arbitrarily complex, e.g. make the decision based on information overlap: then the information from the model-registry entry that has the most overlap with the given ModelSpec is attached to it.

> That is, out of the list of dictionaries, the first is returned that does not contradict the information in the model description.

OK, I think this is easy to extend. We basically always scan through all model-registry entries and attach the information from the first one that entirely matches the user-provided ModelSpec. If none matches, then we do not attach any information and simply pass on the user-provided ModelSpec.

phisad commented 8 months ago

I adjusted the lookup and it would now be as follows:

Given this registry

[
  {
    "model_name": "model1",
    "backend": "huggingface_local",
    "model_id": "model_id1",
    "custom_chat_template": "custom_chat_template",
    "eos_to_cull": "<|end_of_turn|>"
  },
  {
    "model_name": "model2",
    "backend": "huggingface_local",
    "model_id": "model_id2",
    "custom_chat_template": "custom_chat_template",
    "eos_to_cull": "<|end_of_turn|>"
  },
  {
    "model_name": "model1",
    "backend": "openai",
    "model_id": "model_id3",
    "custom_chat_template": "custom_chat_template",
    "eos_to_cull": "<|end_of_turn|>"
  }
]

And with the following ModelSpecs (as queries for the keys), the tests run through.

class BackendTestCase(unittest.TestCase):
    def test_get_backend_for_model1(self):
        init_model_registry("test-registry.json")
        backend = get_backend_for(ModelSpec(model_name="model1"))
        assert backend.model_spec.backend == "huggingface_local"

    def test_get_backend_for_model2(self):
        init_model_registry("test-registry.json")
        backend = get_backend_for(ModelSpec(model_name="model2"))
        assert backend.model_spec.backend == "huggingface_local"

    def test_get_backend_for_model1_other(self):
        init_model_registry("test-registry.json")
        backend = get_backend_for(ModelSpec(model_name="model1", backend="openai"))
        assert backend.model_spec.backend == "openai"

The updated code looks like:

def get_backend_for(model_spec: ModelSpec) -> Backend:
    """
    :param model_spec: the model spec for which a supporting backend has to be found
    :return: the backend registered that supports the model
    """
    if model_spec.is_human():
        return HumanBackend(model_spec)
    if model_spec.is_programmatic():
        return ProgrammaticBackend(model_spec)
    for registered_spec in _model_registry:
        if model_spec.matches(registered_spec):
            # attach the information from the first matching registry entry, then stop scanning
            model_spec.update(registered_spec)
            break
    if not model_spec.has_backend():
        raise ValueError(f"Model spec requires backend, but no entry in model registry "
                         f"for model_name='{model_spec.model_name}'. "
                         f"Check or update the backends/model_registry.json and try again. "
                         f"A minimal entry is {{'model_name':<name>,'backend':<backend>}}.")
    backend = _load_backend_for(model_spec)
    return backend

We can define matches as we like. Here I use model_name and backend. Do we want to allow matching on any attribute? Might that be confusing?
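
For concreteness, a sketch of what matches and the non-overwriting update could amount to under the behaviour described above (matching on model_name and backend where the query specifies them); the actual ModelSpec implementation may differ.

class ModelSpecSketch(dict):
    """Flat dict standing in for ModelSpec; for illustration only."""

    MATCH_KEYS = ("model_name", "backend")

    def matches(self, entry: dict) -> bool:
        # A registry entry matches if it agrees on every match key that the query specifies.
        return all(entry.get(key) == self[key] for key in self.MATCH_KEYS if key in self)

    def update_from(self, entry: dict) -> None:
        # Attach registry information, but never overwrite user-provided values.
        for key, value in entry.items():
            self.setdefault(key, value)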

Gnurro commented 8 months ago

This is how I set that up in my HF backend refactor: https://github.com/Gnurro/clembench/blob/d6ce06c4124e44c6a41ea969995c8c31b3d139a9/backends/huggingface_local_api.py#L59-L64 I think the version @phisad posted above is more robust, though.

> Do we want to allow matching on any attribute? Might that be confusing?

I think as long as there is the simple fallback of getting the fitting backend if just the model name is given, and further specification remains optional, this isn't adding any confusion. The option to pass more specifics to select a different backend or with different backend-specific settings just makes things more flexible. Of course this means that the default backend for a specific model name is more or less hardcoded by the order of entries in the model registry (file), but afaik that is exactly what @davidschlangen proposed. With the current commits, things are more confusing/complicated, as the model registry is missing entries for the already supported models, forcing users to fully specify backend internals, for example to use the openai api backend...

Gnurro commented 8 months ago

Going back to the initial topic of this issue, the solution seems unclear from the commits so far. See https://github.com/clp-research/clembench/commit/1733a78f4ff6b33f9e84e85f7f1091fe69611d93#r137492766 . Do we want to divorce model names from internal model identifiers, or not? I'm unsure of that intent after the sprawling discussion of multiple different framework aspects in this issue thread.

Do we want the backend framework to use more generic names for the creation of backend instances, as in giving it openchat3.5 and it then selecting a default backend for an openchat3.5, which might be the fastchat API openai_compatible backend (which internally uses a different model identifier, fsc-openchat-3.5-0106)? Or should this remain more concise, with the names given to the backend framework reflecting the internal names, as it was done so far? Note that this is different from, and only loosely connected to, what the clembench 'frontend' of cli scripts and the like does - which brings up the question of how parallel/connected this should be as well.

Model-specific settings stored outside of specific backend code are another matter that got mixed in here, while not necessarily connected - that's why my work on https://github.com/Gnurro/clembench/blob/hf_backend_refactor/backends/huggingface_local_api.py overlaps as well, while solving other long-standing issues with the HF backend (we need to prioritize either the HF refactor or this backend framework refactor to not run into complex merge issues).

phisad commented 8 months ago

"With the current commits, things are more confusing/complicated, as the model registry is missing entries for the already supported models, forcing users to fully specify backend internals, for example to use the openai api backend..."

Sorry if this is confusing. The code is not complete and only shows an integration of this issue using the model registry I saw in your branch of work. Of course there is more work to do and more iterations are expected. Though the solution so far is - I guess - a good basis to discuss this issue further.

davidschlangen commented 8 months ago

Yes, as far as I can see, this looks great, @phisad , and should give us the flexibility we need!

I didn't go through all of the code, so to be sure, I'll repeat what I think should be achievable:

I think I might have been the one guilty of creating the confusion with model_id and model_name. I agree that this shouldn't be the same as what the hugging face backend does with hugging_face_id -- this should be left to that backend and ignored by the framework. What I was trying to achieve with the call_name thing above is to have a way to overload the model_name to let it be used to look up a specific entry, which however may be for a model that also has a different entry.

Let's assume we have an entry for goat70B that points to our fastchat installation. But now I also want to play around with a goat70B that is served locally on my machine. Of course I could pass a full(er) model spec that ensures that the goat70B entry in the registry doesn't match. But maybe we can find a way of making it possible to have an entry in the registry that has the name lf-goat70B (so that goat70B doesn't match). But the backend (specified by the entry for lf-goat70B) would still need to know that in reality, the intended model is indeed called goat70B. That's why I had call_name above.

But maybe that's something for the backend to do, and the framework doesn't need to know about this? It would only concern the openai compatible backend anyway. This backend could then check whether a call_name (or maybe real_name) attribute exists, and if so, use this to set the model name.

(Connected to this, maybe one thing that should be handled on this level is to allow optionally for a model_registry_local.json file which would not be under version control and can be used to have model specs for local testing. This file would be consulted after the official model_registry.json, if it exists.)
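
A sketch of how loading both files could look (paths and function name are illustrative): the shared registry is read first, and model_registry_local.json, if present, is appended so that its entries are consulted after the checked-in ones.

import json
import os

def load_model_registry(backends_dir: str = "backends") -> list:
    """Load the shared registry, then append an optional local, un-versioned one."""
    registry = []
    for file_name in ("model_registry.json", "model_registry_local.json"):
        path = os.path.join(backends_dir, file_name)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as registry_file:
                registry.extend(json.load(registry_file))
    return registry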

Misc comments:

davidschlangen commented 8 months ago

Again, as a computational linguist, I can't not comment that what I'm describing here simply is unification of feature structures. We have a file that specifies a list of feature structures, and we are given a feature structure; the output is the first feature structure that unifies with the given one. This may be one that has been specified in the file (but contains more information than the one that was given), or it may be identical to the feature structure that was provided (in case none of the ones in the file unified).

All that the backend lookup cares about is that the process yields something that specifies a backend. Anything else from then on (that is, the meaning of the features) is the business of the backend. (Which is also free to ignore some of the features; making it possible for us to put comments in the ModelSpec, or info that is for others to consume, like a script that automatically compiles out the list of models tested by the benchmark.)

phisad commented 8 months ago

> All that the backend lookup cares about is that the process yields something that specifies a backend.

Good. I think this is what it now does. Checking for Human, Programmatic, then Lookup and finally if a backend is defined.

> re. temperature: Maybe we can make it a property of the BackEnd object, in the sense that once you have instantiated such an object, it is fixed to a certain temperature setting.

For now it is implemented like this. But the question would be how to inform the Backend about the temperature? I used the ModelSpec for this; the temperature is then set once in the Backend. Note that the Backends now carry the ModelSpec with them to have access to the feature structure. Another option would be to have the temperature always just defined in the model registry.

In my view, ModelSpec serves two purposes here:

Now, if the lookup fails, then by default the structure for A is an identity projection to B. This might also be a point of confusion. We could make this clearer by introducing another class. The code only hints at this, in that I use the term "model entry" for the comparison.

Regarding naming, I might add that my perception of Backend is something like a ModelProvider. But from the perspective of the framework user (the game implementor) this might be confusing. They use the ModelProvider as if it were the Model itself (they call generate() on it). Thus a backend is actually just something like a skeleton for a (remote procedure call) Model interface. We could also make this clearer in the code and say that the lookup returns a Model (interface) instead of a Backend (while the ModelSpec is only the selector for this).

> the framework makes as few assumptions about what info the backend needs as possible

I will give this a try. Note that this will also make it possible for something like {"backend":"openai"} to return the first model that matches that backend, or for {"description":"bla"} to return the first model that matches that description. If only a string is given by the user, we will still assume that it is the model_name.

davidschlangen commented 8 months ago

I guess this (renaming what the lookup returns to something like Model, so that you then say something like:

model = lookup('vicuna7b')
response = model.generate(CONTEXT)

) would require changes in many places? But I agree, conceptually, this makes sense (and makes things clearer).

Let's think about the temperature question a little bit more. Arguments for setting this in the registry would be:

Arguments against doing this:

Ok, with this written out, I'm still tending towards treating temperature (maybe together with MAX_TOKENS?) as a parameter of the current run, not to be specified in the registry. (That's not to say that it can't live in the ModelSpec that is given to the backend when it is instantiated, as long as it can be accessed from the outside.) Even for our main use case, we might at some point want to run experiments at different temperature settings, and we'd certainly not want to have to create additional registry entries for that.

phisad commented 8 months ago

> I agree, conceptually, this makes sense (and makes things clearer).

Alright. The number of changes does not exceed what we have now.

> I'm still tending towards treating temperature (maybe together with MAX_TOKENS?) as a parameter of the current run, not to be specified in the registry

I agree, there are attributes that specify the model function ("llama", "7b", "4bit") and there are attributes that specify its decoding behavior ("temperature", "max_tokens"). We could allow defining both in a model spec and then also allow the decoding options to be passed to the generate method (if wanted), which would override the default defined by the spec (or simply let the backend decide what to do with the passed args).
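
A sketch of that split (class and method names are illustrative): the spec provides defaults for the decoding options, and generate() may override them per call.

class ModelSketch:
    """Illustrative model interface; not the actual clembench Backend/Model class."""

    def __init__(self, model_spec: dict):
        self.model_spec = model_spec
        # Decoding defaults come from the spec (with fallbacks if not specified there).
        self.temperature = model_spec.get("temperature", 0.0)
        self.max_tokens = model_spec.get("max_tokens", 100)

    def generate(self, messages, temperature=None, max_tokens=None):
        # Per-call arguments override the defaults defined by the spec.
        temperature = self.temperature if temperature is None else temperature
        max_tokens = self.max_tokens if max_tokens is None else max_tokens
        ...  # the actual call to the underlying model/API would go here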

phisad commented 8 months ago

Making the fewest assumptions, ModelSpec is now a SimpleNamespace that can be created directly, from a string, or from a dict. The prototype notebook would look like this:

from backends import ModelSpec, Model, get_model_for
#%%
THIS_MODEL = 'gpt-3.5-turbo-1106' # fails if no registered entry
#%%
THIS_MODEL = 'gpt-4-1106-preview' # fails if no registered entry
#%%
THIS_MODEL = dict(model_name="gpt-3.5-turbo-1106", backend="openai") # works without registered entry when openai_api.py backend is available
#%%
THIS_MODEL = ModelSpec(model_name="gpt-3.5-turbo-1106", backend="openai", temperature=0.0)
#%%
llm: Model = get_model_for(THIS_MODEL)

And the matching strategy uses fully_contains, as documented in test_model_spec.py:

class ModelSpecTestCase(unittest.TestCase):
    def test_empty_fully_contains_empty_is_true(self):
        a = ModelSpec()
        b = ModelSpec()
        self.assertTrue(b.fully_contains(a))

    def test_b_fully_contains_empty_is_true(self):
        a = ModelSpec()
        b = ModelSpec(model_name="model_b")
        self.assertTrue(b.fully_contains(a))

    def test_b_fully_contains_a_with_different_attr_is_false(self):
        a = ModelSpec(model_name="model_a")
        b = ModelSpec(model_name="model_b")
        self.assertFalse(b.fully_contains(a))

    def test_b_fully_contains_a_with_partially_different_attr_is_false(self):
        a = ModelSpec(model_name="model_a", backend="backend_a")
        b = ModelSpec(model_name="model_a", backend="backend_b")
        self.assertFalse(b.fully_contains(a))

    def test_b_fully_contains_a_with_partially_matching_attr_is_true(self):
        a = ModelSpec(model_name="model_a")
        b = ModelSpec(model_name="model_a", backend="backend_b")
        self.assertTrue(b.fully_contains(a))
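
Read off the tests, fully_contains boils down to a subset check on the spec's attributes; a minimal sketch of that semantics (illustrative, not the actual implementation):

def fully_contains(entry: dict, query: dict) -> bool:
    """True iff every attribute of the query is present in the entry with the same value.

    An empty query is contained in anything; the entry may carry additional attributes.
    """
    return all(key in entry and entry[key] == value for key, value in query.items())
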
phisad commented 8 months ago

Adjusted the backend lookup to use nltk's unify method (which I assume is properly tested and implemented, and actually also supports the use of variables). Now the last test case is possible, where the query specifies more than the entry:

    def test_empty_query_unifies_with_empty_to_empty(self):
        query = ModelSpec()
        entry = ModelSpec()
        self.assertEqual(query.unify(entry), ModelSpec())

    def test_empty_query_unifies_with_entry_to_entry(self):
        query = ModelSpec()  # matches everything
        entry = ModelSpec(model_name="model_b")
        self.assertEqual(query.unify(entry), entry)

    def test_different_query_unifies_with_entry_fails(self):
        query = ModelSpec(model_name="model_a")
        entry = ModelSpec(model_name="model_b")
        with self.assertRaises(ValueError):
            query.unify(entry)

    def test_partial_query_unifies_with_entry_fails(self):
        query = ModelSpec(model_name="model_a", backend="backend_a")
        entry = ModelSpec(model_name="model_a", backend="backend_b")
        with self.assertRaises(ValueError):
            query.unify(entry)

    def test_query_unifies_with_entry_to_entry(self):
        query = ModelSpec(model_name="model_a")
        entry = ModelSpec(model_name="model_a", backend="backend_b")
        self.assertEqual(query.unify(entry), entry)

    def test_query_a_unifies_with_entry_to_union(self):
        query = ModelSpec(model_name="model_a", quantization="8bit")
        entry = ModelSpec(model_name="model_a", backend="backend_b")
        self.assertEqual(query.unify(entry), ModelSpec(model_name="model_a", backend="backend_b", quantization="8bit"))
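
For reference, the flat-dict unification exercised by these tests amounts to the following sketch (the actual implementation delegates to nltk's unify; this is just to spell out the semantics):

def unify_flat(query: dict, entry: dict) -> dict:
    """Unify two flat specs: fail on conflicting values, otherwise return their union."""
    for key in query.keys() & entry.keys():
        if query[key] != entry[key]:
            raise ValueError(f"cannot unify on '{key}': {query[key]!r} != {entry[key]!r}")
    return {**query, **entry}

unify_flat({"model_name": "model_a", "quantization": "8bit"},
           {"model_name": "model_a", "backend": "backend_b"})
# -> {'model_name': 'model_a', 'quantization': '8bit', 'backend': 'backend_b'}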