EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

External package integration using plugins #126

Closed lorenzomammana closed 3 months ago

lorenzomammana commented 3 months ago

Repost of #102

At the current stage there's a somewhat simple way to include new tasks using the --include_external flag, yet there's no way to include external models other than cloning the lmms-eval repository and modifying it, which might not be ideal in many cases (for example when developing new models internally).

In my scenario I would like to work in an external package (let's say an lmms-eval plugin) without needing to clone this repository (i.e. using lmms-eval as a package).

Right now I've forked the repository to allow loading external tasks and models via an environment variable called LMMS_EVAL_PLUGINS that lists external packages (e.g. LMMS_EVAL_PLUGINS=package1,package2). In particular, in __main__.py I'm able to integrate new tasks this way:

import importlib.util
import os

if os.environ.get("LMMS_EVAL_PLUGINS", None):
    for plugin in os.environ["LMMS_EVAL_PLUGINS"].split(","):
        # Resolve the filesystem location of the plugin's tasks/ subpackage
        package_tasks_location = importlib.util.find_spec(f"{plugin}.tasks").submodule_search_locations[0]
        eval_logger.info(f"Including path: {package_tasks_location}")
        include_path(package_tasks_location)

Assuming there's an installed package that follows the same structure as lmms-eval, this piece of code retrieves all the contents of {package_name}.tasks and includes them as tasks.
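To make the lookup above easier to follow, here is a minimal, self-contained sketch of the same `find_spec` mechanism (the helper name and the `my_plugin` example are mine, not part of lmms-eval; the demo uses a stdlib package purely because it is guaranteed to be installed):

```python
import importlib.util
import os

def find_subpackage_dir(package: str, subpackage: str) -> str:
    """Return the filesystem directory of package.subpackage.

    Raises ImportError if the subpackage cannot be found or is not
    itself a package (i.e. has no submodule_search_locations).
    """
    spec = importlib.util.find_spec(f"{package}.{subpackage}")
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"{package}.{subpackage} is not an importable package")
    return list(spec.submodule_search_locations)[0]

# Demonstration with a stdlib package; an installed plugin would resolve
# the same way, e.g. find_subpackage_dir("my_plugin", "tasks").
print(os.path.isdir(find_subpackage_dir("email", "mime")))  # True
```

The returned directory is exactly what `include_path` needs, so the plugin's task YAMLs get picked up without the user typing any path on the CLI.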

In a similar way I've modified the __init__.py of models:

import importlib
import os

if os.environ.get("LMMS_EVAL_PLUGINS", None):
    # Allow specifying other packages to import models from
    for plugin in os.environ["LMMS_EVAL_PLUGINS"].split(","):
        m = importlib.import_module(f"{plugin}.models")
        for model_name, model_class in getattr(m, "AVAILABLE_MODELS").items():
            try:
                # Import the model class into this namespace so lmms-eval
                # can look it up by name like its built-in models
                exec(f"from {plugin}.models.{model_name} import {model_class}")
            except ImportError:
                pass  # skip models whose optional dependencies are missing

Similarly, assuming an external package that follows the same structure as lmms-eval, this piece of code reads the AVAILABLE_MODELS dict from the external package and registers the new models for use by lmms-eval.
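The same registration can be sketched without `exec`, using `getattr` to pull each class out of its module and merge it into an explicit registry dict. This is only an illustrative alternative under the same assumed plugin layout ({plugin}.models.{module}.{ClassName}); the demo fabricates a tiny in-memory plugin (all names hypothetical) so it runs without installing anything:

```python
import importlib
import sys
import types

def register_plugin_models(plugin: str, registry: dict) -> None:
    """Merge model classes declared in {plugin}.models.AVAILABLE_MODELS
    into `registry`, skipping any whose import fails."""
    models_pkg = importlib.import_module(f"{plugin}.models")
    for module_name, class_name in getattr(models_pkg, "AVAILABLE_MODELS", {}).items():
        try:
            module = importlib.import_module(f"{plugin}.models.{module_name}")
            registry[module_name] = getattr(module, class_name)
        except (ImportError, AttributeError):
            pass  # model's optional dependencies are missing; skip it

# Demo: fabricate a minimal in-memory plugin package.
pkg = types.ModuleType("demo_plugin")
models = types.ModuleType("demo_plugin.models")
models.AVAILABLE_MODELS = {"toy": "ToyModel", "broken": "Missing"}
toy = types.ModuleType("demo_plugin.models.toy")
toy.ToyModel = type("ToyModel", (), {})
for name, mod in [("demo_plugin", pkg), ("demo_plugin.models", models),
                  ("demo_plugin.models.toy", toy)]:
    sys.modules[name] = mod

registry = {}
register_plugin_models("demo_plugin", registry)
print(sorted(registry))  # ['toy']  (the "broken" entry was skipped)
```

The upside over `exec` is that the registry stays an ordinary dict you can inspect, and a broken model simply doesn't appear in it.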

I've tested this with an external package named lvlm_benchmarks where I'm trying a few models that are not included in this repo (I could contribute them as well, but they are just ugly implementations πŸ˜„)

[image: screenshot attachment]

The models __init__.py simply contains:

AVAILABLE_MODELS = {
    "cogvlm": "CogVLM",
    "textmonkey": "TextMonkey",
    "cogvlm2": "CogVLM2",
}
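For reference, the plugin package layout the snippets above assume would look roughly like this (file and task names are hypothetical, modeled on the lvlm_benchmarks example):

```text
lvlm_benchmarks/
β”œβ”€β”€ __init__.py
β”œβ”€β”€ tasks/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── chords/
β”‚       └── chords.yaml
└── models/
    β”œβ”€β”€ __init__.py        # defines AVAILABLE_MODELS
    β”œβ”€β”€ cogvlm.py          # class CogVLM
    β”œβ”€β”€ textmonkey.py      # class TextMonkey
    └── cogvlm2.py         # class CogVLM2
```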
kcz358 commented 3 months ago

Thank you for your contribution, I will review the code soon.

lorenzomammana commented 3 months ago

Hmm ok, I see that my design is currently misaligned with the current implementation; I can update the PR with your suggestions.

I still really don't like that the user has to specify extra CLI arguments every time they want to include extra tasks or models; maybe both ways could be supported? The advantage I see in the CLI approach is that it doesn't require building a package around the tasks and models.

kcz358 commented 3 months ago

From my point of view, including models and tasks in command args would be a more explicit way and align better with the current code style. I believe it would be okay to keep both, as the current plugin implementation won't affect the pipeline.

@Luodian, do you mind checking this PR as well? Which code style do you prefer?

Luodian commented 3 months ago

> From my point of view, including models and tasks in command args would be a more explicit way and align better with the current code style. I believe it would be okay to keep both, as the current plugin implementation won't affect the pipeline.
>
> @Luodian, do you mind checking this PR as well? Which code style do you prefer?

Thanks for this PR! I took a look and it feels good to merge.

lorenzomammana commented 3 months ago

Nice! I was working on exactly this right now; I've also included a parameter for the CLI, though I still believe it's not a good option πŸ˜„

This is an example of the infinite command πŸ˜…

accelerate launch --num_processes=8 -m lmms_eval --model textmonkey --tasks chords --batch_size 1 \
--log_samples --log_samples_suffix cogvlm_chords --output_path ./logs/ \
--include_path=/teamspace/studios/this_studio/lvlm-plugin/lvlm_benchmark/tasks/chords/ \
--include_model=/teamspace/studios/this_studio/lvlm-plugin/lvlm_benchmark/models/textmonkey.py:TextMonkey
lorenzomammana commented 3 months ago

I arrived just 5 minutes too late haha; just as well, I didn't like that last implementation too much πŸ˜„

Thanks for your work!

Luodian commented 3 months ago

Take your time, you can still send a new PR~