RedHatOfficial / rhelai-dev-preview

Red Hat Enterprise Linux AI -- Developer Preview
Apache License 2.0

Can't really use an OpenAI-compatible API server for ilab data generate since it adds the path to the model name #23

Open ivanbaldo opened 3 months ago

ivanbaldo commented 3 months ago

Bug description

If we want to use the real OpenAI API server with, for example, the gpt-4o model, we currently can't, because /instructlab/models/gpt-4o is sent as the model name through the API instead. This doesn't happen with upstream InstructLab 0.17.1; the bug is in the ilab wrapper of RHEL AI.
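The snippet below is a minimal, hypothetical sketch (not the actual RHEL AI wrapper source; MODELS_DIR and MODEL_NAME are illustrative names) of the kind of logic that would produce this behavior: a wrapper that unconditionally prefixes the container's models directory onto whatever model name the user passes, which only makes sense for local models.

# Illustrative sketch only, not the real wrapper script.
MODELS_DIR=/instructlab/models
MODEL_NAME=gpt-4o                              # what the user asked for
ilab generate "$@" --model "${MODELS_DIR}/${MODEL_NAME}"
# A remote endpoint such as api.openai.com then receives "/instructlab/models/gpt-4o",
# a model id it does not serve.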

To Reproduce

export HF_TOKEN=notneeded
ilab init --non-interactive
ilab download --repository ibm/granite-7b-base
mkdir taxonomy/knowledge/netlabs
cat >taxonomy/knowledge/netlabs/qna.yaml <<EOT
version: 2
task_description: 'Test with Netlabs local data'
created_by: ibaldo@netlabs.com.uy
domain: netlabs
seed_examples:
  - question: When did the 2024 Oscars happen?
    answer: |
      The 2024 Oscars were held on March 10, 2024.
  - question: What film had the most Oscar nominations in 2024?
    answer: |
      Oppenheimer had 13 Oscar nominations.
  - question: Who presented the 2024 Oscar for Best Original Screenplay and Best Adapted Screenplay?
    answer: |
      Octavia Spencer presented the award for Best Original Screenplay and Best Adapted Screenplay at the 2024 Oscars.
  - question: Who hosted the 2024 Oscars?
    answer: |
      Jimmy Kimmel hosted the 96th Academy Awards ceremony.
  - question: At the 2024 Oscars, who were the nominees for best director and who won?
    answer: |
      The nominees for director at the 2024 Oscars were Christopher Nolan for Oppenheimer,
      Justine Triet for Anatomy of a Fall, Martin Scorsese for Killers of the Flower Moon,
      Yorgos Lanthimos for Poor Things, and Jonathan Glazer for The Zone of Interest.
      Christopher Nolan won best director for Oppenheimer.
  - question: Did Billie Eilish perform at the 2024 Oscars?
    answer: |
      Yes, Billie Eilish performed "What Was I Made For?" from Barbie at the 2024 Oscars.
document:
  repo: https://github.com/juliadenham/oscars2024_knowledge.git
  commit: main
  patterns:
    - oscars2024_results.md
EOT
ilab taxonomy diff
ilab generate --endpoint-url https://api.openai.com:443/v1 --model gpt-4o --api-key secret

Unexpected behavior

ilab generate --endpoint-url https://api.openai.com:443/v1 --model gpt-4o --api-key secret

WARNING: You need at least 2 GPUs to load full precision models
ilab generate --endpoint-url https://api.openai.com:443/v1 --model gpt-4o --api-key secret --model-family mixtral --num-instructions 5000 --model /instructlab/models/gpt-4o
You are using an aliased command, this will be deprecated in a future release. Please consider using `ilab data generate` instead
Generating synthetic data using '/instructlab/models/gpt-4o' model, taxonomy:'taxonomy' against https://api.openai.com:443/v1 server
DEBUG 2024-07-03 21:08:29,384 utils.py:581: read_taxonomy Found new taxonomy files:
DEBUG 2024-07-03 21:08:29,384 utils.py:583: read_taxonomy * knowledge/netlabs/qna.yaml
DEBUG 2024-07-03 21:08:29,673 utils.py:214: get_documents Processing files...
DEBUG 2024-07-03 21:08:29,674 utils.py:539: read_taxonomy_file Content from git repo fetched
Cannot find prompt.txt. Using default prompt depending on model-family.
DEBUG 2024-07-03 21:08:29,675 generate_data.py:417: generate_data Loaded 6 human-written seed instructions from taxonomy
DEBUG 2024-07-03 21:08:29,678 generate_data.py:461: generate_data Generating to: generated/generated_gpt-4o_2024-07-03T21_08_29.json
  0%|                                                                                                                            | 0/5000 [00:00<?, ?it/s]
Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
INFO 2024-07-03 21:08:29,685 generate_data.py:505: generate_data Selected taxonomy path knowledge->netlabs
Generating dataset failed with the following error: Model /instructlab/models/gpt-4o is not served by the server.
These are the served models
['whisper-1', 'tts-1', 'dall-e-2', 'tts-1-hd-1106', 'tts-1-hd', 'gpt-4-turbo-2024-04-09', 'gpt-4-turbo', 
'gpt-3.5-turbo-1106', 'dall-e-3', 'gpt-4-0125-preview', 'gpt-4-turbo-preview', 'text-embedding-3-small', 
'text-embedding-3-large', 'gpt-3.5-turbo-16k', 'gpt-4-1106-preview', 'babbage-002', 'gpt-4o-2024-05-13', 
'gpt-4', 'gpt-4-0613', 'gpt-3.5-turbo-0125', 'tts-1-1106', 'gpt-3.5-turbo', 'gpt-3.5-turbo-instruct', 
'gpt-3.5-turbo-instruct-0914', 'text-embedding-ada-002', 'davinci-002', 'gpt-4o']
  0%|                                                                                                                            | 0/5000 [00:00<?, ?it/s]
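A quick way to double-check which model names the endpoint will accept (hypothetical command, assuming OPENAI_API_KEY holds the same key passed via --api-key) is to query the standard OpenAI /v1/models route directly and compare its ids against the name ilab sends:

# Hypothetical check: list the model ids the endpoint actually serves.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" | grep '"id"'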

Device Info

ilab sysinfo
WARNING: You need at least 2 GPUs to load full precision models
/usr/local/lib64/python3.11/site-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
instructlab.version: 0.17.1
sys.version: 3.11.7 (main, May 16 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
sys.platform: linux
os.name: posix
platform.release: 5.14.0-467.el9.x86_64
platform.machine: x86_64
os-release.ID: rhel
os-release.VERSION_ID: 9.4
os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
torch.version: 2.3.1+cu121
torch.backends.cpu.capability: AVX2
torch.version.cuda: 12.1
torch.version.hip: None
torch.cuda.available: True
torch.backends.cuda.is_built: True
torch.backends.mps.is_built: False
torch.backends.mps.is_available: False
torch.cuda.bf16: True
torch.cuda.current: 0
torch.cuda.0.name: NVIDIA A10G
torch.cuda.0.free: 21.7
torch.cuda.0.total: 22.0
torch.cuda.0.capability: 8.6
llama_cpp_python.version: 0.2.75
llama_cpp_python.supports_gpu_offload: True