Closed: StrangeTcy closed this issue 6 months ago.
I am not quite sure which datasets you used. Could you provide more information?

If you only use LA_DD, LA_CONV, and LACR_T2T, finetuning at 512x512 resolution for 2-3 epochs should take 1-2 hours. The model would then show signs of life.
As for hosting the model, please use:
- endpoint code: https://github.com/Luodian/Otter/blob/main/pipeline/serve/deploy/otterhd_endpoint.py
- frontend code: https://huggingface.co/spaces/Otter-AI/OtterHD-Demo/blob/main/app.py
> I am not quite sure which datasets you used. Could you provide more information?
>
> If you only use LA_DD, LA_CONV, and LACR_T2T, finetuning at 512x512 resolution for 2-3 epochs should take 1-2 hours. The model would then show signs of life.

https://github.com/Luodian/Otter/blob/main/shared_scripts/Demo_Data.yaml doesn't mention LACONV, so we didn't use it, even though https://entuedu-my.sharepoint.com/personal/libo0013_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Flibo0013%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2FMIMICIT%5FParquets&ga=1 has `LACONV_instructions.json`.
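If one did want to fold LACONV in, it would presumably follow the same schema as the other entries in `Demo_Data.yaml`; the entry below is a sketch, and both paths are assumptions rather than verified locations in the repo:

```yaml
IMAGE_TEXT:                    # same group as the other LA datasets
  LACONV:
    mimicit_path: data_folder/json/LA/LACONV_instructions.json  # assumed location
    images_path: data_folder/Parquets/LA.parquet                # assumed; LACONV presumably shares LA images
    num_samples: -1
```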
So, we used the finetuning script you suggested (https://github.com/Luodian/Otter/blob/main/docs/OtterHD.md#how-to-finetune); it could only run with a batch size of 1, which eventually took about 11 hours.

Not sure about signs of life, but we got checkpoints that work OK with `inference.py`. Now we'd like to put a Gradio service on top of them as well.
> As for hosting the model, please use the endpoint code: https://github.com/Luodian/Otter/blob/main/pipeline/serve/deploy/otterhd_endpoint.py

That looks great, except it's a Flask app, which we'll have trouble accessing publicly (it would probably run fine on a local machine, but then so would a console inference script).
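One stopgap for reaching a localhost-only Flask endpoint from another machine is an SSH tunnel. The hostnames below are placeholders, and the port is an assumption about what the endpoint listens on:

```shell
# Forward local port 8000 to port 8000 on the GPU box
# (placeholders: user, gpu-box; port 8000 is assumed)
ssh -N -L 8000:localhost:8000 user@gpu-box
# The frontend can then point its endpoint URL at http://localhost:8000
```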
> frontend code: https://huggingface.co/spaces/Otter-AI/OtterHD-Demo/blob/main/app.py

That's the one you currently use, except it has a really simple definition of `http_bot`, while the original `gradio_web_server` has a much longer and more complicated one.
ETA: we have now tried running a modified `app.py` with the URL that `otterhd_endpoint` outputs; it just didn't work. I'm also not sure the `fn` from `vqa_btn.click()` ever actually gets called in our case.
OK, we've been using `gradio==3.23.0`, which was probably a bad idea. Switching to `4.11.0` made everything work.
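Since the 3.x install misbehaved silently instead of erroring, a small stdlib version guard at the top of `app.py` could make that failure loud. This is a sketch; the `4.0.0` floor is an assumption based on what worked for us:

```python
from importlib.metadata import version  # stdlib since Python 3.8


def parse_version(v: str) -> tuple:
    """Turn a version string like '4.11.0' into (4, 11, 0) for comparison."""
    return tuple(int(part) for part in v.split(".")[:3] if part.isdigit())


def check_gradio(minimum: str = "4.0.0") -> None:
    """Fail fast if the installed gradio predates the API this app relies on."""
    installed = version("gradio")
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(f"gradio>={minimum} required, found {installed}")
```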
Specifically, it became apparent that the model is poorly finetuned on only two sets of instructions. So the question now is: how do we go on finetuning it? The `instruction_following` script seems to accept fuyu as an argument, not an already-finetuned OtterHD checkpoint.
ETA: `train_args.py` has this:

```python
parser.add_argument(
    "--trained_ckpt",
    type=str,
    help="path to trained_ckpt",
    default=None,
)
```

while `instruction_following.py` has:

```python
if args.trained_ckpt is not None:
    train_ckpt = torch.load(args.trained_ckpt, map_location="cpu")
    if train_ckpt.get("model_state_dict", None) is not None:
        train_ckpt = train_ckpt["model_state_dict"]
    _ = model.load_state_dict(train_ckpt, strict=False)
    print(_[1])
```
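Putting those two snippets together, resuming from an existing checkpoint might look like the sketch below. Only `--trained_ckpt` is confirmed by `train_args.py`; the launcher, script path, and remaining flags are assumptions based on the OtterHD finetuning docs:

```shell
# Sketch only: flag names other than --trained_ckpt are assumptions
accelerate launch pipeline/train/instruction_following.py \
    --trained_ckpt=./checkpoints/pytorch_model.bin \
    ...   # the rest of the usual OtterHD finetuning flags
```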
I'm just not sure it would work with the checkpoints we already have.
Another thing is the dataset itself; let's take CGD as an example. As I understand it, we're supposed to have lines like:

```yaml
IMAGE_TEXT: # Group name should be in [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]
  CGD: # dataset name can be assigned at any name you want
    mimicit_path: data_folder/json/CGD/CGD_instructions.json # Path of the instruction json file
    images_path: data_folder/Parquets/CGD.parquet # Path of the image parquet file
    num_samples: -1 # Number of samples you want to use, -1 means use all samples, if not set, default is -1.
```

Now if we look at the files on Hugging Face (https://huggingface.co/datasets/pufanyi/MIMICIT/tree/main/data/CGD), `CGD_instructions.json` is present and small, `CGD.json` is also present and huge but isn't used by this setup, and there are 9 parts of `CGD.parquet`. How should we use all that?
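On the nine-part parquet: if the loader expects a single `images_path`, one option is to merge the shards into one file first. The helper below only discovers and orders the part files (stdlib only); the commented pandas lines sketch the actual merge, and the `CGD_*.parquet` naming pattern is an assumption about how the parts are named:

```python
import glob
import re


def part_index(path: str) -> int:
    """Extract the first run of digits from a shard filename, e.g. 'CGD_3.parquet' -> 3."""
    match = re.search(r"(\d+)", path.rsplit("/", 1)[-1])
    return int(match.group(1)) if match else 0


def shard_paths(pattern: str) -> list:
    """Find sharded parquet parts and sort them by numeric part index (not lexically)."""
    return sorted(glob.glob(pattern), key=part_index)


# Assuming pandas + pyarrow are installed, the merge itself would be roughly:
# import pandas as pd
# parts = shard_paths("data_folder/Parquets/CGD_*.parquet")  # naming is an assumption
# pd.concat(map(pd.read_parquet, parts)).to_parquet("data_folder/Parquets/CGD.parquet")
```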
Hi @StrangeTcy, how do you get the OtterHD checkpoint to finetune it?
> Hi @StrangeTcy, how do you get the OtterHD checkpoint to finetune it?

I followed the instructions from the OtterHD readme (https://github.com/Luodian/Otter/blob/main/docs/OtterHD.md#how-to-finetune), which gets you a folder with a lot of JSONs and one huge `pytorch_model.bin`.
Following our finetuning attempts (https://github.com/Luodian/Otter/issues/320), we now have checkpoints for OtterHD finetuned on 2 datasets from the LA part of all possible datasets. Now we have two choices. For that second option we can use different scripts, like `cli.py` and `gradio_web_server.py`, and possibly others. But perhaps there are scripts you recommend as best for OtterHD specifically?

ETA: there's also `inference.py` in `demos`; it just requires a `yaml` file, and we don't have any.

UPD: I've modified `inference.py` a bit and wrote a `yaml` file with a single question about a single image (the one from the demo with rows of apples). The model we have now can answer that question, it just answers it wrong. So should we finetune it further?
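For reference, a minimal single-question yaml of that shape might look like the fragment below; every field name here is hypothetical and has to match whatever the modified `inference.py` actually parses:

```yaml
# Hypothetical schema: keys must match what your modified inference.py reads
- image_path: ./images/apples.jpg   # the demo image with rows of apples
  question: "How many apples are there in total?"
```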