Fine-tuning dataset example - GitHub Issues

facok / florence2-ft-simple

Fine-tune your Florence-2 model easily


Fine-tuning dataset example #2

Open · nibirou opened this issue 1 month ago

nibirou commented 1 month ago

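Hi, could you add an example to the project showing how to build a fine-tuning dataset? I'd like to learn what format the training data should be in. Thanks!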

facok commented 1 month ago


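Just point --images_dir at your image directory and --texts_dir at a directory of .txt caption files, giving each image and its caption file the same base name. You can then start fine-tuning directly, with no extra preparation. For example: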

python main.py \
    --images_dir ./data/images \
    --texts_dir ./data/texts \
    --model_dir ./Florence-2-large \
    --output_dir ./output \
    --task_type "<MORE_DETAILED_CAPTION>" \
    --batch_size 1 \
    --epochs 3 \
    --learning_rate 1e-6 \
    --accumulation_steps 8

images/
    a222.jpg
    b6666.jpg
    cuuu.jpg
    ...

texts/
    a222.txt
    b6666.txt
    cuuu.txt
    ...
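Before launching main.py, it can help to check that every image actually has a caption file with the same base name. Below is a minimal sketch of such a check. `pair_images_with_captions` and the `IMAGE_EXTS` set are hypothetical names introduced here; this is not the loader the project uses (that lives in dataset.py).

```python
from pathlib import Path

# Image extensions to look for; an assumption, adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def pair_images_with_captions(images_dir, texts_dir):
    """Return (pairs, missing) for images and captions sharing a base name.

    Hypothetical pre-training check, not the project's own loader.
    pairs:   list of (image_path, caption_text)
    missing: names of images that have no matching .txt file
    """
    images_dir, texts_dir = Path(images_dir), Path(texts_dir)
    pairs, missing = [], []
    for image_path in sorted(images_dir.iterdir()):
        # Skip sub-directories and files that are not images.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_EXTS:
            continue
        # a222.jpg -> a222.txt
        text_path = texts_dir / (image_path.stem + ".txt")
        if text_path.is_file():
            caption = text_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, caption))
        else:
            missing.append(image_path.name)
    return pairs, missing
```

For the layout above you would call `pair_images_with_captions("./data/images", "./data/texts")` and fix any names reported in `missing` before training.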

nibirou commented 1 month ago

Got it. I mainly wanted to see what goes inside the .txt files. Thanks!

facok commented 1 month ago

[Image: Golda_poster]

Matching caption text:

Movie poster, historical drama, elderly woman in a grey coat, standing with hands on a table, intense expression, background features military personnel and maps, dim lighting, serious mood, large gold text "GOLDA" at the bottom.
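A caption file is just one block of plain descriptive text like the one above. The sketch below shows how such a file could be turned into a training example, assuming (not confirmed in this thread; check dataset.py) that the --task_type value serves as the text prompt and the file's contents serve as the target the decoder learns to generate. `CaptionSample` and `make_caption_sample` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CaptionSample:
    prompt: str  # task token passed to the processor along with the image
    target: str  # caption text the decoder is trained to produce


def make_caption_sample(caption_text, task_type="<MORE_DETAILED_CAPTION>"):
    """Hypothetical helper: turn one .txt caption into a (prompt, target) pair."""
    # Collapse stray newlines and repeated spaces into single spaces.
    target = " ".join(caption_text.split())
    if not target:
        raise ValueError("empty caption")
    return CaptionSample(prompt=task_type, target=target)
```

Normalizing whitespace is a design choice here, so that captions edited by hand do not leave line breaks in the training targets.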

nibirou commented 1 month ago

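Thanks! By the way, have you tried fine-tuning Florence-2 on the other task prompts from the official sample, such as Object Detection, Segmentation, OCR, and Phrase Grounding? Do the datasets for those tasks use the same format as for the captioning prompts, and if not, how should their annotations be structured? Thanks!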

nibirou commented 1 month ago

https://huggingface.co/microsoft/Florence-2-large/blob/main/sample_inference.ipynb

facok commented 1 month ago


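The dataset format depends on the code that reads and processes the dataset. For this project, see the code in https://github.com/facok/florence2-ft-simple/blob/main/dataset.py. For the datasets used to fine-tune the other task prompts you mentioned, you can refer to https://github.com/andimarafioti/florence2-finetuning/blob/main/data.py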
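For detection-style tasks the target is no longer free text. To my understanding, Florence-2 encodes box coordinates as location tokens `<loc_0>` to `<loc_999>` (each coordinate quantized into 1000 bins relative to the image size), and an `<OD>` target looks like `label<loc_x1><loc_y1><loc_x2><loc_y2>` repeated per object. The sketch below builds such a string under that assumption. `quantize` and `od_target` are hypothetical helpers; verify the exact format against data.py above and the Florence-2 processor code before training.

```python
import math

NUM_BINS = 1000  # assumed number of location bins: <loc_0> ... <loc_999>


def quantize(value, size, num_bins=NUM_BINS):
    """Map a pixel coordinate to a bin index in [0, num_bins - 1] (floor mode)."""
    # Multiply before dividing so integer inputs stay exact in floating point.
    bin_index = math.floor(value * num_bins / size)
    return max(0, min(num_bins - 1, bin_index))


def od_target(boxes, image_width, image_height):
    """Build an <OD>-style target string from (label, x1, y1, x2, y2) boxes.

    Hypothetical helper; boxes are in absolute pixel coordinates.
    """
    parts = []
    for label, x1, y1, x2, y2 in boxes:
        locs = (
            quantize(x1, image_width),
            quantize(y1, image_height),
            quantize(x2, image_width),
            quantize(y2, image_height),
        )
        parts.append(label + "".join(f"<loc_{v}>" for v in locs))
    return "".join(parts)
```

Clamping to `num_bins - 1` keeps a coordinate that sits exactly on the right or bottom edge from producing a nonexistent `<loc_1000>` token.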

Li-Qingyun commented 1 month ago

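I ran the inference demo. Because this is an encoder-decoder architecture, I noticed that the leading ids in the decoder output are always 2000. It looks like the decoder-generated part needs to be supervised with 2000 as its prefix.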
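One common reason an encoder-decoder model's output starts with a fixed id: in BART-style models, the decoder inputs used for teacher forcing are built by shifting the labels one position to the right and prepending a configured `decoder_start_token_id`, so that fixed prefix comes from the model config rather than from the label text you write. The pure-Python sketch below mirrors the idea behind `shift_tokens_right` in Hugging Face transformers. Whether Florence-2 applies exactly this shift, and which ids it uses, is an assumption to verify; the ids below are placeholders, not real Florence-2 vocabulary ids.

```python
def shift_tokens_right(labels, decoder_start_token_id, pad_token_id):
    """Build decoder input ids for teacher forcing from target label ids.

    Prepends decoder_start_token_id and drops the last label, so at step t
    the decoder sees label t-1 while predicting label t. Positions marked
    -100 (ignored by the loss) are replaced with pad_token_id.
    """
    shifted = [decoder_start_token_id] + list(labels[:-1])
    return [pad_token_id if tok == -100 else tok for tok in shifted]
```

For example, with placeholder ids, `shift_tokens_right([0, 10, 11, 2], decoder_start_token_id=2, pad_token_id=1)` gives `[2, 0, 10, 11]`: the start id is supplied by the shift, and the labels themselves do not need to contain it.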