Open JanCizmar opened 1 year ago
There's no direct support for this, but you can accomplish it by modifying argostrain/train.py. I would add input("Downloaded Argos Data") after the data has been downloaded here, and then append your custom data to run/source and run/target.
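The append step above could look something like the following sketch. This is not part of argos-train itself; it's a standalone helper you could run while train.py is paused at the input() prompt. The file names (custom.en / custom.es) and the repeat parameter for oversampling are assumptions you would adapt to your own data:

```python
# Sketch: append custom parallel data to the intermediate run/source and
# run/target files while train.py is paused at the input() prompt.
# File names and paths here are hypothetical examples.
from pathlib import Path

def append_parallel_data(source_file, target_file, run_dir="run", repeat=1):
    """Append aligned sentence pairs to run/source and run/target.

    repeat > 1 oversamples the custom data, giving it more weight
    relative to the downloaded Argos data.
    """
    src_lines = Path(source_file).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(target_file).read_text(encoding="utf-8").splitlines()
    # The two files must align line by line, one sentence pair per line.
    assert len(src_lines) == len(tgt_lines), "source/target must align line-by-line"

    run = Path(run_dir)
    with open(run / "source", "a", encoding="utf-8") as s, \
         open(run / "target", "a", encoding="utf-8") as t:
        for _ in range(repeat):
            s.write("\n".join(src_lines) + "\n")
            t.write("\n".join(tgt_lines) + "\n")

# Example usage while train.py is waiting at the prompt:
# append_parallel_data("custom.en", "custom.es", repeat=3)
```

Repeating the custom pairs a few times is a crude but common way to bias training toward a small in-domain dataset without changing the training code itself.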
You could also train one base model and then fine-tune it on your custom data, but this will also require custom code.
I want to improve support for custom data and fine-tuning, so if anyone has suggestions or pull requests, they're appreciated.
Would incremental training also be possible with the suggestions from LibreTranslate? I think the available base models are already quite good, but incorporating feedback from LibreTranslate might improve corner cases further. This likely depends on the actual use case: a medical use case might need different fine-tuning than a scuba-diving one, to pick random examples.
It would be great to be able to quickly improve the base model without needing a high-powered machine to retrain the complete model on input data that is 99.9% the same!
Hi there!
I would like to use the data currently provided in data-index.json, but at the same time I would like to use my own custom data. Can I tell the script to generate a model that treats my custom data as more relevant / higher priority?
Let's say I have one large dataset I use all the time, plus multiple smaller datasets, each of which I would like to train a separate model for. Is something like an incremental build possible, where I reuse some previous output and just "append" my custom data to save training time and resources?
Thanks!