katie-lamb opened 3 weeks ago
Hey @zschira, do you know if there's a way to log the size of the training data set (number of docs) in MLflow for each LayoutLM fine-tuning run? Since near-term improvements will likely come from increasing the size of the labeled training set, it would be a good variable to log. I saw that inside the `log_model` util function we call:

```python
mlflow.transformers.log_model(
    model,
    artifact_path="layoutlm_extractor",
    task="token-classification",
)
```

so there's not a ton of customization in logging during a training run. Maybe the thing to do is just log the training set size as a parameter before this `log_model` call?
Now that we have a validation framework for Ex. 21 extraction, try these simple improvements and re-evaluate performance.