mridulgarg11 closed this issue 1 year ago
I will look into this today
This is happening because the train_dataset passed to the DataLoader (learner.py L241) has size zero. That, in turn, is because the training set in data_module in run_training.py is seen as a Subset of size zero.
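To illustrate how a class-based split can silently produce an empty subset, here is a minimal sketch in plain Python (not Renate or PyTorch code). The helpers `select_indices` and `check_not_empty` and the toy labels are hypothetical, and the cause shown here (labels stored in a different type than the class grouping expects) is only one possible explanation, not a confirmed diagnosis of this issue.

```python
def select_indices(labels, class_group):
    """Return the positions whose label belongs to the given class group."""
    wanted = set(class_group)
    return [i for i, label in enumerate(labels) if label in wanted]


def check_not_empty(indices, name):
    """Fail early with a readable message instead of deep inside the data loader."""
    if len(indices) == 0:
        raise ValueError(f"{name} is empty: check how the dataset is generated")
    return indices


# Hypothetical toy labels: stored as strings, while the grouping uses integers.
string_labels = ["0", "1", "0", "2"]
int_labels = [int(label) for label in string_labels]

# The string "0" never equals the integer 0, so nothing is selected.
empty_subset = select_indices(string_labels, class_group=[0, 1])
# After converting the labels, the same grouping selects three samples.
valid_subset = select_indices(int_labels, class_group=[0, 1])
```

Calling `check_not_empty(empty_subset, "train_dataset")` right after the split would raise a `ValueError` at the point where the subset is built, which is usually easier to debug than a failure inside the training loop.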
I think there is something to be changed in how the dataset is generated.
I am attaching a screenshot from my debugger, hoping this can help. If it doesn't, we can catch up offline and we will try to provide an example for that function.
nit: I'm not sure what your intent with this script is, but we use the ClassIncrementalScenario in our examples because it is useful for simulating updates with different data. This is not what you want to do when training a real model. For that, you can just drop the scenario.
Thanks for your reply. I was able to resolve the error, and you're right: it was about how the dataset was being generated. I ended up performing the transformations outside of Renate. It would still be helpful if you could share an example of how the text transforms can be applied within Renate.
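For readers wondering what "performing the transformations outside of Renate" can look like, here is a minimal sketch in plain Python. It is not the code used in this issue and does not use any Renate API: `build_vocab` and `encode` are hypothetical helpers, and the whitespace tokenizer stands in for whatever tokenizer the real pipeline uses. The idea is only that texts are turned into fixed-length id sequences before the dataset is handed to the training library.

```python
def build_vocab(texts):
    """Map each lowercase whitespace token to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for text in texts:
        for token in text.lower().split():
            # setdefault only assigns a new id if the token is not yet known.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_length):
    """Convert a text to ids, then truncate or pad it to max_length."""
    ids = [vocab.get(token, vocab["<unk>"]) for token in text.lower().split()]
    ids = ids[:max_length]
    return ids + [vocab["<pad>"]] * (max_length - len(ids))


# Hypothetical toy corpus.
train_texts = ["good movie", "bad movie plot"]
vocab = build_vocab(train_texts)

# Pre-transformed inputs, ready to be wrapped in a dataset.
encoded_train = [encode(text, vocab, max_length=4) for text in train_texts]
```

With this approach the dataset already contains numeric inputs, so no text transform has to be configured inside the training job.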
Regarding your second note, I was using this script to simulate model updates with different data. I have two goals here: 1) benchmark different CL algorithms against the fine-tuning method on my own dataset, and 2) once I find the best results, set up a model retraining pipeline. My understanding is that for 1), I would use the ClassIncrementalScenario to simulate CL in an offline setting. For 2), I wouldn't need any Scenario and would just pass in the new dataset. Let me know if that sounds correct.
It's perfectly fine to use the scenario to simulate different situations; I brought it up just as an FYI. Given your explanation, it seems reasonable to use it as a starting point, but you will probably need to tune a few more things (e.g., hyperparameters) depending on your data/problem.
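The difference between the two use cases discussed above can be sketched in plain Python. This is a conceptual illustration only, not Renate's implementation of ClassIncrementalScenario: the function `class_incremental_chunks`, the toy samples, and the class groupings are all hypothetical.

```python
def class_incremental_chunks(samples, class_groupings):
    """Split (text, label) pairs into one chunk per group of classes."""
    chunks = []
    for group in class_groupings:
        wanted = set(group)
        chunks.append([sample for sample in samples if sample[1] in wanted])
    return chunks


# Hypothetical toy dataset of (text, label) pairs.
samples = [("a", 0), ("b", 1), ("c", 2), ("d", 0), ("e", 3)]

# Goal 1 (benchmarking): simulate sequential updates, one per class group.
simulated_updates = class_incremental_chunks(samples, [[0, 1], [2, 3]])

# Goal 2 (real retraining): no scenario, each new dataset is one update as-is.
single_update = list(samples)
```

In the benchmarking case the model sees `simulated_updates[0]` first and `simulated_updates[1]` in a later update, while in the retraining case every newly collected dataset is passed in whole.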
This issue is stale because it has been open for 14 days with no activity. Resume the discussion in the next week or it will be closed automatically.
Hi authors, is there any example of a config file for NLP datasets, especially around applying transformations? I’m trying to create a minimal example. Here's the config file I'm using:
And the training job is below:
This is the stack trace of the error I'm getting: