NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

Any intuition on tuning the number of epochs and redundancy_factor? #17

Closed JieyuZ2 closed 3 years ago

JieyuZ2 commented 3 years ago

Hi,

could you please provide some insight into reasonable ranges for the number of epochs and for redundancy_factor when tuning these parameters?

Thanks!

plison commented 3 years ago

Regarding the number of epochs: in practice we found that the parameters converge after 3-4 epochs, so it's often unnecessary to run more iterations. The code prints the decrease in total loss after each iteration, so it's easy to verify how many epochs are actually needed.

As for the redundancy factor, it depends on whether you have many labelling functions that are heavily correlated (i.e. one labelling function builds upon the result of another). If you don't, you can simply ignore this factor. If you do have such correlated LFs, and their names express this dependence (see the method `_get_correlated_sources`), the easiest approach is to fit the HMM with several values for this factor and see which one works best on some development data.
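As a rough sketch of that tuning loop (exact parameter names may vary slightly across skweak versions, and `evaluate_on_dev`, `train_docs`, `dev_docs`, and the label list are placeholders for your own data and scoring function):

```python
import skweak

# Assumption: train_docs and dev_docs are lists of spaCy Doc objects on which
# the labelling functions have already been applied, and LABELS lists the
# output labels those functions produce.
LABELS = ["PERSON", "ORG", "DATE"]

best_factor, best_score = None, None
for factor in [0.0, 0.05, 0.1, 0.3, 0.5]:  # candidate redundancy factors
    # the redundancy factor is passed when constructing the HMM aggregator
    hmm = skweak.aggregation.HMM("hmm", LABELS, redundancy_factor=factor)
    hmm.fit(train_docs)  # EM typically converges within 3-4 epochs

    # apply the fitted aggregator to the dev docs and score the result;
    # evaluate_on_dev is a hypothetical helper returning e.g. entity-level F1
    aggregated = [hmm(doc) for doc in dev_docs]
    score = evaluate_on_dev(aggregated)
    if best_score is None or score > best_score:
        best_factor, best_score = factor, score

print("best redundancy_factor:", best_factor)
```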

JieyuZ2 commented 3 years ago

Thank you!

For the number of epochs, does that mean we could simply use a larger number (say 10) to ensure convergence, so we don't have to tune the number of epochs itself?

plison commented 3 years ago

Yes, you can certainly use 10 epochs, no problem.