Curriculum Learning for sequence_length

This PR addresses issue #39 for strategy == sequence length.
We use the group_by_length=True and length_column_name arguments of TrainingArguments, and precompute the length column from the raw sequence lengths in dataset.py. Under the hood these arguments use the Hugging Face LengthGroupedSampler, so the same mechanism can serve our other strategies if we precompute a suitable length column (for example perplexity or esm fold values, which would first have to be discretized).
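To make the precomputation idea concrete, here is a minimal sketch in plain Python. It is not the code in dataset.py: the helper names `add_length_column` and `discretize`, the `sequence` key, and the equal-width binning are all assumptions chosen for illustration. It only shows how a raw length column, or a discretized continuous score, could be produced before being named in length_column_name.

```python
from typing import Dict, List


def add_length_column(
    rows: List[Dict],
    sequence_key: str = "sequence",          # hypothetical key holding the raw sequence
    column_name: str = "sequence_length",    # hypothetical name of the precomputed column
) -> List[Dict]:
    """Return copies of the rows with an integer length column added."""
    return [{**row, column_name: len(row[sequence_key])} for row in rows]


def discretize(values: List[float], n_bins: int = 10) -> List[int]:
    """Map continuous scores to integer bin indices in [0, n_bins - 1].

    Equal-width binning is an assumption; any monotonic mapping to
    integers would serve the same purpose.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores identical: everything falls in a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value lands in the last bin rather than bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


rows = add_length_column([{"sequence": "MKV"}, {"sequence": "MKVLAAGIT"}])
bins = discretize([1.5, 3.0, 9.5, 12.0], n_bins=4)
```

Integer bins keep the column comparable to a sequence length, which is the kind of value the length-based grouping is built around.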
A few arguments have been added to the YAML files: do_curriculum_learning, curriculum_learning_column_name and curriculum_learning_strategy.
This is the first strategy supported; I believe @Leo-T-Zang is working on perplexity and esm fold. @Leo-T-Zang please use this PR as a base to incorporate your changes.
Added a unit test in test_cl.py that uses the data collator to confirm shorter sequences come first, plus a check on the percentage of unsorted batches. My runs showed 0% unsorted batches, but since LengthGroupedSampler involves randomness the result may differ from run to run, so the test allows a threshold of 10%.
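For reviewers, a rough sketch of what a percentage-unsorted check could look like. This is not the code in test_cl.py: the function name `fraction_unsorted` and the definition of "unsorted" (a batch whose longest sequence is shorter than that of the batch before it) are assumptions made for illustration.

```python
from typing import List, Sequence


def fraction_unsorted(batch_lengths: Sequence[Sequence[int]]) -> float:
    """Fraction of consecutive batch pairs that violate shorter-first order.

    Assumed definition: a batch is out of order when its maximum sequence
    length is smaller than the maximum of the preceding batch.
    """
    maxes: List[int] = [max(batch) for batch in batch_lengths]
    if len(maxes) < 2:
        # With fewer than two batches there is nothing to compare.
        return 0.0
    violations = sum(1 for prev, cur in zip(maxes, maxes[1:]) if cur < prev)
    return violations / (len(maxes) - 1)


# Sequence lengths per batch, in the order the batches were produced.
ordered_batches = [[3, 4], [5, 6], [6, 7], [8, 9], [10, 12]]
ratio = fraction_unsorted(ordered_batches)

# Tolerate some disorder because the sampler is randomized.
THRESHOLD = 0.10
within_threshold = ratio <= THRESHOLD
```

Comparing against a threshold rather than requiring a ratio of exactly zero keeps the test stable across the sampler's random runs.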

OpenBioML / protein-lm-scaling