clulab / scala-transformers

Scala interfaces to huggingface transformers and tokenizers

Train the transformers #19

Closed kwalcock closed 1 year ago

kwalcock commented 1 year ago

This is being submitted as a draft to facilitate discussion.

MihaiSurdeanu commented 1 year ago

Let's not merge this yet. I need to push a few more things.

kwalcock commented 1 year ago

I was asked to take a look. The code here didn't look bad at all, but there isn't much Python code in the clulab repos that shows what it should look like. If I had to be prescriptive about it, I would like exemplary Python code (code that might serve as an example, be read by others, or be copied) to use type hints for all functions and methods and to have no more than one global variable or function per file, the one corresponding to the companion object in Scala. This also needs a requirements file (requirements.txt) that specifies all the dependencies and their versions. I don't know which versions are being used, so the file should probably be generated on Mihai's computer. I have been editing the files to fit these "rules" to see whether they work reasonably and what should be added, but before I get too far, the rules should get written down in case that's not the way to go. The files being updated are in the branch https://github.com/clulab/scala-transformers/tree/kwalcock/training/encoder/src/main/python.
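To make the proposed "rules" concrete, here is a minimal sketch of what a file following them might look like. The module and class names (`label_counter.py`, `LabelCounter`) and the methods are hypothetical and are not taken from the branch; the sketch only illustrates full type hinting and a single top-level name per file, with the static method playing the role that a companion-object method would in Scala.

```python
# label_counter.py: hypothetical module, used only to illustrate the conventions.
from typing import Dict, List


class LabelCounter:
    """The only top-level name in this file; it stands in for the class plus
    its companion object in the Scala analogy."""

    def __init__(self, labels: List[str]) -> None:
        self.labels: List[str] = labels

    def counts(self) -> Dict[str, int]:
        # Count how often each label occurs.
        result: Dict[str, int] = {}
        for label in self.labels:
            result[label] = result.get(label, 0) + 1
        return result

    @staticmethod
    def from_line(line: str) -> "LabelCounter":
        # Factory method, analogous to an apply() on a Scala companion object.
        return LabelCounter(line.split())
```

For the requirements file, running `pip freeze > requirements.txt` in the environment actually used for training is one common way to capture the installed packages with pinned versions; the resulting list usually needs trimming to the packages the project really depends on.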

MihaiSurdeanu commented 1 year ago

Thanks, @kwalcock!

Thanks!

kwalcock commented 1 year ago

My branch is incomplete and untested, but the Configuration class encapsulates some of the global variables so that they can be accessed as config.epochs, as shown in data_wrangling.py, which has partial type hinting for its functions. The functions there are still global, so they will be moved into a DataWrangler.
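As a rough sketch of the direction described above, and not the code in the branch: only the names `Configuration`, `config.epochs`, and `DataWrangler` come from this thread. The `batch_size` field, the default values, and the `batches` method are assumptions made up for illustration. Under the one-global-per-file rule the two classes would live in separate files; they are shown together here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Configuration:
    # Former module-level globals become fields; the defaults are placeholders.
    epochs: int = 3
    batch_size: int = 8  # hypothetical field, not named in the thread


class DataWrangler:
    """Hypothetical home for the formerly global data-wrangling functions."""

    def __init__(self, config: Configuration) -> None:
        self.config: Configuration = config

    def batches(self, rows: List[str]) -> List[List[str]]:
        # Split the rows into consecutive chunks of at most batch_size items.
        size: int = self.config.batch_size
        return [rows[i:i + size] for i in range(0, len(rows), size)]
```

Passing the configuration in through the constructor keeps settings such as `config.epochs` reachable from the methods without any module-level state.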

MihaiSurdeanu commented 1 year ago

Please hold off on merging for ~1 more day. I need to push a few more things.