allenai / s2_fos

Apache License 2.0

Updating package to include training and inference files for the fos_v2 project #17

Closed: egork520 closed this 9 months ago

egork520 commented 9 months ago

Added training and inference files for the fos_v2 project.

Switched the package manager to Poetry.
Switched to pyproject.toml instead of setup.py.

What needs to be done:

sergeyf commented 9 months ago

Looks like you have a merge conflict in s2_fos/__init__.py.

sergeyf commented 9 months ago

Thanks for the updates. Ready for another review?

egork520 commented 9 months ago

Yes, please.

sergeyf commented 9 months ago

Just a few more small requests.

Have you tried reproducing every step in readme.md in a brand-new environment that is not logged into AI2?

egork520 commented 9 months ago

Good suggestion about the brand-new environment. I will try it on my personal computer.

egork520 commented 9 months ago

@sergeyf I tried running it on my personal computer and found a few extra steps I had to take, which I have added to README.md.

The training code runs fine on my Mac.

I also had to add instructions for setting a Hugging Face token in the environment, which is needed to access the gated model (Impact License).

I think it is good to go now. I will take another look tomorrow morning.
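The token step for the gated model can be made to fail early with a clear message. This is a minimal sketch, not the code in this PR: the helper name `get_hf_token` is hypothetical, and it assumes the token is exported as `HF_TOKEN`, the environment variable that `huggingface_hub` reads by default.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token from the environment (hypothetical helper).

    Assumes the token is stored in HF_TOKEN; raises a RuntimeError with
    setup instructions instead of failing later with an opaque 401/403.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create an access token in your Hugging Face "
            "account settings, accept the gated model's license on its Hub page, "
            "then run: export HF_TOKEN=<your token>"
        )
    return token
```

Because `huggingface_hub` picks up `HF_TOKEN` on its own, a check like this mainly serves as an early, readable error for users setting up a fresh environment.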

egork520 commented 9 months ago

Hopefully you don't mind the automatic download of the Hugging Face dataset and the automatic train/test/validation split.

It is a bit of a black box for users, but the setup is simpler and, hopefully, more users will succeed in running it end to end.

sergeyf commented 9 months ago

Automatic download is great. As long as the train/val/test split is exactly reproducible, all the user needs to know is that it is happening and that they don't have to do it themselves.

egork520 commented 9 months ago

OK, adding a seed so that the splits are not random between runs.
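A seeded split like the one described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the function name `seeded_split`, the default seed of 42, and the 80/10/10 fractions are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def seeded_split(
    items: Sequence[T],
    seed: int = 42,          # hypothetical default seed
    val_frac: float = 0.1,   # hypothetical validation fraction
    test_frac: float = 0.1,  # hypothetical test fraction
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle a copy of items with a fixed seed and cut it into train/val/test."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(items)
    # A local Random instance keeps the split independent of any other code
    # that uses (or reseeds) the global random module.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

The seed alone only fixes the shuffle; the split is reproducible only if the input order is also fixed, so sorting records by a stable ID before calling the function is a sensible precaution.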