APJansen commented 10 months ago

Implementation of training workflow

This PR makes heavy use of the Lightning framework, which worked out very nicely. With essentially a single line of code in `main.py`, it creates a CLI that runs a fit from a configuration file passed as an argument, so the code itself stays completely free of configuration. I've set it up so that the config used is saved to Weights & Biases along with all the metrics.

codecov[bot] commented 8 months ago

Codecov Report

Attention: Patch coverage is 95.34884%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 80.69%. Comparing base (1977c83) to head (a2f4e30).

| Files | Patch % | Lines |
|---|---|---|
| unsat/data.py | 92.72% | 4 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main       #9      +/-   ##
==========================================
+ Coverage   79.46%   80.69%   +1.23%
==========================================
  Files           8        8
  Lines         224      259      +35
==========================================
+ Hits          178      209      +31
- Misses         46       50       +4
```


APJansen commented 8 months ago

@PabRod Can you check if this works for you? Instructions are in the README. You'll need to update your dependencies as well, and you probably still need to get the data into the right format.

PabRod commented 8 months ago

It fails with a not-so-transparent error:

> python unsat/main.py -c configs/test_config.yaml

usage: main.py [-h] [-c CONFIG] [--print_config[=flags]] {fit,validate,test,predict} ...
error: expected "subcommand" to be one of {fit,validate,test,predict}, but it was not provided.

I guess the problem lies within the readme subsection on "Weights and biases". It could be a good idea to elaborate a bit on that step.

APJansen commented 8 months ago

Sorry, it was just missing the `fit` subcommand; it should be `python unsat/main.py fit -c configs/test_config.yaml`. Does it work with that?
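
For readers puzzled by the original error, here is a minimal sketch of how a required subcommand behaves. It uses the standard library's `argparse` as a stand-in (an assumption for illustration; the real CLI in `main.py` is generated by Lightning, not by this parser), and `build_parser` and `parse` are hypothetical helper names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser mimicking the usage line above (not the project's real CLI)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-c", "--config")
    # required=True makes argparse reject invocations without a subcommand.
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    for name in ("fit", "validate", "test", "predict"):
        sub = subparsers.add_parser(name)
        sub.add_argument("-c", "--config")
    return parser


def parse(argv):
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints the usage message to stderr and exits on errors.
        return None


missing = parse(["-c", "configs/test_config.yaml"])
fixed = parse(["fit", "-c", "configs/test_config.yaml"])
```

In this sketch the first call is rejected because no subcommand is given, while the second succeeds and records `fit` as the subcommand, mirroring the fix above.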

PabRod commented 8 months ago

It complains that the data files are missing at the expected locations (and rightly so), but otherwise it seems to work.

I'll make sure my data folder is up to date as soon as possible.

APJansen commented 8 months ago

I have set it up on Snellius as well, with instructions in the README. Can you check if it works for you @PabRod? The data is already on our shared project space and should be accessible to you.

PabRod commented 8 months ago

Hi @APJansen. I booked some time to look into this issue on February 8th. Let me know if this deadline is acceptable for you.

PabRod commented 8 months ago

I cannot find the shared folder. I'm looking for it inside /projects/, perhaps it doesn't have a straightforward name? Could you please provide me with the path, @APJansen?

APJansen commented 8 months ago

It's `/projects/0/einf3381/UNSAT/data/experimental.h5`; the test_config should already point to it by default. Can you access it?

APJansen commented 8 months ago

I have fixed a bug arising from different axis-order conventions in PyTorch vs. Keras, closing #10.
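
As a rough illustration of this kind of mismatch: Keras defaults to a channels-last layout, while PyTorch layers generally expect channels-first. The sketch below assumes that this was the convention at issue (the comment above does not spell out which axes were involved) and uses NumPy with a hypothetical helper, `channels_last_to_first`, rather than the project's actual code.

```python
import numpy as np


def channels_last_to_first(batch: np.ndarray) -> np.ndarray:
    """Move the trailing channel axis to position 1: (N, ..., C) -> (N, C, ...)."""
    return np.moveaxis(batch, -1, 1)


# Hypothetical batch of 4 images, 32x32 pixels, 3 channels, in channels-last order.
keras_style = np.arange(4 * 32 * 32 * 3).reshape(4, 32, 32, 3)
torch_style = channels_last_to_first(keras_style)
print(torch_style.shape)  # (4, 3, 32, 32)
```

A plain `reshape` to the target shape would run without error but scramble the data, which is why this class of bug can go unnoticed until the training metrics look wrong.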

You may be interested to have a look at the results of a small test here.

This is with a very simple and very small model, on a very small training set. But it now behaves nicely as expected: you see the training loss go down, and accuracy up. The validation accuracy also improves at first, but quickly starts to go down again as we are overfitting on the training set.
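
The pattern described above can be checked mechanically by locating the epoch at which validation accuracy peaks. This is a minimal sketch with made-up numbers (not the results of this run), and `peak_epoch` is a hypothetical helper.

```python
def peak_epoch(val_accuracy):
    """Return the index of the epoch with the highest validation accuracy."""
    return max(range(len(val_accuracy)), key=val_accuracy.__getitem__)


# Made-up numbers for illustration only, not taken from the run above.
train_acc = [0.52, 0.61, 0.70, 0.78, 0.85, 0.91]
val_acc = [0.50, 0.58, 0.63, 0.61, 0.59, 0.56]

best = peak_epoch(val_acc)
# Overfitting signature: validation peaked before the final epoch
# while training accuracy kept improving afterwards.
overfitting = best < len(val_acc) - 1 and train_acc[-1] > train_acc[best]
print(best, overfitting)
```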

APJansen commented 7 months ago

Going to merge this, as the remaining discussion likely doesn't have to do with the code itself.

UNSAT3D / unsat

Train test #9
