funkelab / gunpowder

A library to facilitate machine learning on multi-dimensional images.
https://funkelab.github.io/gunpowder/
MIT License

train loop log: add current lr and set loss precision #113

Closed abred closed 4 years ago

abred commented 4 years ago

Print the current learning rate (lr) if available, which is useful for decaying lr schedules. Set a fixed precision for loss, lr, and time.

pattonw commented 4 years ago

It seems useful to log the learning rate, but I'm not sure about the interface. Checking whether a class has an attribute might be difficult for people to discover and use, since they would have to write their own subclass of the GenericTrain node.

abred commented 4 years ago

Yes, it's not the cleanest option. The idea was that this way I don't have to touch all existing Train nodes/subclasses.
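For readers following the thread, the attribute-check idea discussed above can be sketched roughly as follows. The diff itself is not shown here, so this is only an illustration under assumptions: `PlainTrain`, `DecayingLrTrain`, and `format_train_log` are hypothetical stand-ins, not gunpowder classes or functions, and the format string mirrors the fixed precisions mentioned in the description.

```python
class PlainTrain:
    """Hypothetical stand-in for a train node that does not track a learning rate."""


class DecayingLrTrain:
    """Hypothetical stand-in for a subclass that exposes `current_lr`."""

    def __init__(self, current_lr):
        self.current_lr = current_lr


def format_train_log(node, iteration, loss, seconds):
    """Build the log line, adding lr only when the node defines `current_lr`."""
    parts = ["iteration=%d" % iteration, "loss=%.6f" % loss]
    # The attribute check: nodes without `current_lr` need no changes.
    current_lr = getattr(node, "current_lr", None)
    if current_lr is not None:
        parts.append("lr=%.7f" % current_lr)
    parts.append("time=%.3f" % seconds)
    return "Train process: " + " ".join(parts)
```

The trade-off raised in the review is visible here: nothing in `PlainTrain` hints that defining `current_lr` changes the log output, so the feature is easy to miss unless it is documented.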

pattonw commented 4 years ago

I'm going to go ahead and close this. After thinking about this more, it's not clear that we even want to log the iteration, loss, etc. at all in the generic train node. A user could log all of this data in the outer loop, since it is added to the batch. Something along the lines of:

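import logging
from gunpowder import build

logger = logging.getLogger(__name__)
# pipeline, request and num_iterations are assumed to be defined elsewhere
with build(pipeline):
    for _ in range(num_iterations):
        batch = pipeline.request_batch(request)
        # current_lr and time_of_iteration assume the train node sets them on the batch
        logger.info("Train process: iteration=%d loss=%.6f lr=%.7f time=%.3f",
                    batch.iteration, batch.loss, batch.current_lr,
                    batch.time_of_iteration)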

which would give the user more freedom to log whatever they want, or use progress bars like tqdm, etc.
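As one illustration of that freedom, an outer loop could log a smoothed loss only every few iterations. The sketch below runs without gunpowder: `fake_request_batch` is a hypothetical stand-in for `pipeline.request_batch(request)`, and the `log_every` and `smoothing` parameters are assumptions made for the example.

```python
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("train")


def fake_request_batch(iteration):
    """Hypothetical stand-in for pipeline.request_batch(request)."""
    return SimpleNamespace(iteration=iteration, loss=1.0 / (iteration + 1))


def run(num_iterations, log_every=10, smoothing=0.9):
    """Log an exponentially smoothed loss every `log_every` iterations."""
    smoothed = None
    logged = []
    for i in range(num_iterations):
        batch = fake_request_batch(i)
        # Exponential moving average; the first loss seeds the average.
        if smoothed is None:
            smoothed = batch.loss
        else:
            smoothed = smoothing * smoothed + (1.0 - smoothing) * batch.loss
        if batch.iteration % log_every == 0:
            logger.info("iteration=%d loss=%.6f smoothed=%.6f",
                        batch.iteration, batch.loss, smoothed)
            logged.append(batch.iteration)
    return logged
```

Calling `run(25, log_every=10)` logs at iterations 0, 10, and 20. None of this logic needs to live in the train node, which is the point being made in the comment.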

Whether attributes on the batch are a good way to go is another question (maybe the train nodes should provide a non-spatial array of training statistics that users could use to store and access anything they want). In any case, I think the number of things we log automatically should be minimal, and we should instead make it easier for users to customize what is being logged and where.
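A rough sketch of the "container of training statistics" idea, using plain Python rather than gunpowder arrays: `StatsBatch`, `train_step`, and `format_stats` are hypothetical names invented for this illustration, and a dict stands in for the non-spatial array.

```python
from dataclasses import dataclass, field


@dataclass
class StatsBatch:
    """Hypothetical stand-in for a batch carrying free-form training statistics."""
    iteration: int
    stats: dict = field(default_factory=dict)


def train_step(iteration, base_lr=0.01, decay=0.5, decay_every=100):
    """Pretend train node: records whatever statistics it has available."""
    batch = StatsBatch(iteration=iteration)
    batch.stats["loss"] = 1.0 / (iteration + 1)
    # Step decay: the lr is multiplied by `decay` every `decay_every` iterations.
    batch.stats["lr"] = base_lr * decay ** (iteration // decay_every)
    return batch


def format_stats(batch, keys, precision=6):
    """Format only the requested statistics that the batch actually provides."""
    parts = ["iteration=%d" % batch.iteration]
    for key in keys:
        if key in batch.stats:
            parts.append("%s=%.*f" % (key, precision, batch.stats[key]))
    return " ".join(parts)
```

With this shape, the train node decides what to record and the user decides what to print, for example `format_stats(train_step(100), ["lr"])`. Statistics that a node does not provide are skipped silently instead of raising an error.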