alanngnet / CoverHunterMPS

Fork of Liu Feng's CoverHunter to run on a single computer, plus more features and documentation.

analyze / fix output reliability #29

Open alanngnet opened 3 months ago

alanngnet commented 3 months ago

We have observed significantly inconsistent training results between specific computers. Investigate and resolve the issue(s). These could be computer-specific problems or PyTorch bugs rather than project-specific problems.

Edit: the problem is not just between computers, but from run to run. Problems to solve or issues to understand and document are:
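As a baseline for any determinism investigation, every random source has to be pinned first. A minimal sketch of what that might look like (not the project's actual code; imports are guarded so the sketch runs whether or not numpy/torch are installed):

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Pin every random source we know about to one seed."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # also seeds the CUDA generators in recent PyTorch
        # warn (rather than raise) on ops with no deterministic implementation
        torch.use_deterministic_algorithms(True, warn_only=True)
    except ImportError:
        pass
```

Seeding alone does not guarantee determinism (kernel choice and dataloader workers also matter), but without it run-to-run comparisons are meaningless.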

alanngnet commented 2 months ago

Remaining task is to verify fix on CUDA.

alanngnet commented 2 months ago

Latest discovery: covers80 trains deterministically after the latest patches, but a different training dataset (our irish_reels_test dataset) does not. Suspicion currently centers on dataloader activity.
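Since suspicion centers on the dataloader, one well-known source of run-to-run variation is unseeded worker processes. A hedged sketch of the standard PyTorch mitigation (`seed_worker` is a conventional name for this pattern, not code from this repo; the torch import is guarded so the sketch runs standalone):

```python
import random

def seed_worker(worker_id: int) -> None:
    """worker_init_fn for a torch DataLoader: derive each worker's Python and
    NumPy seeds from torch's per-worker base seed so workers are reproducible."""
    try:
        import torch
        base = torch.initial_seed() % 2**32
    except ImportError:  # fallback so the sketch runs without torch installed
        base = worker_id
    random.seed(base)
    try:
        import numpy as np
        np.random.seed(base)
    except ImportError:
        pass
```

When constructing the DataLoader, pass `worker_init_fn=seed_worker` together with `generator=torch.Generator().manual_seed(seed)` so the shuffle order is pinned as well.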

alanngnet commented 2 months ago

Useful discovery: setting the hyperparameter mode="defined" produces consistently deterministic training results. This may be sufficient to get back to work using Tunography (Irish trad) data, because that data is clipped to such high quality that always starting chunks at the 0-second position is a good way to chunk it. Data sources with arbitrary delays before the relevant music begins probably need mode="random", so we should make that mode more reliable.
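For illustration, the difference between the two modes might look like this (a hypothetical helper; `chunk_offsets` and its signature are not the project's actual API):

```python
import random

def chunk_offsets(total_frames: int, chunk_frames: int,
                  mode: str = "defined", rng=None) -> list[int]:
    """Return start offsets for slicing a feature sequence into chunks.

    mode="defined": chunks always tile from frame 0 (fully deterministic).
    mode="random":  each call picks a fresh random initial offset, which
                    tolerates arbitrary lead-in silence but injects
                    run-to-run variation unless the RNG is seeded.
    """
    if mode == "defined":
        return list(range(0, total_frames - chunk_frames + 1, chunk_frames))
    rng = rng or random
    start = rng.randrange(0, chunk_frames)
    return list(range(start, total_frames - chunk_frames + 1, chunk_frames))
```

With mode="defined" the dataloader's chunk boundaries are identical on every run, which is consistent with the deterministic results observed above.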

alanngnet commented 2 months ago

Verified still deterministic when using mode="defined" even with different chunk_frames and chunk_s settings.

alanngnet commented 2 months ago

Nope, still non-deterministic training in some situations, even with mode="defined". It is not yet clear what factors define those situations.

alanngnet commented 2 months ago

Measured the range of variability in training runs given the same data, hyperparameters, and number of epochs, but different random seeds. Across 4 runs, the observed standard deviations of the 4th-highest mAP scores are around 5-7%. That is very significant, given that fine-tuning work in the CSI field often aims at few-percent improvements, and especially since fine-tuning must happen with smaller datasets that naturally have higher variability in training outcomes.
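For context, a standard deviation in that range looks like this on hypothetical scores (numbers invented for illustration, not the measured values):

```python
from statistics import mean, stdev

# hypothetical 4th-highest mAP from 4 runs differing only in random seed
map_scores = [0.62, 0.70, 0.59, 0.66]
print(f"mean={mean(map_scores):.4f}  stdev={stdev(map_scores):.4f}")
# → mean=0.6425  stdev=0.0479
```

A spread like that can easily swallow the few-percent gains a fine-tuning experiment is trying to detect.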

alanngnet commented 2 months ago

Progress report:

MPS runs are mostly deterministic using the current committed code, even with mode: "random", but not always; it is still unclear what leads to non-determinism on MPS.

Now that @samuel-gauthier is back at work on this issue: on CUDA the model is never deterministic. Current evidence suggests something in the optimizer stage is causing CUDA-specific non-determinism.
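If the optimizer stage is the culprit on CUDA, the usual suspects are non-deterministic cuBLAS/cuDNN kernels. A sketch of the standard PyTorch knobs for ruling that out (the torch import is guarded so the sketch runs standalone; the env var must be set before the CUDA context is created):

```python
import os

# Required by cuBLAS >= 10.2 for deterministic GEMMs; must be set
# before any CUDA work happens in the process.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

try:
    import torch
    # raises at runtime if an op has no deterministic implementation,
    # which is itself useful for locating the offending kernel
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # disable autotuned kernel selection
except ImportError:
    pass
```

If training is deterministic with these flags and non-deterministic without, that would point at kernel selection rather than at the optimizer's math.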

@samuel-gauthier has confirmed that the math precision-reduction changes I made in the very first CoverHunterMPS commits are apparently not relevant to the problem and also do not hurt overall training results:

- Commit abe657d: src/loss.py old lines 87 and 106; src/model.py new line 232
- Commit af0d21a: src/loss.py new line 122

alanngnet commented 1 month ago

Demoting this ticket now that the train_tune script mitigates non-deterministic behavior by averaging across multiple seeds. Inconsistent non-deterministic behavior is still observed on MPS, but I noted a possible pattern: results are deterministic when the system is rebooted just before both runs.
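The mitigation can be sketched as averaging the target metric over several seeds (`tuned_metric` and `train_fn` are hypothetical names for illustration, not the actual train_tune interface):

```python
from statistics import mean

def tuned_metric(train_fn, seeds):
    """Average a noisy training metric over several seeds so that
    hyperparameter comparisons are not dominated by seed-to-seed variance."""
    return mean(train_fn(seed) for seed in seeds)

# toy stand-in for a full training run returning, e.g., the 4th-highest mAP
fake_train = lambda seed: 0.60 + (seed % 3) * 0.01
score = tuned_metric(fake_train, [0, 1, 2])
```

Averaging shrinks the standard error of the comparison by roughly the square root of the number of seeds, at the cost of proportionally more training time.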