conor-horgan / spectrai

spectrai: a deep learning framework for spectral data
Apache License 2.0

How to get started? #3

Open baigar opened 4 months ago

baigar commented 4 months ago

Was anyone out there able to get started with the examples given in the README.md of this repo? Running in PyCharm, I do not have the commands given in the readme: spectrai_train, spectrai_evaluate and spectrai_apply! After setting the proper path I can run train.py (which uses image_superresolution.yml), but it does not produce any output, even with --verbose added as a command-line parameter. It would be great to have a step-by-step intro for at least one of the cool examples on how to reproduce the evaluation.

It is a pity that the other open issue, https://github.com/conor-horgan/spectrai/issues/1, has not been resolved, as it points in the same direction.
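For anyone else who cannot find the commands: the later traceback in this thread shows that spectrai_train is a `console_scripts` entry point, and installers place such launchers in the interpreter's scripts directory, which an IDE run configuration does not necessarily have on its PATH. The sketch below uses only the standard library to print that directory and the installed package location. The helper name `install_locations` is made up for illustration.

```python
import importlib.util
import sysconfig
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def install_locations(package: str, commands: Sequence[str]) -> Dict[str, object]:
    """Report where a package and its console-script launchers are installed."""
    # Directory where this interpreter's installers put command-line launchers.
    scripts_dir = Path(sysconfig.get_path("scripts"))

    # Locate the package without importing it.
    spec = importlib.util.find_spec(package)
    package_dir: Optional[str] = None
    if spec is not None and spec.origin is not None:
        package_dir = str(Path(spec.origin).parent)

    found: Dict[str, List[str]] = {}
    for name in commands:
        # Launchers may carry a suffix (e.g. ".exe" or "-script.py" on Windows),
        # so match on the command name as a prefix.
        matches = sorted(scripts_dir.glob(name + "*")) if scripts_dir.is_dir() else []
        found[name] = [str(path) for path in matches]

    return {
        "scripts_dir": str(scripts_dir),
        "package_dir": package_dir,
        "commands": found,
    }


if __name__ == "__main__":
    info = install_locations(
        "spectrai", ["spectrai_train", "spectrai_evaluate", "spectrai_apply"]
    )
    print(info)
```

If `package_dir` comes back as `None`, the package is not installed in the interpreter that runs the snippet. If the launchers are listed but the commands are still not found, the scripts directory is probably missing from PATH.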

baigar commented 4 months ago

OK, by setting up a clean Python 3.8 environment and installing the dependencies manually I got the training running. The spectrai_* scripts get created by "python setup.py install" and are stored in the AppData path where all the Python system scripts are located. With --save_frequency 1 added, data gets stored in the very secret path §YOURPATH§\AppData\Local\Programs\Python\Python38\Lib\site-packages\spectrai-0.1.4-py3.8.egg\spectrai. I tried training all the examples and they work; the only issues are:

(1) the spectral_denoising throws an error before completing an epoch:

    Traceback (most recent call last):
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\Scripts\spectrai_train-script.py", line 33, in <module>
        sys.exit(load_entry_point('spectrai==0.1.4', 'console_scripts', 'spectrai_train')())
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\spectrai-0.1.4-py3.8.egg\spectrai\train.py", line 98, in main
        output = trainer.train_epoch(Training_Options, Task_Options, Network_Hyperparameters,
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\spectrai-0.1.4-py3.8.egg\spectrai\trainer.py", line 93, in train_epoch
        train_metrics = train(Task_Options, Training_Hyperparameters, Data_Augmentation, net, criterion, optimizer, scheduler, train_loader, device)
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\spectrai-0.1.4-py3.8.egg\spectrai\trainer.py", line 312, in train
        loss.backward()
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\_tensor.py", line 525, in backward
        torch.autograd.backward(
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\autograd\__init__.py", line 267, in backward
        _engine_run_backward(
      File "C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\autograd\graph.py", line 744, in _engine_run_backward
        return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 64, 500]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
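This error class can be reproduced in a few lines, which may help narrow down where it comes from. The sketch below is not spectrai's model code (I have not located the offending line there). It only shows the general pattern: the output of a ReLU is saved for the backward pass, and modifying it in place afterwards invalidates it. The function names are made up, and the import is guarded so the snippet can also be loaded without PyTorch.

```python
try:
    import torch
except ImportError:  # allow importing this sketch where PyTorch is absent
    torch = None


def failing_backward():
    """Trigger the in-place error and return its message (None without torch)."""
    if torch is None:
        return None
    x = torch.tensor([-1.0, 2.0], requires_grad=True)
    y = torch.relu(x)  # autograd keeps this output for the backward pass
    y += 1             # in-place edit bumps the saved tensor's version counter
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return ""


def working_backward():
    """Same computation with an out-of-place add; returns the gradient of x."""
    if torch is None:
        return None
    x = torch.tensor([-1.0, 2.0], requires_grad=True)
    y = torch.relu(x)
    y = y + 1          # out-of-place: the ReLU output stays untouched
    y.sum().backward()
    return x.grad.tolist()


if __name__ == "__main__":
    print(failing_backward())
    print(working_backward())
```

In a real network the usual suspects are an in-place update such as `+=` applied to an activation output, or an in-place activation whose result is modified later. Calling `torch.autograd.set_detect_anomaly(True)` before training, as the hint in the error suggests, should make the traceback point at the forward operation involved.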

(2) The standard example image_superresolution emits a deprecation warning:

    C:\Users\EBaigar\AppData\Local\Programs\Python\Python38\lib\site-packages\spectrai-0.1.4-py3.8.egg\spectrai\utils\utilities.py:222: FutureWarning: multichannel is a deprecated argument name for structural_similarity. It will be removed in version 1.0. Please use channel_axis instead.
      ssim += sk_ssim(output_i, target_i, data_range = output_i.max() - target_i.max(), multichannel=True)
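The warning itself names the replacement: `channel_axis` instead of `multichannel`. A minimal sketch of a version-tolerant way to build that keyword is shown below. It assumes, to the best of my knowledge, that `channel_axis` became available in scikit-image 0.19 and that `multichannel=True` corresponds to `channel_axis=-1` (channels on the last axis). The helper name `ssim_channel_kwargs` is hypothetical.

```python
import re
from typing import Dict


def ssim_channel_kwargs(skimage_version: str) -> Dict[str, object]:
    """Return the keyword that marks the last axis as the channel axis.

    Assumption: scikit-image >= 0.19 accepts `channel_axis`; older releases
    only understand `multichannel`.
    """
    match = re.match(r"(\d+)\.(\d+)", skimage_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {skimage_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (0, 19):
        return {"channel_axis": -1}
    return {"multichannel": True}


if __name__ == "__main__":
    print(ssim_channel_kwargs("0.18.3"))
    print(ssim_channel_kwargs("0.21.0"))
```

In utilities.py the result could then be unpacked into the existing call, for example `sk_ssim(output_i, target_i, data_range=..., **ssim_channel_kwargs(skimage.__version__))`, leaving the other arguments as they are.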

So the starting point is OK. Now, when trying spectrai_apply or spectrai_evaluate, I always get the error message

    net_state_dict = config['net_state_dict']
KeyError: 'net_state_dict'

It seems unable to locate the pretrained network, so the readme is really not complete here. I tried various ways to apply the pretrained network, but the option --pretrained_network seems to have no effect :-( Really hard work getting the examples running.
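The KeyError only says that the dictionary named `config` has no 'net_state_dict' entry. It does not say which file that dictionary came from or what it does contain. A small diagnostic sketch like the one below can show the available keys. Only 'net_state_dict' comes from the error message; the other candidate key names and the helper `find_state_dict` are assumptions for illustration. The dictionary is expected to be whatever the saved file yields when loaded, for example via `torch.load(path, map_location="cpu")`.

```python
from collections.abc import Mapping
from typing import Any, Sequence, Tuple

# 'net_state_dict' is taken from the error message; the other names are
# common conventions and only guesses here.
DEFAULT_CANDIDATES = ("net_state_dict", "state_dict", "model_state_dict")


def find_state_dict(
    checkpoint: Any, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Tuple[str, Any]:
    """Return (key, value) for the first candidate key found in the checkpoint.

    Raises a KeyError that lists the keys actually present, which is more
    informative than the bare KeyError: 'net_state_dict'.
    """
    if not isinstance(checkpoint, Mapping):
        raise TypeError(
            f"expected a dict-like checkpoint, got {type(checkpoint).__name__}"
        )
    for key in candidates:
        if key in checkpoint:
            return key, checkpoint[key]
    available = sorted(str(key) for key in checkpoint)
    raise KeyError(
        f"none of {list(candidates)} found; available keys: {available}"
    )


if __name__ == "__main__":
    demo = {"epoch": 3, "net_state_dict": {"layer.weight": [0.1, 0.2]}}
    print(find_state_dict(demo)[0])
```

If the listed keys look like training options rather than network weights, the script has probably loaded a different file than the saved network, which would fit the observation that --pretrained_network appears to have no effect.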