Consistent crashes occur with any nonzero value of `num_workers` on Windows systems. DataLoader multiprocessing on Linux doesn't seem to have this issue, but the added abstraction of WSL2 leads to low GPU utilization and thus lower throughput.
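As a stopgap while the root cause is unknown, one option is to disable worker processes on Windows only. A minimal sketch, assuming the standard `DataLoader` API (the `TensorDataset` here is a placeholder, not the real `WaferMapDataset`):

```python
import platform

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset; the real benchmark would use WaferMapDataset.
dataset = TensorDataset(torch.randn(64, 1, 32, 32), torch.zeros(64, dtype=torch.long))

# Windows spawns DataLoader workers as fresh processes instead of forking,
# so the dataset and collate_fn must be picklable and the script needs an
# if __name__ == "__main__" guard; num_workers=0 sidesteps both problems.
num_workers = 0 if platform.system() == "Windows" else 4

if __name__ == "__main__":
    loader = DataLoader(dataset, batch_size=16, num_workers=num_workers, shuffle=True)
    for images, labels in loader:
        pass  # training step would go here
```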
After painful trial and error, here are a few observations.
When the benchmark files live on the same drive as the conda/CUDA installation, lightly's imagenette benchmark runs without issue. This may explain why the wafer map benchmarks never crashed on the TI system...
Moving this repo's benchmarks to the same drive as the conda/CUDA installation still leads to a crash, but the crash doesn't seem to be CUDA-related. It is more likely tied to data loading, since the script consistently fails before the first training epoch starts. Could it be a segmentation fault?
Possible courses of action:
- Totally revamp data loading. Specifically, the custom `WaferMapDataset` needs review: data is loaded from a Pandas DataFrame, specifically two Pandas Series of numpy arrays (more or less a ragged array structure), and the wafer map Series is converted to a list of different-sized tensors. PyTorch's nested tensors may be a better data structure for this (see the sketch after this list), though they are experimental and may change with the upcoming PyTorch 2.0 release.
- Overhaul the custom transforms. The only non-torch library they use is numpy, which shouldn't cause issues with worker processes, but this needs investigation.
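For the nested-tensor idea above, here is a minimal sketch of how the ragged wafer-map Series could be packed. The shapes and values are made up for illustration, and `torch.nested` is a prototype API whose behavior may change:

```python
import numpy as np
import pandas as pd
import torch

# Made-up stand-in for the ragged wafer-map Series: numpy arrays of
# different shapes held in a Pandas Series, as in WaferMapDataset.
wafer_maps = pd.Series([
    np.random.randint(0, 3, size=(26, 26)),
    np.random.randint(0, 3, size=(30, 34)),
    np.random.randint(0, 3, size=(45, 48)),
])

# Current approach: a plain Python list of different-sized tensors.
as_list = [torch.from_numpy(m).float() for m in wafer_maps]

# Possible alternative: pack the ragged data into one nested tensor.
# torch.nested is a prototype API, so this is a sketch of the idea,
# not a drop-in replacement.
as_nested = torch.nested.nested_tensor(as_list)

print(as_nested.is_nested)   # True
print(as_nested[0].shape)    # torch.Size([26, 26])
```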