Open taeil opened 3 years ago
Trying to calculate mean/std and getting an error. Command:
cd src/data/
./compute-dataset-pixel-mean-std.py --data /scratch/crguest/data/sen12ms_small
Error:
main(parser.parse_args())
File "./compute-dataset-pixel-mean-std.py", line 49, in main
for data, _ in loader:
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 138, in __getitem__
sample = self.loader(path)
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 174, in default_loader
return pil_loader(path)
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 156, in pil_loader
img = Image.open(f)
File "/scratch/crguest/miniconda3/envs/hp120/lib/python3.7/site-packages/PIL/Image.py", line 2818, in open
raise IOError("cannot identify image file %r" % (filename if filename else fp))
OSError: cannot identify image file <_io.BufferedReader name='/scratch/crguest/data/sen12ms_small/p291.tif_spring/ROIs1158_spring_s2_121_p291.tif'>
(hp120) ➜ data git:(taeil) ✗ ls -al /scratch/crguest/data/sen12ms_small/p291.tif_spring
total 2492
drwxrwxr-x 2 crguest crguest 4096 Mar 5 23:10 .
drwxrwxr-x 247 crguest crguest 45056 Mar 5 23:10 ..
-rwxrwxr-x 1 crguest crguest 262788 Mar 5 23:10 ROIs1158_spring_lc_121_p291.tif
-rwxrwxr-x 1 crguest crguest 525172 Mar 5 23:10 ROIs1158_spring_s1_121_p291.tif
-rwxrwxr-x 1 crguest crguest 1706432 Mar 5 23:10 ROIs1158_spring_s2_121_p291.tif
Changed the folder path so it no longer contains a dot, just in case, but still no luck.
There is no issue with normal JPEG images or TIFFs from BigEarthNet, and no issue with the SEN12MS LC images. The issue is only with the SEN12MS S1 and S2 images. These Sentinel images have multiple bands in the same file, whereas BigEarthNet has a separate image for each band.
We can reproduce the issue with a simple python script:
from PIL import Image
path = "/scratch/crguest/data/sen12ms_small3/test/p124_summer/ROIs1868_summer_s2_11_p124.tif"
page = Image.open(path)
We are using the image reader from Pillow: https://readthedocs.org/projects/pillow/downloads/pdf/latest/ Around page 21 (not sure of the exact page) it says: "However, Pillow doesn't support user-defined modes; if you need to handle band combinations that are not listed above, use a sequence of Image objects." Checking on that.
There could also be a limitation on the PyTorch side when more channels are stacked in one single TIFF file:
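To see how widespread the problem is before changing loaders, something like the following could list every TIFF under a folder that the opener fails on. This is only a sketch: `find_unreadable_images` and its `opener` parameter are hypothetical names, not part of the repo. By default it uses `PIL.Image.open`, the same call that fails in the traceback above, and it catches `OSError` because that is the exception Pillow raised there.

```python
import os


def find_unreadable_images(root, opener=None, extensions=(".tif", ".tiff")):
    """Return (path, error message) pairs for files the opener cannot read."""
    if opener is None:
        # Lazy import so the helper can also be tried with a different opener.
        from PIL import Image
        opener = Image.open
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                # Open from a file object, as torchvision's pil_loader does.
                with open(path, "rb") as f:
                    opener(f)
            except OSError as exc:
                failures.append((path, str(exc)))
    return failures
```

Running it on the dataset root, e.g. `find_unreadable_images("/scratch/crguest/data/sen12ms_small")`, should show whether only the `_s1_` and `_s2_` files are affected.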
https://github.com/pytorch/vision/issues/514
Comment: "I just discovered a way to do this, I am not sure it can be solution to your problem, but I'll share it in case it can be useful to others. In my case I had multi-channel Tiff images, and I wanted to classify them using CNNs in Pytorch. I honestly gave up on data augmentation using Transforms in Pytorch, and I performed data augmentation offline (let's say in my input folders I have original data as well as augmented ones).
The game changer is however defining your own loader + taking advantage of Tifffile library in python. This is how I did it for my training set (val and test should be the same):"
import tifffile
from torchvision import datasets, transforms

def my_tiff_loader(filename):
    # Read with tifffile instead of torchvision's default PIL loader
    return tifffile.imread(filename)

train_transform = transforms.Compose([transforms.ToTensor()])
train_data = datasets.ImageFolder('PATH TO TRAINSET', loader=my_tiff_loader, transform=train_transform)
Note: S1 has only 2 bands. LC has 4 bands and it works, but the LC files are small.
Found custom data loader code with SEN12MS. We are unblocked for now. Some additional reading that is helpful for everyone.
So, if we have many stacked or custom-mode channels in satellite imagery, we need to use code that reads the files with rasterio instead of Pillow.
Added an additional pipeline and data sources for SEN12MS. Good progress has been made; see the changes on the taeil branch. There is a chance that we can run pre-training tomorrow.
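For the original mean/std calculation, a rough sketch of the rasterio route could look like this. It is not the code from the SEN12MS loader or from `compute-dataset-pixel-mean-std.py`; `read_bands` and `per_band_mean_std` are hypothetical helper names. It assumes every file in one pass has the same number of bands, since `rasterio.open(path).read()` returns an array shaped (bands, height, width).

```python
import numpy as np


def read_bands(path):
    """Read all bands of a GeoTIFF as a float64 array shaped (bands, H, W)."""
    import rasterio  # lazy import: only needed when reading real files
    with rasterio.open(path) as src:
        return src.read().astype(np.float64)


def per_band_mean_std(arrays):
    """Accumulate per-band mean and (population) std over (bands, H, W) arrays."""
    count = 0
    total = None
    total_sq = None
    for arr in arrays:
        arr = np.asarray(arr, dtype=np.float64)
        pixels = arr.reshape(arr.shape[0], -1)  # one row per band
        if total is None:
            total = np.zeros(pixels.shape[0])
            total_sq = np.zeros(pixels.shape[0])
        total += pixels.sum(axis=1)
        total_sq += (pixels ** 2).sum(axis=1)
        count += pixels.shape[1]
    if count == 0:
        raise ValueError("no arrays given")
    mean = total / count
    # Clamp tiny negative values caused by floating-point rounding.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)
```

Usage would be along the lines of `mean, std = per_band_mean_std(read_bands(p) for p in s2_paths)`, where `s2_paths` is a placeholder for a list of S2 file paths. S1, S2 and LC would need separate passes because their band counts differ. The sum-of-squares formula keeps memory flat, but it can lose precision for very large pixel values.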