Zhou-Hangyu / allclear

https://allclear.cs.cornell.edu/
MIT License

About the dataset failing to open #1

Open SueZhang2000 opened 1 week ago

SueZhang2000 commented 1 week ago

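Hello, and thank you for your work! I have a few questions about downloading the dataset:

  1. Only part of the data (the test set) is currently released at https://allclear.cs.cornell.edu/. Do you plan to release more in the future?
  2. After downloading and unzipping the test set, I found that the tif files cannot be opened in ArcGIS, and reading them with GDAL returns all NaN values. Could the data have been uploaded incorrectly?

Looking forward to your reply!

Zhou-Hangyu commented 1 week ago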

Thank you for your attention! We are currently cleaning up the codebase and fixing bugs. Thanks for catching these; we will resolve them in the next few days!

As for dataset access, we will make sure all of our data is publicly accessible. Setting up the data server takes time, and we are working on it at the moment.

Zhou-Hangyu commented 1 week ago

Hi there,

I just reviewed the data, and everything looks good. However, there are two points that we should clarify:

  1. We use the Cloud Optimized GeoTIFF (COG) format to store all our TIFF files. This format ensures optimal I/O performance. However, it may cause compatibility issues with the ArcGIS platform, as mentioned here. Could you provide more details about the error you're encountering? That would help us make a more accurate assessment. Thanks!
  2. The raw data contains some NaN gaps around the boundaries due to the geoprojection process. To address this, we slightly expand the download region and perform on-the-fly center-cropping to generate images with 256x256 pixels, eliminating those gaps. We plan to post-process the entire dataset to remove these gaps in the final release. In the meantime, you can manually center-crop the images or use tools like visualize_one_image() to explore the dataset.
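For anyone who wants to do the manual center-crop mentioned above, here is a minimal sketch. It assumes the tile has already been loaded into a NumPy array of shape (bands, height, width). `center_crop` and `nan_fraction` are hypothetical helper names for illustration and are not part of the allclear codebase. The 270x270 tile with a 7-pixel NaN border is synthetic, chosen only to show the effect.

```python
import numpy as np


def center_crop(image: np.ndarray, size: int = 256) -> np.ndarray:
    """Return the central size x size window of a (bands, H, W) array."""
    _, height, width = image.shape
    if height < size or width < size:
        raise ValueError(
            f"image {height}x{width} is smaller than crop size {size}"
        )
    top = (height - size) // 2
    left = (width - size) // 2
    return image[:, top:top + size, left:left + size]


def nan_fraction(image: np.ndarray) -> float:
    """Fraction of pixels (over all bands) that are NaN."""
    return float(np.isnan(image).mean())


# Synthetic stand-in for a downloaded tile: valid data surrounded by NaN gaps.
tile = np.full((3, 270, 270), np.nan, dtype=np.float32)
tile[:, 7:-7, 7:-7] = 0.5

cropped = center_crop(tile)
print("shape after crop:", cropped.shape)
print("NaN fraction before crop:", nan_fraction(tile))
print("NaN fraction after crop:", nan_fraction(cropped))
```

If the NaN fraction is still high after cropping, the gaps are probably not limited to the boundary and the file is worth inspecting more closely.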
SueZhang2000 commented 1 week ago

Hi, I'm glad to see that you have replaced the data source. The downloaded tif files can now be opened. However, I have a few suggestions for your dataset:

  1. Many ROIs have only 1-2 S2 images (possibly because this is a test set).
  2. In some S2 folders, either every image has clouds (roi26433\2022_11\s2_toa) or no image has clouds (roi25904\2022_11\s2_toa), so they cannot provide cloudy and cloud-free image pairs for cloud removal tasks.
  3. It would be better if the target image could be stored separately, just like Figure 5 in the article.

BTW, thank you very much for sharing! I believe it will be used more in the future!

SueZhang2000 commented 1 week ago

In addition, I found that the image quality obtained by reading the BGR bands is not as good as shown in the article: many ROIs contain only noise, and the color difference between some image pairs is also relatively large. If I have made any mistakes in how I read the images, please let me know.
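For reference, here is a minimal sketch of one common way to turn B, G, R bands into an RGB preview, using a percentile stretch with limits shared across the three channels. It is an assumption-laden illustration, not the method used for the figures in the article. The function name `bgr_to_rgb_preview`, the (3, H, W) band order B, G, R, and the 2nd/98th percentile limits are all hypothetical choices. Stretching each channel or each image independently can itself produce color differences between image pairs.

```python
import numpy as np


def bgr_to_rgb_preview(bgr: np.ndarray, low: float = 2.0,
                       high: float = 98.0) -> np.ndarray:
    """Convert a (3, H, W) array in B, G, R order to an (H, W, 3) uint8 RGB image."""
    rgb = bgr[::-1].astype(np.float64)  # reorder bands to R, G, B
    # One pair of limits for all channels, so the color balance is preserved.
    lo, hi = np.nanpercentile(rgb, [low, high])
    if hi <= lo:
        hi = lo + 1e-6  # avoid division by zero on constant images
    scaled = np.clip((rgb - lo) / (hi - lo), 0.0, 1.0)
    scaled = np.nan_to_num(scaled, nan=0.0)  # render NaN gaps as black
    return (scaled * 255).astype(np.uint8).transpose(1, 2, 0)


# Synthetic example: constant bands with B=0.1, G=0.2, R=0.3.
demo_bgr = np.stack([
    np.full((64, 64), 0.1),
    np.full((64, 64), 0.2),
    np.full((64, 64), 0.3),
])
demo_rgb = bgr_to_rgb_preview(demo_bgr)
print(demo_rgb.shape, demo_rgb.dtype)
```

To compare a cloudy image with its clear counterpart, it may help to compute the limits once on the clear image and reuse them for both, so the two previews share the same scaling.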