gohtanii / DiverSeg-dataset


Ideas #1

Open Phhofm opened 2 months ago

Phhofm commented 2 months ago

Thanks a lot for your study :)

As a hobby I have been training SISR models, and I have made some curated versions of datasets, so this study is very interesting to me.

I thought I'd just write down some ideas/inputs I had (extending this pipeline with tiling, IQA scoring, and IC9600):

I see that you use blockiness scoring to filter out JPEG-compressed images. It's a great idea. I think it could even be extended by using IQA metrics such as HyperIQA to score the image tiles for the SISR training dataset, so as to also filter out noisy, blurry, or otherwise degraded tiles. (I like my tiles clean. We can still add noise, blur, compression, etc. to the LR images for the model to learn to deal with them, but for that we need our HR tiles to be clean.)
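For illustration, a minimal sketch of what this tile-scoring step could look like, using the pyiqa package to run HyperIQA (the directory name and the 0.6 keep-threshold are just assumptions to tune):

```python
# Minimal sketch: score HR tiles with HyperIQA via pyiqa (pip install pyiqa).
# The tile directory and the 0.6 threshold are assumptions, not fixed values.
from pathlib import Path

import pyiqa
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
metric = pyiqa.create_metric("hyperiqa", device=device)  # no-reference IQA

keep, drop = [], []
for tile in sorted(Path("hr_tiles").glob("*.png")):
    score = metric(str(tile)).item()  # pyiqa accepts an image path directly
    (keep if score >= 0.6 else drop).append((tile.name, score))

print(f"kept {len(keep)} tiles, dropped {len(drop)} noisy/blurry tiles")
```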

Which brings me to another step: including tiling in this automated process. This will improve I/O speed during training, since the dataloader crops patches anyway; there is no need to load a 4K input image just to take a 256x256px crop out of it, as an example. The LSDIR dataset is all over the place with image sizes. I like to use 512x512px tiles for my datasets, which is big enough to allow increasing the GT size during training for better results, but also small enough to keep training speeds good.
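A minimal sketch of such a tiling step, cutting each HR image into non-overlapping 512x512px tiles and discarding partial border tiles (directory names are assumptions):

```python
# Sketch of the tiling step: non-overlapping 512x512px crops per HR image.
# Source/destination directory names are assumptions.
from pathlib import Path

from PIL import Image

SRC, DST, TILE = Path("hr_images"), Path("hr_tiles"), 512
DST.mkdir(exist_ok=True)

for img_path in sorted(SRC.glob("*.png")):
    img = Image.open(img_path).convert("RGB")
    w, h = img.size
    # Step in full tile strides; partial tiles at the borders are skipped.
    for top in range(0, h - TILE + 1, TILE):
        for left in range(0, w - TILE + 1, TILE):
            tile = img.crop((left, top, left + TILE, top + TILE))
            tile.save(DST / f"{img_path.stem}_{top}_{left}.png")
```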

Maybe this pipeline could additionally be extended with IC9600 for automatic image complexity assessment. In the end we could have a very good pipeline that automatically produces good SISR training datasets. I am currently just missing tiling and IQA scoring in this pipeline (and maybe complexity assessment).

For an example of a recent dataset I curated, with all the steps I took, see the Nature Dataset.

I just thought I'd mention that LSDIR has some bad-quality images in it, which I saw when inspecting it. These will affect model training negatively, so it might not be the best reference point when comparing against a high-quality dataset. (This is why, a year ago, I tried to make a curated version of it called SSDIR.)

Something else I might mention: there are small SISR networks to train like Compact, medium ones like RealPLKSR, and big ones like ATD. From experience I'd say a small dataset of <3k image tiles is enough to train a small network, 3-6k is enough for a medium network, and 6k-20k is enough for a big one. Or in other words, I was able to achieve similar quality with 8k images compared to LSDIR with its roughly 84k input images, as long as we make sure it is a high-quality, high-variety dataset. Ensuring these quality requirements is also more realistic on a smaller dataset than on a huge one. A good example would be the Nomosv2 dataset musl curated, I think: 6k image tiles he distilled out of multiple datasets (LSDIR included).

Those are just my thoughts, inputs, and insights. Thank you for this study and your work on this topic :)

(PS: I just remembered something else I like to do: check the dataset for similar images and delete them, so there are no duplicate or visually similar images, since that's just redundancy in the training process. I'm currently using Czkawka for that.)
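Czkawka is a standalone tool; as a rough in-script equivalent of the same idea, here is a sketch using perceptual hashing with the imagehash package (the Hamming-distance cutoff of 5 is an assumption to tune, and the pairwise scan is O(n²), so it suits smaller tile sets):

```python
# Sketch of near-duplicate detection via perceptual hashing
# (pip install imagehash). The distance cutoff of 5 is an assumption.
from pathlib import Path

import imagehash
from PIL import Image

seen = []  # (hash, path) of tiles kept so far
duplicates = []
for tile in sorted(Path("hr_tiles").glob("*.png")):
    h = imagehash.phash(Image.open(tile))
    # Subtracting two ImageHash objects gives their Hamming distance,
    # which roughly tracks visual similarity.
    if any(h - kept_hash <= 5 for kept_hash, _ in seen):
        duplicates.append(tile)
    else:
        seen.append((h, tile))

print(f"{len(duplicates)} visually similar tiles flagged for removal")
```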

gohtanii commented 1 month ago

Thank you for sharing your ideas and insights, and I apologize for the delayed response! We appreciate your interest in our study and your valuable suggestions.

By incorporating IQA metrics, we would be able to filter out noisy and blurry images, which could further improve the Quality Estimation part. We would like to give this a try. Additionally, since ImageNet and PASS are mainly composed of low-resolution images, we have not applied tiling so far, but as you mentioned, tiling is certainly effective when working with high-resolution images. We would also like to explore the idea of using IC9600 for assessing image complexity.

Your other ideas and insights are also helpful, and we will take them into consideration.

Phhofm commented 1 month ago

Thank you for the response, and nice that you want to explore these topics :D That's something I'm currently working on and trying to evaluate myself with some tests. You could have a look at my video; the HQ-50K Dataset Curation section, with the Testing section after it, could be interesting to you, and your paper is also referenced in there. I simply applied dataset curation techniques like I often do for my SISR datasets, but then I wanted to test these things more rigorously, so I can prove whether these steps help or not, what their effects are, and what ideal thresholds might be, which I talk a bit about in the testing section there.

First I started with DIV2K, but its score distribution was not well suited for my tests, so I switched to DF2K. For example, the highest blockiness score in DIV2K (tiled to 512x512) was 33, while DF2K still had over 10'000 tiles that scored worse in blockiness than that, so blockiness filtering would have much less of an effect on DIV2K in comparison. DIV2K also has fewer tiles: when fully filtering with qalign_8bit >= 4, IC9600 >= 0.5, and blockiness < 2, only 593 of DIV2K's 5052 total tiles survive (which is a super small dataset to train with), while 1849 of DF2K's 21387 tiles survive.

Anyway, I think I explained some of this in the video. I'm still running tests, and there is more I'd like to test. For example, what effect does filtering on complexity alone have at thresholds (>=) of 0.5, 0.6, 0.7, 0.8, and so on, since my graphs indicate that complexity has an influence? In other words, which filtering has what effect at which thresholds: is higher simply better, or is there a point beyond which the model improves only marginally or not at all, so that sacrificing tiles, and therefore content, is no longer worth it?

I also think it is fair to run some IQA comparisons. Q-Align got some of the best results on benchmarks, but does that translate to SISR training output results? How does, for example, a model trained on the 5000 best-scored Q-Align-filtered tiles compare to a model trained on the 5000 best-scored HyperIQA-filtered tiles (HyperIQA is way faster and less resource-intensive to run in comparison)? Then: do higher patch sizes during training profit more from such filtering techniques? Do bigger networks with higher patches during training profit even more? And what happens if multiscaling is used together with these filtering techniques?

And of course the metrics used for evaluation influence the results, like I show in the video: graphs of PSNR would show that the model trained on Q-Align-filtered tiles (score >= 4) did worse than the baseline model (no filtering applied, just tiled DF2K), but DISTS shows that this model did better, so different metrics can be used/shown, like topiq_fr and stlpips-vgg etc. Ah yeah, maybe there is too much I would like to test; it's pretty fascinating.

Here is the link to my video: https://youtu.be/Nv7fSreanzc?si=C4FhLIXZyNrdCngs

(PS: If you are doing a paper on this, I would be happy to somehow participate and have my name on the paper if possible; then I would not only have trained and released multiple SISR networks but also academically contributed to the community. If not, no problem, I will still support with ideas and suggestions if possible or needed, since this is a hobby and interesting to me, so I'll gladly help.)
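For clarity, a small sketch of the combined filter described above, assuming the per-tile scores were already computed into a CSV (the file and column names are assumptions; the thresholds are the ones from my tests):

```python
# Sketch of the combined filtering step: keep only tiles passing all three
# thresholds. "tile_scores.csv" and its column names are assumptions;
# the thresholds match the ones quoted above.
import pandas as pd

df = pd.read_csv("tile_scores.csv")  # columns: tile, qalign_8bit, ic9600, blockiness
survivors = df[
    (df["qalign_8bit"] >= 4.0)
    & (df["ic9600"] >= 0.5)
    & (df["blockiness"] < 2.0)
]
print(f"{len(survivors)} of {len(df)} tiles survive the full filter")
survivors["tile"].to_csv("kept_tiles.txt", index=False, header=False)
```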

Phhofm commented 1 week ago

PS: I made multiple tests concerning IC9600 and HyperIQA for SISR dataset training and wrote a Hugging Face community post that might be of interest (I made all trained test model files, config files, etc. available for reproducibility): https://huggingface.co/blog/Phips/bhi-filtering