ebgoldstein closed this issue 2 years ago.
@dbuscombe-usgs I see two things here:
1) Your camera's aperture is too large. Camera focus and aperture are analog and manually tuned by the person who builds the camera (me, in the case of your camera). The 'local' fix for you is to close the aperture and/or reduce the light, which can be done by carefully reducing the numbers in ringledon.py (a hypothetical sketch of that kind of change follows the link below):
https://github.com/UNCG-DAISY/Instagrain/blob/aa72156535290cb7b08d27b3f6a9a3789b64fc78/software/ringledon.py#L5
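For illustration only, here is a hypothetical sketch of the kind of change meant above. It is not the actual contents of ringledon.py; it assumes the ring light is a NeoPixel-style LED driven by the Adafruit CircuitPython neopixel library, and the pin, pixel count, and color values are placeholders.

```python
# Hypothetical sketch only -- NOT the actual ringledon.py.
# Assumes a NeoPixel ring driven from a Raspberry Pi GPIO pin.
import board
import neopixel

NUM_PIXELS = 24  # assumed ring size

# Lowering these per-channel values (0-255) dims the ring,
# which reduces the light reaching the sensor.
LED_COLOR = (128, 128, 128)  # e.g., down from (255, 255, 255)

pixels = neopixel.NeoPixel(board.D18, NUM_PIXELS, brightness=0.5)
pixels.fill(LED_COLOR)
```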
The 'global' fix is to write some docs explaining to a camera operator how to do this, so I made a new issue: #116
2) The ML issue here is to add augmentations during training that specifically deal with overexposure/underexposure; that will likely require the albumentations library (minimal sketch below). Edit: now issue #117
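A minimal sketch of what such exposure-oriented augmentations could look like with albumentations (the transform choices and parameter values are illustrative, not from the repo):

```python
import albumentations as A

# Illustrative pipeline: randomly brighten/darken, adjust contrast,
# and shift gamma to mimic over- and underexposed captures.
exposure_aug = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.5),
    A.RandomGamma(gamma_limit=(60, 140), p=0.5),
])

# usage (image is an H x W x C numpy array):
# augmented = exposure_aug(image=image)["image"]
```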
Makes sense, although I don't have the ability to log in to the system from my cell phone. I will try to look at the aperture etc. when I can, but I will prioritize physical samples and (sometimes subpar) photos for now. The variation between dry fine grains and darker grains can occur over tens of metres, so it's not practical to adjust the aperture while sampling.
Definitely agree that augmentation and inclusion of subpar imagery are both valuable in the fight against bad data.
Another thing I may be inadvertently doing in the field is adjusting the focus when putting the lens cap on or off. It's difficult to see the screen in bright light, and screen condensation doesn't help (I've been living in my van, and on colder mornings it takes time for the screen to fully clear; putting this thing through its paces!), so I'm not confident I always spot problems.
Yes, I have worried about this with the lens cap, but I don't think there is a way to avoid it, and I also have not noticed that these small adjustments really influence the focus. Also, for the record, the current model takes a 1024 x 1024 image and scales it down to 224 x 224 (sketch below), so absolutely perfect focus is likely not needed.
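For context, a sketch of that downscaling step (the framework is assumed here; the actual preprocessing code may differ):

```python
import tensorflow as tf

def load_and_resize(path):
    # 1024 x 1024 capture -> 224 x 224 model input (~4.6x downscale),
    # so fine focus detail is largely discarded anyway.
    img = tf.io.decode_image(tf.io.read_file(path), channels=3,
                             expand_animations=False)
    return tf.image.resize(img, (224, 224))
```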
I updated this issue to reflect the fact that it is for analyzing our current training/val data, not newly collected field data.
OK, I am looking at a ranked list of images with the highest error (summed MAPE across all grain size bins). Here is an example of images with high error:
Pics with low error:
Summed error (MAPE summed across all sizes) is also correlated with the larger size fractions and with the range (D98 - D2). Here is a list of correlation coefficients for each grain size bin with summed MAPE (a sketch of how these are computed is below):
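A hedged sketch of how that analysis can be computed; the file name and the per-bin pred_/true_ column names are hypothetical stand-ins for our prediction table:

```python
import pandas as pd

df = pd.read_csv("predictions.csv")        # hypothetical per-image results
bins = ["D2", "D16", "D50", "D84", "D98"]  # assumed grain size bins

# Absolute percentage error per bin, then summed MAPE per image
for b in bins:
    df[f"ape_{b}"] = (df[f"pred_{b}"] - df[f"true_{b}"]).abs() / df[f"true_{b}"] * 100
df["summed_mape"] = df[[f"ape_{b}" for b in bins]].sum(axis=1)

# Rank images by error, then correlate summed MAPE with each bin and the range
ranked = df.sort_values("summed_mape", ascending=False)
df["range_D98_D2"] = df["true_D98"] - df["true_D2"]
corrs = df[[f"true_{b}" for b in bins] + ["range_D98_D2"]].corrwith(df["summed_mape"])
print(corrs)
```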
The high-error pics look coarser and tend to look bimodal (note that I did not do a dip test for bimodality, but that could be interesting; a sketch of one is below). My thinking is that it's worth continuing to look for coarser samples and bimodal samples.
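If we do try it, here is a minimal sketch using the third-party diptest package (Hartigan's dip test); the sample data are fabricated purely to show the call:

```python
import numpy as np
import diptest

# Fake bimodal grain-size sample (mm), just for illustration
rng = np.random.default_rng(0)
grain_sizes = np.concatenate([rng.normal(0.2, 0.05, 500),
                              rng.normal(2.0, 0.3, 500)])

dip, pval = diptest.diptest(grain_sizes)
print(f"dip={dip:.4f}, p={pval:.4f}")  # a small p-value suggests multimodality
```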
For now, I think I will close this issue. I will link to it from the discussion board (since it is really connected to the first release of the model).
Goal: Using our existing library of images + sediment size measurements (which are currently used for training/val), determine where the model is not performing well (where MAPE is highest) and try to develop ways to mitigate this (more samples, weighting, etc.)
Preliminary tests suggest the model performs poorly on the larger sizes for each cumulative class.