Big-Bee-Network / bee-image-finder

Creative Commons Zero v1.0 Universal

Bee image finding downloading multiple specimens #1

Open seltmann opened 1 year ago

seltmann commented 1 year ago

@jhpoelen we are making great progress measuring bees downloaded using Preston and the bee-image-finder. There is a curious issue where multiple specimens are being downloaded together. Some of them are two very different taxa (bee and leafhopper). See our working spreadsheet at:

https://docs.google.com/spreadsheets/d/1Kx_aWF5Sz-o62j3Utxp5rfzb6bTaT4my6GbTQbpXnD8/edit?usp=sharing

Folder CASTYPE1503 is a good example. It includes a bee, a spider and a beetle. I would not think these all have the same catalog number.

jhpoelen commented 1 year ago

Thanks for sharing your detailed example of unexpected behavior when finding images for a specific type specimen.

I ran:

./find-images.sh CASTYPE1503

and zipped up the dist folder and attached the results.

CASTYPE1503.zip

And yes, even to an untrained eye like mine, the included images do look like a mix of different specimens (see attached).

000015-CASTYPE1503 000014-CASTYPE1503 000013-CASTYPE1503 000012-CASTYPE1503 000011-CASTYPE1503 000010-CASTYPE1503 000009-CASTYPE1503 000008-CASTYPE1503 000007-CASTYPE1503 000006-CASTYPE1503 000005-CASTYPE1503 000004-CASTYPE1503 000003-CASTYPE1503 000002-CASTYPE1503 000001-CASTYPE1503

jhpoelen commented 1 year ago

At a quick glance, find-images.sh currently selects images from catalog numbers that start with the provided catalogNumber.

So, CASTYPE1503 would match CASTYPE1503 specimen data, but also CASTYPE15037. This is supported by the labels included in earlier comments. I am working on a fix.
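To illustrate the difference between a "starts with" selector and an exact one, here is a minimal Python sketch. It is not the code of find-images.sh: the record layout, the image file names, and the function names `select_by_prefix` and `select_exact` are all hypothetical, and only the catalog numbers CASTYPE1503 and CASTYPE15037 come from this thread.

```python
# Hypothetical records. Only the catalog numbers CASTYPE1503 and
# CASTYPE15037 appear in this thread; everything else is made up.
records = [
    {"catalogNumber": "CASTYPE1503", "image": "a.jpg"},
    {"catalogNumber": "CASTYPE15037", "image": "b.jpg"},
    {"catalogNumber": "CASTYPE1504", "image": "c.jpg"},
]


def select_by_prefix(records, catalog_number):
    """Lenient selector: keeps any record whose catalog number starts
    with the requested value, so longer numbers are picked up too."""
    return [r for r in records if r["catalogNumber"].startswith(catalog_number)]


def select_exact(records, catalog_number):
    """Stricter selector: keeps only records whose catalog number is
    identical to the requested value."""
    return [r for r in records if r["catalogNumber"] == catalog_number]


lenient = [r["catalogNumber"] for r in select_by_prefix(records, "CASTYPE1503")]
strict = [r["catalogNumber"] for r in select_exact(records, "CASTYPE1503")]
print(lenient)
print(strict)
```

With the lenient selector, a request for CASTYPE1503 also returns the CASTYPE15037 record, which would explain images of unrelated specimens landing in the same folder. The exact comparison is only one possible way to tighten the match.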

jhpoelen commented 1 year ago

After applying a fix to make the catalog number selector a little less lenient, I generated the following results via

./find-images.sh CASTYPE1503

CASTYPE1503.zip

Associated images include:

000004-CASTYPE1503 000003-CASTYPE1503 000002-CASTYPE1503 000001-CASTYPE1503

Please confirm that this addressed your concern.

Thanks again for providing such specific feedback!