MushroomObserver / mushroom-observer

A website for sharing observations of mushrooms.
https://mushroomobserver.org
MIT License
77 stars 25 forks source link

Location.contains Scope #2212

Closed JoeCohen closed 2 weeks ago

JoeCohen commented 3 weeks ago

Adds a new Location scope. Location.contains_box(n: a, s: b, e: c, w: d) returns all Locations which bound the defined rectangle.

This is a small, but necessary step, in dealing with iNat places as part of importing iNat observations.

coveralls commented 3 weeks ago

Coverage Status

coverage: 94.402% (-0.05%) from 94.455% when pulling 97398606ca5ae22369183f76eba6866824313baf on jdc-location-contains into 9be801bebd94f9562920ff5dffef134f55896942 on main.

coveralls commented 3 weeks ago

Coverage Status

coverage: 94.402% (-0.05%) from 94.455% when pulling 2d7d52e6dffebfe631214b9394ecc86472fdcf2f on jdc-location-contains into 9be801bebd94f9562920ff5dffef134f55896942 on main.

mo-nathan commented 3 weeks ago

@JoeCohen These values are getting rounded in the database.

mysql> update locations set east = -179.61345 where id = 732613413;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select east from locations where id = 732613413;
+----------+
| east     |
+----------+
| -179.613 |
+----------+
1 row in set (0.00 sec)

Is there reason to believe this is new behavior?

JoeCohen commented 3 weeks ago

@mo-nathan Thanks for the rounding information! That's helpful. I have no reason to believe the db behavior is new. I'm probably doing something that -- for the first time -- depends on more precision. I'll see if I can fix that.

nimmolo commented 3 weeks ago

Note that there's rounding happening in the JS too, which eventually populates the DB.

Consider making an app-wide Rails constant for the number of decimal places we round to. Once we establish that, I can send that to the JS for consistency.

JoeCohen commented 3 weeks ago

@nimmolo: app-wide rounding constant seems like a good idea. I think it should be in a different PR because (a) I don't want to stack too much stuff into this one; (b) I don't know how to do it or where to put it. Probably it should be derived from the db. I'm guessing (based on experimentation) that it's 4 digits to the right of decimal point for positive numbers, 3 for negative numbers. I also don't understand whether the db rounds it or just truncates it.

nimmolo commented 3 weeks ago

If the db were doing the rounding, i'm pretty sure it would be in our schema. It's not the db doing this though - it's our Location-saving methods that round the float.

I'm not aware of any code that rounds negative numbers differently - that's worth checking on. I'm also not aware of any truncation!

But there is more than one place in the code that rounds lat/lngs, so let's not forget to fix this error-prone situation. (I guess we could also change the schema through a migration - maybe the db is the place to do this.)

If we do a constant that is something I would put in a config with our other MO constants.

Let's get advice from @mo-nathan - he may have a preference.

nimmolo commented 3 weeks ago

Per your communication on Slack, I went ahead and changed to contains_point on my PR, which I hope to merge today.

mo-nathan commented 3 weeks ago

@nimmolo What evidence do you have that the fixtures are getting affected by any of our code? I did look at this and could not find anywhere that rounding was actually happening during a test run. This was using the test that uses the location fixture that Joe mentioned. I then found articles on the web about MySQL rounding FLOAT type values automatically and confirmed that with the test I gave above (that uses no Ruby at all).

nimmolo commented 3 weeks ago

Oh - it sounds like i misperceived the situation. I have no evidence of fixtures being affected.

My concern is more structural, and I wanted to raise a flag before I forgot about it again. I've just been working on location code for a few months and realizing that:

I recently changed all the float rounds to 4 places (some of mine from a couple years ago had been 6 places), and it occurred to me to use a constant, but i didn't go that far.

mo-nathan commented 3 weeks ago

I agree that being consistent about the rounding would make sense. I did some more playing around with the lat/lng and on the production server we round to 6 decimal digits. This mean that somewhere around Kansas the longitude resolution switches from about 300' to about 30'. This can cause weirdnesses like this: https://mushroomobserver.org/locations/29140 (this is the house where I grew up). This seems wrong, but I'm not sure exactly what "right" is. That observation was created by searching Google for that address and then saving. While I was defining it, it looked like this:

Screenshot 2024-07-06 at 3 47 03 PM

It would at least be better if it saved as:

Screenshot 2024-07-06 at 3 50 37 PM

rather than

Screenshot 2024-07-06 at 3 51 25 PM
mo-nathan commented 3 weeks ago

It seems like it would be be reasonable to support 7 digits (rather than 6) and then round off to the nearest 1/10,000s of degree (~30ft). This would not lose any existing data, but would start to treat the west coast the same as the east coast.

mo-nathan commented 3 weeks ago

Note that my suggested solution involves a data migration to store higher precision floats in general. We might consider switching to the DECIMAL type. Not sure whether Rails likes that, but it seems like the right number format based on the MySQL documentation.

coveralls commented 3 weeks ago

Coverage Status

coverage: 94.402% (-0.05%) from 94.455% when pulling f62e3ed4afc1bffa4ce56ecbe8870ecff3800b6b on jdc-location-contains into 9be801bebd94f9562920ff5dffef134f55896942 on main.

coveralls commented 3 weeks ago

Coverage Status

coverage: 94.457% (+0.002%) from 94.455% when pulling f6b7afdce5b9c3bfda2ef437a4834acfa45c6892 on jdc-location-contains into 9be801bebd94f9562920ff5dffef134f55896942 on main.

coveralls commented 3 weeks ago

Coverage Status

coverage: 94.401% (-0.05%) from 94.455% when pulling f6b7afdce5b9c3bfda2ef437a4834acfa45c6892 on jdc-location-contains into 9be801bebd94f9562920ff5dffef134f55896942 on main.

JoeCohen commented 3 weeks ago

Re rounding:

  1. I was wrong about negative numbers. It's a total of 6 digits. As @mo-nathan points out, this leads to less precision in (-100..100).
  2. I agree with @mo-nathan that the ideal solution (for Location) is decimal north, south, east, west. (Maybe also altitude.) Otherwise there are surprises. Simple example:
    (ruby) Location.where(Location[:east].eq(albion.east)).include?(albion)
    false

    @nimmo: Above implies that until/unless we switch to decimal, scope_contains_point should have have something to compensate for db rounding, and should have some model unit tests. I'll have a look at your PR later today.

JoeCohen commented 3 weeks ago

PS: @mo-nathan: Rails is okay with :decimal column type. https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_column Question: Would we need to recast the existing db entries for affected columns?

mo-nathan commented 3 weeks ago

Probably the best option would be to have a migration that creates a set of new decimal columns, copies the date to the new columns, drops the old columns, then renames the new columns to the old names. Obviously we’d have to test it, but at least in theory that should do it.

On Sat, Jul 6, 2024 at 5:44 PM Joseph D. Cohen @.***> wrote:

PS: @mo-nathan https://github.com/mo-nathan: Rails is okay with :decimal column type. https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_column Question: Would we need to recast the existing db entries for affected columns?

— Reply to this email directly, view it on GitHub https://github.com/MushroomObserver/mushroom-observer/pull/2212#issuecomment-2211978002, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYQDUQL3MCYTDTIOWTNLOTZLBQJ7AVCNFSM6AAAAABKML7OXOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMJRHE3TQMBQGI . You are receiving this because you were mentioned.Message ID: @.***>

nimmolo commented 3 weeks ago

Hold on... let's not forget about Geometry columns!

I feel like that's the migration we want here. Almost totally forgot about that!

I researched all this awhile back. Having these bounding boxes as polygons, and observations as points, allows us to let the db find overlaps muuuuch more efficiently.

coveralls commented 3 weeks ago

Coverage Status

coverage: 94.401% (-0.05%) from 94.455% when pulling 85ed6a71a1d008c93f0ffaaa924effc3255c17bd on jdc-location-contains into 9be801bebd94f9562920ff5dffef134f55896942 on main.

nimmolo commented 1 week ago

Potentially interesting: Rails - How to migrate code from float to decimal