KatrinaHoffert / EatSafe

An app for finding safe places to eat
Other
2 stars 2 forks source link

Get coordinates for locations and store them in the database #61

Closed KatrinaHoffert closed 9 years ago

KatrinaHoffert commented 9 years ago

Some kind of helper utility function that can be run. Use the Maps API to get coordinates for the addresses. Note that some locations don't have addresses or the address lookup might fail. Nothing we can do about that, but it means that coordinates might not exist for some locations.

Create a "Coordinates" table that maps addresses and cities to coordinates (latitude and longitude, stored as decimal numbers). This table is not created from scratch whenever we update the database, but rather exported to text (use pg_dump) and then imported whenever we update the database.

The utility program iterates over all locations and for the address of each location, we join into the Coordinates table. If an entry exists, good (skip). If an entry does not exist, lookup the coordinates using Google Maps Geocoding API and insert it iff Google Maps returned a coordinate. Note that some addresses might not return a coordinate, and the program must skip these.

Also note that Google Maps has low usage limits, so eventually we'll receive 403 errors (see here). The program must stop when this occurs (it can be run again later to get the remaining coordinates).

Once the coordinate-getting subsystem is implemented (in same manner as the CSV parser is implemented and run), we need to integrate the coordinates into the model. Add longitude and latitude parameters to the Location case class and have the SQL that gets locations join with the location table. Note that these parameters are of type Option[Double], since some locations don't have coordinates.

KatrinaHoffert commented 9 years ago

Whoever this is assigned to should be able to work on this early in the week, because #62 depends on this.

LujieDuan commented 9 years ago
  1. The schema is changed as described above. The table is named "coordinate". To backup coordinate and re-create database, run the following files in "database" folder. -- Step 1: run "BackupCoordinates.sql" to backup the coordinate table to a file. -- Step 2: run "CreateTables.sql" to drop and re-create the schema -- Step 3: run "RestoreCoordinates.sql" to restore the coordinate table -- Step 4: run "statements.sql" that contains locations, etc -- Step 5: run the PopulateCoordinate program to update coordinates for newly added locations

    -- For "test" database: -- Only need to run "CreateTables.sql" to re-create the schema

  2. Right now there are about 3600 + coordinates in database.
  3. To run the program, type "./activator test-only test.RunCoordinate". -- The reason of using play test to run: This subsystem need db connection and other application components to run, while the test has a "withApplication" method that will set up a fake application for the subsystem.
magnusandy commented 9 years ago

Did you try moving the program out of the main test folder, otherwise it will be run every time that the TravisCI is run (actually any time all tests are run). Which is most likely a bad thing

KatrinaHoffert commented 9 years ago

Yeah, that's definitely a bad thing, because it'll nuke the tests with regard to run time. Unfortunately, if we need a running application, that complicates things (I wonder why WebSockets needs that?).

If there's no other way to get it to work normally, we might have to use a different library to make the web request. Like this one. Because this simply can't be a test (it ruins the ability to run the rest of our tests).

Can still use Play's JSON handling, by the way. It's just the web request that would have to be changed. If you use the library I linked, make sure to use the most recent version. ie, add

libraryDependencies +=  "org.scalaj" %% "scalaj-http" % "1.1.4"

to the build.sbt file.

magnusandy commented 9 years ago

we might be able to move it out of the normal test folder and still run it from the command line: test-only someOtherFolder/CoordinateSpec I dont know if this will work and not run the test in the regular suite though.

On Sat, Mar 7, 2015 at 11:33 PM, Mike Hoffert notifications@github.com wrote:

Yeah, that's definitely a bad thing, because it'll nuke the tests with regard to run time. Unfortunately, if we need a running application, that complicates things (I wonder why WebSockets needs that?).

If there's no other way to get it to work normally, we might have to use a different library to make the web request. Like this one http://stackoverflow.com/a/11720469/1968462. Because this simply can't be a test (it ruins the ability to run the rest of our tests).

— Reply to this email directly or view it on GitHub https://github.com/MikeHoffert/EatSafe/issues/61#issuecomment-77734231.

LujieDuan commented 9 years ago

I just added the rest coordinates to the database, so now there are 4600 out of 4900 locations got coordinates, while the rest 300 cannot get result from geocoding, and I put (0,0) to database for those 300 location.

Right now I use "address + city" as parameter to search, later I will add "SK" to parameter. Maybe this will get rid of some non-SK coordinates.

Also, the coordinates are in "database/coordinateBackup.txt" for database recreating needs.

On the "running" thing, I tried 1, delete all SQL/DB code, only run the web request code, 2, delete all web request code, only run the SQL/DB code, 3, delete all web and DB code, only run "println("Hi")" in the main

then use "./activator "run-main coordinatesGetter.PopulateCoordinates"" 1 and 2 gave me the same exception:

(run-main-0) java.lang.RuntimeException: There is no started application

while 3 can print the word. So i assume both DB and web request need the "fake application". So, even if I use that library, I still need a way to get the DB connection, right? Am I thinking on the right direction?

KatrinaHoffert commented 9 years ago

Hmm, needing the active application for the DB connection as well complicates things. Maybe Andrew's comment about moving the test out of the test folder but executing the same way will be best. Haven't tried it myself, but sounds promising..

I think that failure to get coords should insert nulls, as they are truly obvious lay invalid and can possibly allow getting coords in the future if the approach changes.

Addresses should probably be film qualified to prevent mismatched. Eg, 123 fake St, Saskatoon, Saskatchewan, Canada. On Mar 8, 2015 4:17 PM, "lujie duan" notifications@github.com wrote:

I just added the rest coordinates to the database, so now there are 4600 out of 4900 locations got coordinates, while the rest 300 cannot get result from geocoding, and I put (0,0) to database for those 300 location.

Right now I use "address + city" as parameter to search, later I will add "SK" to parameter. Maybe this will get rid of some non-SK coordinates.

Also, the coordinates are in "database/coordinateBackup.txt" for database recreating needs.

On the "running" thing, I tried 1, delete all SQL/DB code, only run the web request code, 2, delete all web request code, only run the SQL/DB code, 3, delete all web and DB code, only run "println("Hi")" in the main

then use "./activator "run-main coordinatesGetter.PopulateCoordinates"" 1 and 2 gave me the same exception:

(run-main-0) java.lang.RuntimeException: There is no started application

while 3 can print the word. So i assume both DB and web request need the "fake application". So, even if I use that library, I still need a way to get the DB connection, right? Am I thinking on the right direction?

— Reply to this email directly or view it on GitHub https://github.com/MikeHoffert/EatSafe/issues/61#issuecomment-77778067.

KatrinaHoffert commented 9 years ago

Not an issue anymore because we run with test-only *MainTest, etc.