Safecast / safecastapi

The app that powers api.safecast.org

Rework BgeigieImport#process to avoid silently losing data points during import #410

Open Lugulbanda opened 6 years ago

Lugulbanda commented 6 years ago

Description

There are some drives in the system where the number of data lines does not equal the number of imported rows. For example:

safecast=# select lines_count from measurement_imports where id = 32985;
-[ RECORD 1 ]-----
lines_count | 2758

safecast=# select count(*) from bgeigie_logs where bgeigie_import_id = 32985;
-[ RECORD 1 ]
count | 1503

The process by which logs get from the file involves a few suspicious steps (randomly named tmp file, a tmp table that's shared). The table copy operation will also silently suppress any lines which have matching md5sums already in the table.

Would be good to re-work this process a bit to:

Reworking this to avoid direct psql calls should help with https://github.com/Safecast/safecastapi/issues/416 as well. There are a number of other older tickets I suspect stem from silent failures during import (https://github.com/Safecast/safecastapi/issues/8, https://github.com/Safecast/safecastapi/issues/43, https://github.com/Safecast/safecastapi/issues/52, maybe others.)

Background from original submission

Yesterday I tried to upload some data I measured:

30031216.log

NEW LOG

format=1.3.4nano

deadtime=on

$BNRDD,3003,2017-12-16T22:43:38Z,39,4,51,A,5039.7536,N,00709.2440,E,168.60,A,8,110*46

The time in this line is not my local time, so I think it's a global time. I can only guess that this line was logged in a short tunnel where I had no GPS location. Strangely, there is only one such line, although the tunnel was approx. 900 meters long, which should produce a lot more of these GPS lines...

I saw that the line points to the left of Africa, so I cut it out of the upload, because I didn't intend to time-warp my car into the sea.
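For reference, a line like the one quoted above can be checked mechanically before deciding whether to cut it. The following is a minimal sketch, not the importer's actual code. It assumes the trailing `*46` is an NMEA-style checksum (XOR of all characters between `$` and `*`) and that latitude and longitude use the NMEA `ddmm.mmmm` / `dddmm.mmmm` layout; the function names are made up for illustration.

```python
from functools import reduce

LINE = "$BNRDD,3003,2017-12-16T22:43:38Z,39,4,51,A,5039.7536,N,00709.2440,E,168.60,A,8,110*46"


def nmea_checksum(payload: str) -> int:
    """XOR of every character between '$' and '*' (NMEA-style; assumed here)."""
    return reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)


def checksum_ok(line: str) -> bool:
    """Compare the computed checksum with the two hex digits after '*'."""
    payload, _, stated = line.strip().lstrip("$").partition("*")
    try:
        return nmea_checksum(payload) == int(stated, 16)
    except ValueError:  # missing or non-hex checksum
        return False


def to_decimal_degrees(value: str, hemisphere: str) -> float:
    """Convert (d)ddmm.mmmm plus N/S/E/W into signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[: dot - 2])   # everything before the two minute digits
    minutes = float(value[dot - 2 :])
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * (degrees + minutes / 60)


# Field positions (7-10 for lat/lon) are an assumption based on the line above.
fields = LINE.split("*")[0].split(",")
lat = to_decimal_degrees(fields[7], fields[8])
lon = to_decimal_degrees(fields[9], fields[10])
print(checksum_ok(LINE), round(lat, 5), round(lon, 5))
```

For the quoted line the checksum matches and the position comes out at roughly 50.66 N, 7.15 E. A line whose checksum fails, or whose position is near 0, 0, would be a candidate for removal.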

matschaffer commented 6 years ago

/cc @robouden or @Frangible in case they'd like to comment, but my understanding is that the GPS can do this from time to time.

Lugulbanda commented 6 years ago

Ah OK, so I'm right to cut it out, and it doesn't do any harm to the upload?

Next question: in my upload #30031213.log I do NOT see in the tile map the measuring points that I see in the API map. A huge part of the drive is cut off and not present... My measuring data stops in the middle, while it is fully shown on the API map. And I think I'm missing another drive too, which is uploaded but not present in the tile map.

seanbonner commented 6 years ago

The Tilemap and the API map are not pulling from the same data, the tilemap shows tiles that are generated once a day. Wait a few days for the tilemap to show everything you’d uploaded…

-s

Sean Bonner Co-Founder, Director of Global Operations Safecast.org | pgp key

Lugulbanda commented 6 years ago

I know, Sean, but this path was already uploaded days ago, and I see only half of it in the tilemap. As far as I have noticed, a drive normally appears approx. 1-2 days after getting approved, and then all of it, not part of it. I have noticed that with new uploads some parts are missing at some zoom levels and come a few days later, but then I have at least one zoom level which shows the full path... Maybe you're right and I should give the Google memory bank more than 5 days to refresh :) I hope this has nothing to do with the huge mountains and the snowfall that day...

API map: https://api.safecast.org/en-US/bgeigie_imports/32985 [image: API map]

[image: tile map]

Frangible commented 6 years ago

Looking into this.

Frangible commented 6 years ago

I looked at your log files from the 16th and I don't see any problems. Thanks for checking though.

On Mon, Dec 18, 2017 at 6:43 AM, Lugulbanda notifications@github.com wrote:

Hello Nick, could you take a look at the upload from the 16th too? There I have a different problem.

I made 2 measuring runs. The one with the most points shows only 1 path, and the one with the fewest measuring points does not show just the second run: it shows both of them in the graphic, the first path and the second from the same day...

Maybe that's all related, because I have noticed these effects since I cut the old measurements out from the SD card to make space on it.

I hope I didn't introduce an error with that, or do something wrong with the API and upload option...

Best wishes,

Frank

Lugulbanda commented 6 years ago

Hi Nick, yes, but if you look at the API map you see lines of measuring points, and once you upload and approve them to the tile map they are gone.

I am now missing 3 complete trips. For one in the past I thought I had forgotten to switch the Nano on... long ago. But for the other 2, now that I have kept an eye on each trip and where to travel next, and waited for upload and approval, I notice that some data seems to vanish...

I drove 2 different roads: one from my home to Bonn, and another, different way back from Bonn to my home. Both lines were uploaded separately and both show OK in the API map, but only one shows on the tile map after approval; the other one has vanished...

What I think is not correct is the discrepancy in measuring points; it's like they were switched.

I have a bit of a feeling that the upload system only accepts one measuring run a day, and if you do more it eats up some part, or something is filtering it out to prevent duplicate data... Which one is accepted looks to me like the luck of the draw...

But that doesn't explain why I have one upload with lots of measuring points, whose line is also shown in the API map for that day, and another one, uploaded one day later, showing the way back PLUS the already uploaded way, with fewer measuring points than the map uploaded before.

If you don't see an error in the log, then the problem must be somewhere else, after upload and approval...

API map, first upload: https://api.safecast.org/en-US/bgeigie_imports/33057 (1730 measurements) [image: API map]

API map, second upload: https://api.safecast.org/en-US/bgeigie_imports/33066 (617 measurements) [image: API map]

[image: tile map]

Should look like: [image: img_2270]

Lugulbanda commented 6 years ago

It is the same with the map from the start, from the 13th:

API map: https://api.safecast.org/en-US/bgeigie_imports/32985 [image: API map]

[image: tile map]

Should look like: [image: img_2271]

It's absolutely certain that this is not a memory bug from the API or the bGeigie or my browser. I have now waited from 13 December to 20 December, and the tile map still misses the route from the 13th. It is only shown up to the town of Selters and stops there, makes a blip at the town of Maxsain, and then is completely gone. I switched the basemaps and zoom factors. The map from the 13th looks like only a part of the approved measurements was uploaded to the tile map, and then the upload broke somehow... Especially the little blip at Maxsain makes me think the upload was distorted or damaged and the data got lost...

Normally I mark all cities I see on the map along the way I traveled. Since we have many cities here, and because of my job in Germany, my city list is always very large. Can it be that this has a negative effect on the upload? If there is a maximum number of cities you can name in the API metadata, let me know.

I somehow have the feeling that this is the problem...

And if I had to put my finger on it, I would say it happens while writing the data to the Google network: something overflows and the whole transfer breaks, and the rest of the upload goes to data nirvana instead of into the Google data cave...

In that case you need a file check and a message that the file is 100% transferred, not only partially... or a warning if something went wrong while uploading it to Google.

Lugulbanda commented 6 years ago

Is there a maximum number of cities you can insert into the upload? I have the feeling this may be the problem...

matschaffer commented 6 years ago

Thanks @Lugulbanda you too!

As for cities, it's just a text field (see https://github.com/Safecast/safecastapi/blob/master/db/structure.sql#L420) which has unlimited length so I don't think that would cause any sort of gaps.

matschaffer commented 6 years ago

I took a look at https://api.safecast.org/en-US/bgeigie_imports/32985

The measurements (or at least the first ten) seem to have made it to the measurements table intact.

safecast=# select id, value, unit, measurement_import_id, captured_at, md5sum from measurements where md5sum in (select md5sum from bgeigie_logs where bgeigie_import_id = 32985 limit 10);
    id     | value | unit | measurement_import_id |     captured_at     |              md5sum
-----------+-------+------+-----------------------+---------------------+----------------------------------
 101246632 |    31 | cpm  |                 32985 | 2017-12-13 19:05:43 | f15f87db9337c0ffa75112e9e43aecc7
 101246633 |    30 | cpm  |                 32985 | 2017-12-13 19:05:38 | 4732c113dd6210bdf7b5eed40a320d5a
 101246628 |    28 | cpm  |                 32985 | 2017-12-13 19:06:03 | dc4d21bbef70a508ca6c027f5d4add70
 101246625 |    30 | cpm  |                 32985 | 2017-12-13 19:06:18 | 74cb4a1d07aee4ca9899a4e768234e07
 101246631 |    30 | cpm  |                 32985 | 2017-12-13 19:05:48 | 539d22b2109c3b62da0c40a02c888697
 101246627 |    28 | cpm  |                 32985 | 2017-12-13 19:06:08 | 3ef3bb4b6930e6144b8e616c69363980
 101246634 |    33 | cpm  |                 32985 | 2017-12-13 19:05:33 | 72e1cf3d29defe6514031bd834716b8c
 101246626 |    31 | cpm  |                 32985 | 2017-12-13 19:06:13 | cae3ff8d9232f0547c28add33470a05a
 101246629 |    30 | cpm  |                 32985 | 2017-12-13 19:05:58 | 624ad28bafdd3c85c59e5d6801964dcc
 101246630 |    30 | cpm  |                 32985 | 2017-12-13 19:05:53 | 5f55a03450ff36c461652ca42c79cf87
(10 rows)

But that loop around Halbs, Germany doesn't appear in the tilemap.

Hopefully @Frangible can chime in since he knows the tile generation better than I do.

matschaffer commented 6 years ago

They also seem to show up in the metrics geo query. Had to crank up the distance. Still not sure just what that param's scale is like https://api.safecast.org/en-US/measurements?utf8=%E2%9C%93&latitude=50.590157&longitude=7.964230&distance=2000&captured_after=2011-03-10T00%3A00%3A00Z&captured_before=2017-12-29T05%3A28%3A22.812Z&since=&until=&commit=Filter

matschaffer commented 6 years ago

I think we found the discrepancy on the missing points

safecast=# select lines_count from measurement_imports where id = 32985;
-[ RECORD 1 ]-----
lines_count | 2758

safecast=# select count(*) from bgeigie_logs where bgeigie_import_id = 32985;
-[ RECORD 1 ]
count | 1503

Seems that even though the API saw 2758 lines in the file, only 1503 got imported into the logs table (which then feeds the measurements table). I can import the same file locally w/o issue. And I'm not seeing anything in the workers logs. The latest logs are basically empty so I'm curious if worker logging might be broken at the moment.

My prime suspect is https://github.com/Safecast/safecastapi/blob/a0615366728481a272d8169f34555a68ab8e7293/app/models/bgeigie_import.rb#L241 where it'll try to copy lines from bgeigie_logs_tmp to bgeigie_logs but only if the md5sum isn't already in bgeigie_logs (a 65M row table).

The whole process flow is a little funky as well, relying on both a tmp file and a temp table (which run the risk of being overwritten mid-process).

It'd take a bit of work, but we could potentially get the points back in by clearing out any existing copy of that data from the log then re-importing.

Would probably be good to get @nokton's input though. We still have the points in the original file so not sure there's a lot of value in mucking with the DB data set to get the points in there.

Seems better to me to spend the time re-working the import processing stuff to ensure we get a proper failure next time there's a discrepancy between counted lines and imported rows.

Lugulbanda commented 6 years ago

An option would be that the new block always overrides the old dataset if it was not correctly transferred in the first run, plus a file check after transfer which gives back a green light that everything is OK!

And thanks for the investigation ^^

Best wishes,

Frank

matschaffer commented 6 years ago

Indeed. That “green light” is something I’m thinking. At very least check that we’ve imported all the importable lines. Right now there’s a fair bit of room for silent failure.
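A minimal sketch of such a "green light" check, written in Python for brevity (the app itself is Ruby, so this is illustration only). The names `ImportCountMismatch` and `check_import_complete` are hypothetical, and the counts are the ones for import 32985 quoted earlier in this thread; a real version would presumably read them from `measurement_imports.lines_count` and from `bgeigie_logs`.

```python
class ImportCountMismatch(Exception):
    """Raised when the number of imported rows differs from the counted lines."""


def check_import_complete(import_id: int, lines_count: int, imported_rows: int) -> None:
    """Fail loudly instead of silently accepting a partial import."""
    if imported_rows != lines_count:
        missing = lines_count - imported_rows
        raise ImportCountMismatch(
            f"import {import_id}: counted {lines_count} lines but imported "
            f"{imported_rows} rows ({missing} unaccounted for)"
        )


try:
    check_import_complete(32985, lines_count=2758, imported_rows=1503)
except ImportCountMismatch as err:
    print(err)
```

If `lines_count` also includes header or otherwise non-importable lines, the comparison would need to use the number of importable lines instead.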

Lugulbanda commented 6 years ago

Hello Mat, 

A simple checksum test should be easy to implement, because that has been a common feature for as long as I have known computers, and you don't have to verify the whole data; only the checksum must be right, which lowers the traffic.

 

best wishes

matschaffer commented 6 years ago

Yep, that's exactly what's happening on the very far right side of https://github.com/Safecast/safecastapi/blob/a0615366728481a272d8169f34555a68ab8e7293/app/models/bgeigie_import.rb#L241 where it says left join bgeigie_logs bl on bl.md5sum = bt.md5sum where bl.md5sum is null.

The downside is it's not actually surfacing any info about what hits/misses. It just silently skips any matched checksums.
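To illustrate the behaviour, here is a small self-contained sketch using Python's `sqlite3` as a stand-in for PostgreSQL. The table names and the `md5sum` column come from this thread; the two-column schema, the sample hashes, and the import ids 1 and 2 are assumptions made up for the example. It runs the same kind of anti-join and then reports how many staged rows were skipped, which is the information the current flow does not surface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bgeigie_logs     (bgeigie_import_id INTEGER, md5sum TEXT);
    CREATE TABLE bgeigie_logs_tmp (bgeigie_import_id INTEGER, md5sum TEXT);
    -- 'aaa' and 'bbb' already exist from an earlier (hypothetical) import 1.
    INSERT INTO bgeigie_logs VALUES (1, 'aaa'), (1, 'bbb');
    -- Import 2 stages four lines, two of which share those md5sums.
    INSERT INTO bgeigie_logs_tmp VALUES (2, 'aaa'), (2, 'bbb'), (2, 'ccc'), (2, 'ddd');
    """
)

staged = conn.execute("SELECT count(*) FROM bgeigie_logs_tmp").fetchone()[0]

# Anti-join: copy only rows whose md5sum is not yet present in bgeigie_logs.
conn.execute(
    """
    INSERT INTO bgeigie_logs (bgeigie_import_id, md5sum)
    SELECT bt.bgeigie_import_id, bt.md5sum
    FROM bgeigie_logs_tmp bt
    LEFT JOIN bgeigie_logs bl ON bl.md5sum = bt.md5sum
    WHERE bl.md5sum IS NULL
    """
)

copied = conn.execute(
    "SELECT count(*) FROM bgeigie_logs WHERE bgeigie_import_id = 2"
).fetchone()[0]
skipped = staged - copied

# Reporting these numbers is what turns a silent skip into a visible one.
print(f"staged={staged} copied={copied} skipped={skipped}")
```

In this toy data set, half of the staged rows never reach `bgeigie_logs`, and nothing in the insert statement itself reports that.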

I suspect this issue is a result of that, but until we build better handling we can't say for sure.

Lugulbanda commented 6 years ago

Yeah, but why isn't there an option to override the half-transported data with the complete one? No verify or data repair check? Missing those 2 long routes, which I surely won't drive again in the next years, is somehow sad...

matschaffer commented 6 years ago

Just not how it was written, really. On the plus side we do actually have the data stored just not in a way it can reach the map.

--

-Mat

matschaffer.com

Lugulbanda commented 6 years ago

"we do actually have the data stored just not in a way it can reach the map."

That sounds like we somehow need a problem solver, or some problem solving. Data which is stored away is of no use to anyone ;-)

matschaffer commented 6 years ago

More problem solvers are always appreciated :)

Lugulbanda commented 6 years ago

Well, I'm looking forward to seeing the missing data imported to the tile map some day, I hope...

If my understanding is right, the tilemap needs a complete reset and a fresh load of the existing data?

I think I'm not the only one with this kind of problem, so our good map could carry much more data than we currently see on the map.

Lugulbanda commented 6 years ago

If the missing data is in the data cave, how do we get it out to see it on the tile map? A reset of the map with a reload of everything? If this problem happens with my uploads, I guess I'm not the only one who may be missing some data which was carried to the server but not to the tile map.

I can't imagine there isn't a tool to resend the broken data stream from the cave to the open tile map again. And, more importantly, something to check how many data uploads to the map are missing because of this silent data loss error.

I have a bit of a feeling that the tile map would show a good bit more of the missing data than it does today, and I'm not talking about my missing data alone...

matschaffer commented 6 years ago

I think we have options, but any of them will require time and attention to develop and execute.

I'm sure this is not the only import that has silently failed to import points. There are even a number of other similar github issues I called out in the description of this issue.

I can understand not having your data visible is frustrating, but as a volunteer org it's difficult to say when a problem can be resolved.

I'm hoping we can find someone to pick this up, fix up the processing pipeline to avoid similar future failures, then we can work on how best to address drives that aren't completely copied into the measurements table.

Lugulbanda commented 6 years ago

Nah, I'm long past the frustration point now that I see it has been noticed :) Sorry if my English is not always clear; it's not my native language. Just so you understand: I am just a beginner with Safecast. I had no workshop which could have answered my hundreds of questions. For new people like me who have no idea about it, it's hard to understand how the system works and how to bring the data online. I taught myself most of the stuff by playing Sherlock Holmes, digging up the info in the Safecast archives. But if you dig around, you end up understanding it better every day. Sometimes I feel I don't know much; sometimes I feel I struggle around and pick up info until I find a source which lets me know a bit more... It's not easy, especially for newbies, especially if they had no guide: so much stuff, so many projects, but no common thread to follow, because everything is in constant motion, and some threads end in a knot, or fell asleep so deeply that even a cry didn't wake the sleepers...

Don't call it frustration, just not knowing the process and how it works. I am very happy to add my time to a project for humankind. I was unhappy for a long time with how citizens were treated regarding knowledge about radiation, stumbling around blindfolded, fed with information you only heard behind a hand, or whispered, but not much talked about in the open. Safecast is like a homecoming for me, after half a lifetime of searching, with a bag full of stuff you dug out yourself over the years, where you can add your bit to a good thing.

No, not frustrated, more like a hypernerd who finally found his tool to dive in, and to share the work with everyone who can access it. I'm so happy that you guys brought this thing to life :)

The only thing which frustrated me for a long time was this: ever since I saw the suitcase drive video shortly after the Fukushima event, I was on fire, wanted to know how it works, and wanted one myself, and I waited until last year for the device. I can't even imagine how many customers I visited in this time, and how many miles and streets I could have covered by today if I had got it earlier. I was so frustrated that I missed Geneva, and so on fire when the next one was supposed to be in Berlin and never came. And while time rolled by, I saw the map in my country growing without me, and I had absolutely no access to the Safecasters in my country to share knowledge. But now I have my tool, and I'm so sorry if I dig out problems which make work because they need to be fixed ;-)

Safecast is like a fishing net: you may get caught by it, but it's hard to find the way to its origin, and still today I struggle because I feel I don't know where the chatroom is...

I'm not frustrated anymore. I'm happy with every tiny dot I see on the map, knowing it's my tiny little footstep on the map for all. :)

matschaffer commented 6 years ago

Sounds good @Lugulbanda :)

As for chatroom, slack is the place if you want to get involved with API development work. @seanbonner or @robouden should be able to send you an invite.

If you're interested in exploring the problem, that's awesome. Though I will say this particular one will be tricky without some knowledge of ruby & postgresql first.

https://github.com/Safecast/safecastapi/#development has links to the set up pages if you want to try running the app and uploading your files there to see how they progress through the tables.