I can specify a filter for the resources, but it is not clear how complicated the regex can be. Let's try op - that seems to match all the compute resources:
aws-op-*, open-, em-open, aws-op-vpc
Loaded public data using
$ ./e-mission-py.bash bin/debug/load_multi_timeline_for_range.py -d 1 -v 10000 -p interscity-user /code/transfer/odc-usp-2017/user 2>&1 | tee /code/transfer/load_odc-usp.log
$ ./e-mission-py.bash bin/debug/load_multi_timeline_for_range.py -d 1 -v 10000 -p test-phone-user /code/transfer/public_data/dump 2>&1 | tee /code/transfer/load_test_phones.log
While running the analysis pipeline, encountered
Traceback (most recent call last):
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 118, in save_cleaned_segments_for_timeline
filtered_trip = get_filtered_trip(ts, trip)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 184, in get_filtered_trip
(filtered_section_entry, point_list) = get_filtered_section(filtered_trip_entry, section)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 294, in get_filtered_section
overridden_mode = get_overriden_mode(section.data, filtered_section_data, with_speeds_df)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 496, in get_overriden_mode
(raw_section_data.trip_id, cleaned_section_data.trip_id))
Next error
Cleaning and resampling failed for user 04bc05b3-4c2d-4138-bfbb-7eb45dce709a
Traceback (most recent call last):
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 81, in clean_and_resample
last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 107, in save_cleaned_segments_for_ts
return save_cleaned_segments_for_timeline(user_id, tl)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 133, in save_cleaned_segments_for_timeline
(last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 909, in create_and_link_timeline
link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 963, in link_trip_start
ts, cleaned_trip_data, cleaned_start_place_data)
File "/Users/shankari/e-mission/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1101, in _fix_squished_place_mismatch
assert False
AssertionError
On further investigation, the mismatch is an extra 0 at the end. That means that we have an extra reconstructed location when we re-query. Investigating further...
[0.0, 0.7874599418825768, 1.6700112310347082, 1.6700083407891528, 2.8850795868410497, 3.612041604838898, 3.490672881889288, 0.13185994542031884, 2.2685465126699547, 3.7544348542123562, 6.241591837653119, 6.513255262776884, 4.303879054109581, 1.0062625290710587, 11.85287083926253, 3.796945102876728, 4.4811855331967525, 4.9256719058025284, 5.33856586893377, 18.01958238254823, 15.337483277942944, 9.817315391362875, 0.11879683970093337, 1.6133514703039964, 5.981594572128506, 5.297948876043496, 2.0231189531744405, 0.06495441900389091, 0.010212308476347307, 0.008375715841469174, 0.1576887649267426, 0.8043412604987412, 0.0]
[0.0, 0.7874599418825768, 1.6700112310347082, 1.6700083407891528, 2.8850795868410497, 3.612041604838898, 3.490672881889288, 0.13185994542031884, 2.2685465126699547, 3.7544348542123562, 6.241591837653119, 6.513255262776884, 4.303879054109581, 1.0062625290710587, 11.85287083926253, 3.796945102876728, 4.4811855331967525, 4.9256719058025284, 5.33856586893377, 18.01958238254823, 15.337483277942944, 9.817315391362875, 0.11879683970093337, 1.6133514703039964, 5.981594572128506, 5.297948876043496, 2.0231189531744405, 0.06495441900389091, 0.010212308476347307, 0.008375715841469174, 0.1576887649267426, 0.8043412604987412]
Ok, so this is because the last two points are in fact the same point, although the object IDs are different.
2017-12-16 10:47:37,954:DEBUG:140737149985728:fix_squished_place: after recomputing for validation, with_speeds_df.tail = _id ts fmt_time \
28 5a356a46f6858fcefd76d29e 1.452972e+09 2016-01-16T11:16:28.072312-08:00
29 5a356a46f6858fcefd76d29f 1.452972e+09 2016-01-16T11:16:58.072312-08:00
30 5a356a46f6858fcefd76d2a0 1.452972e+09 2016-01-16T11:17:28.072312-08:00
31 5a356a46f6858fcefd76d2a1 1.452972e+09 2016-01-16T11:17:38-08:00
32 5a356a46f6858fcefd76d2a3 1.452972e+09 2016-01-16T11:17:38-08:00
latitude longitude distance speed
28 37.391324 -122.060069 0.306369 0.010212
29 37.391326 -122.060070 0.251271 0.008376
30 37.391293 -122.060036 4.730663 0.157689
31 37.391226 -122.060066 7.985249 0.804341
32 37.391226 -122.060066 0.000000 0.000000
Let's see when the second object id was created.
2017-12-16 10:47:34,809:DEBUG:140737149985728:About to store 24 points for section 5a356a44f6858fcefd76d270
2017-12-16 10:47:34,814:DEBUG:140737149985728:Returning multi_result.inserted_ids = [ObjectId('5a356a46f6858fcefd76d2a3'), ObjectId('5a356a46f6858fcefd76d2a4'), ObjectId('5a356a46f6858fcefd76d2a5'), ObjectId('5a356a46f6858fcefd76d2a6'), ObjectId('5a356a46f6858fcefd76d2a7'), ObjectId('5a356a46f6858fcefd76d2a8'), ObjectId('5a356a46f6858fcefd76d2a9'), ObjectId('5a356a46f6858fcefd76d2aa'), ObjectId('5a356a46f6858fcefd76d2ab'), ObjectId('5a356a46f6858fcefd76d2ac')]... of length 24
It was created for section 5a356a44f6858fcefd76d270 and appears to be the first point for that section. The currently retrieved head is
2017-12-16 10:47:37,948:DEBUG:140737149985728:fix_squished_place: after recomputing for validation, with_speeds_df.head = _id ts fmt_time \
0 5a356a49f6858fcefd76d2f0 1.452971e+09 2016-01-16T11:02:28.072312-08:00
1 5a356a46f6858fcefd76d283 1.452971e+09 2016-01-16T11:02:58.072312-08:00
2 5a356a46f6858fcefd76d284 1.452971e+09 2016-01-16T11:03:28.072312-08:00
3 5a356a46f6858fcefd76d285 1.452971e+09 2016-01-16T11:03:58.072312-08:00
4 5a356a46f6858fcefd76d286 1.452971e+09 2016-01-16T11:04:28.072312-08:00
latitude longitude distance speed
0 37.391099 -122.086379 0.000000 0.000000
1 37.390900 -122.086471 23.623798 0.787460
2 37.391260 -122.086131 50.100337 1.670011
3 37.391621 -122.085790 50.100250 1.670008
4 37.390948 -122.086282 86.552388 2.885080
which was created for section 5a356a44f6858fcefd76d26f, not 5a356a44f6858fcefd76d270
2017-12-16 10:47:34,803:DEBUG:140737149985728:About to store 31 points for section 5a356a44f6858fcefd76d26f
2017-12-16 10:47:34,809:DEBUG:140737149985728:Returning multi_result.inserted_ids = [ObjectId('5a356a46f6858fcefd76d283'), ObjectId('5a356a46f6858fcefd76d284'), ObjectId('5a356a46f6858fcefd76d285'), ObjectId('5a356a46f6858fcefd76d286'), ObjectId('5a356a46f6858fcefd76d287'), ObjectId('5a356a46f6858fcefd76d288'), ObjectId('5a356a46f6858fcefd76d289'), ObjectId('5a356a46f6858fcefd76d28a'), ObjectId('5a356a46f6858fcefd76d28b'), ObjectId('5a356a46f6858fcefd76d28c')]... of length 31
Let us look at the start time and end time for these two sections.
5a356a44f6858fcefd76d26f
is
Inserting entry Entry({'_id': ObjectId('5a356a44f6858fcefd76d26f'),
'user_id': UUID('04bc05b3-4c2d-4138-bfbb-7eb45dce709a'),
'metadata': Metadata({'key': 'segmentation/raw_section',
'data': Section({'start_ts': 1452971053.281,
'start_fmt_time': '2016-01-16T11:04:13.281000-08:00',
'end_ts': 1452971858,
'end_fmt_time': '2016-01-16T11:17:38-08:00',
'start_loc': {"coordinates": [-122.0856179, 37.3918034], "type": "Point"},
'end_loc': {"coordinates": [-122.06006589, 37.39122569], "type": "Point"},
'duration': 804.7190001010895,
'source': 'SmoothedHighConfidenceMotion',
'sensed_mode': 0})}) into timeseries
5a356a44f6858fcefd76d270
is
2017-12-16 10:47:32,190:DEBUG:140737149985728:Inserting entry Entry({'_id': ObjectId('5a356a44f6858fcefd76d270'),
'user_id': UUID('04bc05b3-4c2d-4138-bfbb-7eb45dce709a'),
'metadata': Metadata({'key': 'segmentation/raw_section',
'data': Section({'start_ts': 1452971858,
'start_fmt_time': '2016-01-16T11:17:38-08:00',
'end_ts': 1452972533,
'end_fmt_time': '2016-01-16T11:28:53-08:00',
'start_loc': {"coordinates": [-122.06006589, 37.39122569], "type": "Point"},
'end_loc': {"coordinates": [-122.058218, 37.39069702], "type": "Point"},
'duration': 675,
'source': 'SmoothedHighConfidenceMotion',
'sensed_mode': 2})}) into timeseries
So basically the last point of the first section = the first point of the second section, so the query brings up the additional point. That is not such a bad thing, and I don't think that it corresponds to an error in the diary generation. Let's verify from the UI.
(screenshots: Trip view | Section view)
So to fix it, we need to strip out the last row (and potentially the last few rows!) if their speed and distance are zero. Given that we are now coming out of beta, maybe we should also ignore these asserts instead of failing on them; others running the system will not find it easy to view the errors.
We can't strip the last few rows, though, because a section might have fewer than that many points, and then the first point would also be deleted. We will check strictly the last row.
We can still log the errors so that, on our own server(s), we can investigate them. And the tests will still fail on regression because the ground truth won't match. Or we could have an extra validation flag that is on by default, and turned off on production builds. Let's go with the validation flag for now.
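Here is a minimal sketch of what the validation-flag approach could look like (the flag and helper names are made up, not the actual clean_and_resample.py code):

```python
import logging

# Hypothetical module-level flag; production deployments would set this to False.
VALIDATE_ASSERTIONS = True

def soft_assert(condition, msg):
    # Always log the inconsistency so we can investigate on our own servers,
    # but only crash when validation is on, so tests still fail on regression
    # while production keeps running.
    if not condition:
        logging.error(msg)
        if VALIDATE_ASSERTIONS:
            raise AssertionError(msg)
```

With this, the bare assert False in _fix_squished_place_mismatch would become a soft_assert(False, ...) call.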
There is one final issue left. We get these warnings while running the multi-processed pipeline.
/Users/shankari/OSS/anaconda/envs/emission/lib/python3.6/site-packages/pymongo/topology.py:145: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#pymongo-fork-safe>
"MongoClient opened before fork. Create MongoClient "
/Users/shankari/OSS/anaconda/envs/emission/lib/python3.6/site-packages/pymongo/topology.py:145: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#pymongo-fork-safe>
"MongoClient opened before fork. Create MongoClient "
/Users/shankari/OSS/anaconda/envs/emission/lib/python3.6/site-packages/pymongo/topology.py:145: UserWarning: MongoClient opened before fork. Create MongoClient with connect=False, or create client after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#pymongo-fork-safe>
"MongoClient opened before fork. Create MongoClient "
The error message is pretty self-explanatory. Unfortunately, simply passing connect=False is not sufficient, because we do access the database in the scheduler before the fork.
sel_uuids = esdu.get_all_uuids()
sel_jobs = [(u, get_count(u)) for u in sel_uuids]
https://stackoverflow.com/questions/46441918/mongoclient-opened-before-fork-create-mongoclient https://stackoverflow.com/questions/44133435/mongoengine-and-dealing-with-userwarning-mongoclient-opened-before-fork-creat
Looking at the multiprocessing documentation (https://docs.python.org/3/library/multiprocessing.html), it looks like using fork, the default on unix, is the problem.

> All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Using spawn would seem to obviate this at the expense of performance.

> The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process objects run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
Let's see if this fixes it. Much easier than a more complicated restructuring.
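For reference, a minimal sketch of the change (the worker function and pool size are placeholders, not the actual scheduler code):

```python
import multiprocessing as mp

def run_intake_for_user(uuid):
    # With "spawn", each child starts a fresh interpreter, so any MongoClient
    # is created after the process starts instead of being inherited.
    return uuid

if __name__ == "__main__":
    # "fork" is the default on unix and inherits the parent's MongoClient;
    # "spawn" avoids that at the cost of slower worker startup.
    mp.set_start_method("spawn")
    with mp.Pool(processes=3) as pool:
        print(pool.map(run_intake_for_user, ["u1", "u2", "u3"]))
```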
That seems to have fixed it. We connect to the database 4 times - once for the master and three times for the three subprocesses. Yay! The performance hit is not too bad because we spawn only three or four subprocesses, once at the beginning of every run.
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
debug not configured, falling back to sample, default configuration
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
Ran into an error similar to https://github.com/e-mission/e-mission-server/issues/530#issuecomment-352209647 again. This is clearly with the fix (https://github.com/shankari/e-mission-server/commit/dd7a18d24df6f98367d85ecaf23644c4f9f7a2a4) in place, because I see the additional logging introduced in it, and apparently we did drop the last entry. And the previous entries were non-zero.
Need to see if it is reproducible.
2017-12-17 15:20:43,359:DEBUG:140047176312576:Found zero-speed, distance entries may need to drop last entry _id ts fmt_time \
28 5a368b0cae2e7353fd79279b 1.452972e+09 2016-01-16T11:16:28.072312-08:00
29 5a368b0cae2e7353fd79279c 1.452972e+09 2016-01-16T11:16:58.072312-08:00
30 5a368b0cae2e7353fd79279d 1.452972e+09 2016-01-16T11:17:28.072312-08:00
31 5a368b0cae2e7353fd7927a0 1.452972e+09 2016-01-16T11:17:38-08:00
32 5a368b0cae2e7353fd79279e 1.452972e+09 2016-01-16T11:17:38-08:00
latitude longitude distance speed
28 37.391324 -122.060069 0.306369 0.010212
29 37.391326 -122.060070 0.251271 0.008376
30 37.391293 -122.060036 4.730663 0.157689
31 37.391226 -122.060066 7.985249 0.804341
32 37.391226 -122.060066 0.000000 0.000000
2017-12-17 15:20:43,359:DEBUG:140047176312576:delta_ts = 0.0, dropping last entry
2017-12-17 15:20:43,362:ERROR:140047176312576:[0.0, 0.603780040015428, 1.6700112310347082, 1.6700083407891528, 2.8850795868410497, 3.612041604838898, 3.490672881889288, 0.13185994542031884, 2.2685465126699547, 3.7544348542123562] != [0.0, 0.603780040015428, 1.6700112310347082, 1.6700083407891528, 2.8850795868410497, 3.612041604838898, 3.490672881889288, 0.13185994542031884, 2.2685465126699547, 3.7544348542123562]
2017-12-17 15:20:43,363:ERROR:140047176312576:Cleaning and resampling failed for user 86842c35-da28-32ed-a90e-2da6663c5c73
Traceback (most recent call last):
File "/code/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 88, in clean_and_resample
last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
File "/code/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 114, in save_cleaned_segments_for_ts
return save_cleaned_segments_for_timeline(user_id, tl)
File "/code/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 140, in save_cleaned_segments_for_timeline
(last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
File "/code/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 916, in create_and_link_timeline
link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
File "/code/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 970, in link_trip_start
ts, cleaned_trip_data, cleaned_start_place_data)
File "/code/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1127, in _fix_squished_place_mismatch
assert False
AssertionError
Changed the labels for the test phones to be more meaningful. Based on https://github.com/e-mission/e-mission-server/commit/310d045e8fc4eeb7524f4967ef297903d25a0b6d
$ ./e-mission-py.bash bin/historical/migrations/reset_test_phone_emails.py
Connecting to database URL mongodb://open-data-rw:((22-remember-WRITTEN-lone-78))@[2600:1f18:45b0:7f01:e024:7991:e056:101e]/admin?authMechanism=SCRAM-SHA-1
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
wrt https://github.com/e-mission/e-mission-server/issues/530#issuecomment-352340874 it is reproducible. This is very weird because this data is for Jan 16
_id ts fmt_time \
0 5a3e7b4df6858f4dd712518c 1.452971e+09 2016-01-16T11:02:28.072312-08:00
1 5a3e7b4bf6858f4dd712511f 1.452971e+09 2016-01-16T11:02:58.072312-08:00
2 5a3e7b4bf6858f4dd7125120 1.452971e+09 2016-01-16T11:03:28.072312-08:00
3 5a3e7b4bf6858f4dd7125121 1.452971e+09 2016-01-16T11:03:58.072312-08:00
4 5a3e7b4bf6858f4dd7125122 1.452971e+09 2016-01-16T11:04:28.072312-08:00
And the previous fix was also for Jan 16, and it is a part of the unit test cases and is passing.
Hm. Maybe the previous error (https://github.com/e-mission/e-mission-server/issues/530#issuecomment-352202072) was for a different user (04bc05b3-4c2d-4138-bfbb-7eb45dce709a).
wrt https://github.com/e-mission/e-mission-server/issues/530#issuecomment-353733626 it is reproducible on alternate runs. This is so freaky.
$ ./e-mission-py.bash bin/purge_analysis_database.py
$ ./e-mission-py.bash bin/debug/intake_single_user.py -e test-assertion-not-fixed (PASSES)
$ ./e-mission-py.bash bin/purge_analysis_database.py
$ ./e-mission-py.bash bin/debug/intake_single_user.py -e test-assertion-not-fixed (FAILS)
No idea why that is happening.
ok, so this happens because sometimes the point from the other section appears to have an earlier write_ts than the point from our section (see point with index = 31, unsure why). So when we drop the point, we drop the point from our section and not the other section.
2017-12-23 22:39:50,998:DEBUG:140737149985728:Found zero-speed, distance entries may need to drop last entry _id section ts \
28 5a3f4bb5f6858f52c88c8c32 5a3f4bb5f6858f52c88c8c16 1.452972e+09
29 5a3f4bb5f6858f52c88c8c33 5a3f4bb5f6858f52c88c8c16 1.452972e+09
30 5a3f4bb5f6858f52c88c8c34 5a3f4bb5f6858f52c88c8c16 1.452972e+09
31 5a3f4bb5f6858f52c88c8c37 5a3f4bb5f6858f52c88c8c36 1.452972e+09
32 5a3f4bb5f6858f52c88c8c35 5a3f4bb5f6858f52c88c8c16 1.452972e+09
We can fix this by dropping the point that is from a different section. But then the last point has a wrong calculated speed, so we need to recompute it too.
So we have two choices:

1. drop the entry from the other section before we recompute the speeds, or
2. drop it afterwards and recompute the speed and distance for the new last point.

Going to try option (1), dropping the entry before we recompute the speeds, since that feels more principled than removing the last entry. If that doesn't work, going to option (2). A rough sketch of option (1) is below.
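A sketch of option (1), assuming the points dataframe carries the section column shown in the log above (the function and argument names are made up):

```python
import pandas as pd

def drop_foreign_points(points_df, curr_section_id):
    # Keep only points whose section matches the one we are processing; the
    # stray first-point-of-the-next-section then never enters the speed and
    # distance computation, so nothing has to be recomputed afterwards.
    same_section = points_df[points_df.section == curr_section_id]
    return same_section.reset_index(drop=True)
```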
No errors so far on the public server, so will start migrating the main e-mission server!!
Goal is to follow the instructions exactly from https://github.com/e-mission/e-mission-server/wiki/Multi-layer-stack
After fixing the default-route, was able to ping google.com. Need to update the cloudformation template...
ipv6 support in cloudformation appears to be extremely limited. https://forums.aws.amazon.com/click.jspa?searchID=8492229&messageID=751582
There is a somewhat unintuitive method to turn on ipv6 for a VPC. https://github.com/trailofbits/algo/commit/f3d0abf149191dbec2f2f3095c90c92fd53da0e7
But specifying an ipv6 CIDR for the two subnets seems pretty complicated, and there is no option to allow AWS to assign it by default, similar to the UI. The complex method here works for a single subnet but not for multiple ones, since it is not clear how to rewrite the ipv6 subnet from the VPC in two different ways. https://forums.aws.amazon.com/thread.jspa?messageID=773600
For the manually created VPC, we see
VPC CIDR: 2600:1f18:45b0:7f00::/56
public subnet CIDR: 2600:1f18:45b0:7f00::/64
private subnet CIDR: 2600:1f18:45b0:7f01::/64
Note that going from the VPC -> public subnet is simple because we essentially keep the same address and only lengthen the prefix. But going from VPC -> private subnet is more complex because we have to replace the last non-empty token with one that is off by one (see the quick check below).
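A quick sanity check of that derivation with python's ipaddress module (just verifying the subnet math, not cloudformation syntax):

```python
import ipaddress

# The /56 VPC block splits into 256 /64 subnets; the first two are exactly
# the public (7f00::/64) and private (7f01::/64) subnets listed above.
vpc = ipaddress.ip_network("2600:1f18:45b0:7f00::/56")
print(list(vpc.subnets(new_prefix=64))[:2])
# [IPv6Network('2600:1f18:45b0:7f00::/64'), IPv6Network('2600:1f18:45b0:7f01::/64')]
```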
So I am going to focus on getting the ipv4 working out of the box, and then documenting (hopefully small) manual fixes for ipv6 later.
Set up default server group with:
No access control set up (yet!) for the database. We will set that up if we choose to use the notebook server. For now, ipython on either analysis or webapp will work just fine.
Next, I need to copy all the data over from the primary server. Note that we are changing storage versions (https://docs.mongodb.com/manual/tutorial/change-standalone-wiredtiger/).
Before I copy the data over, let me delete the open data which is already in the open server. But before any deletions, I want to take a backup.
So the sequence will be:
bin/debug/purge_multi_timeline_for_range.py
Updates along with notes on the progress of the plan outlined in https://github.com/e-mission/e-mission-server/issues/530#issuecomment-354682128
Current database size is
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 7.8G 5.6G 1.9G 76% /
/dev/xvdd 7.8G 412M 7.0G 6% /home/e-mission
/dev/mapper/xvdb 37G 14G 22G 39% /mnt
/dev/mapper/xvdc 37G 19G 17G 54% /mnt/logs
/dev/mapper/xvdf 3.0T 143G 2.7T 5% /mnt/e-mission-primary-db
So let's create a ~ 300GB volume to leave some wiggle room.
The original webapp is in us-east-1c, while we are now creating stuff in us-east-1d. So we're going to have to use snapshots to migrate to us-east-1d.
Created a new 300GB volume, named it migration-volume, and attached it to the webapp as /dev/sdg.
Logged in to the instance and mounted the volume
$ sudo mkdir -p /transfer
$ sudo mkfs.ext4 /dev/xvdg
$ sudo chown ubuntu:ubuntu /transfer
$ sudo mount /dev/xvdg /transfer
Before dumping, get some stats on the number of entries of various types
In [878]: edb.get_uuid_db().find().count()
Out[878]: 516
In [879]: edb.get_tim
edb.get_timeseries_db edb.get_timeseries_error_db
In [879]: edb.get_timeseries_db().find().count()
Out[879]: 53718353
In [880]: edb.get_analysis_timeseries_db().find().count()
Out[880]: 16573823
In [883]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[883]: 487
In [884]: len(edb.get_usercache_db().find().distinct('user_id'))
Out[884]: 502
Tried running distinct on the timeseries, but it took too long.
Actually dumping now.
$ mongodump --out /transfer/
2018-01-01T23:30:09.731+0000 all dbs
2018-01-01T23:30:09.731+0000 DATABASE: Stage_database to /transfer/Stage_database
2018-01-02T00:51:07.047+0000 DATABASE: admin to /transfer/admin
Took ~ 1.5 hours.
Now kill the server so we don't get any new data during the migration.
$ ps -aef | grep python
ubuntu 1130 1 0 2016 ? 01:26:55 /home/ubuntu/anaconda/bin/python /home/ubuntu/anaconda/bin/supervisord
ubuntu 21856 21855 0 2016 pts/3 00:00:15 /home/ubuntu/anaconda/bin/python /home/ubuntu/anaconda/bin/ipython
root 28163 28162 0 2017 ? 17:48:57 /home/ubuntu/anaconda/bin/python emission/net/api/cfc_webapp.py
ubuntu 32664 32663 0 2016 pts/4 00:05:22 /home/ubuntu/anaconda/bin/python /home/ubuntu/anaconda/bin/ipython
$ kill -9 1130
$ sudo kill -9 28163
$ ps -aef | grep python
ubuntu 5577 2466 0 00:54 pts/0 00:00:00 grep --color=auto python
ubuntu 21856 21855 0 2016 pts/3 00:00:15 /home/ubuntu/anaconda/bin/python /home/ubuntu/anaconda/bin/ipython
ubuntu 32664 32663 0 2016 pts/4 00:05:22 /home/ubuntu/anaconda/bin/python /home/ubuntu/anaconda/bin/ipython
Server is down. Got a "503 Service Unavailable" error.
(emission) $ ./e-mission-py.bash bin/debug/purge_multi_timeline_for_range.py /tmp/odc-usp-2017/user_
Now the uuid entries should be 516 - 29 = 487. Which they are!
In [891]: edb.get_uuid_db().find().count()
Out[891]: 487
However, the number of entries in the pipeline state is unchanged.
In [892]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[892]: 487
Let's update the script to delete those too (https://github.com/e-mission/e-mission-server/pull/557).
$ ./e-mission-py.bash bin/debug/purge_multi_timeline_for_range.py --pipeline-purge /tmp/odc-usp-2017/user_
INFO:root:==================================================
INFO:root:Deleting data from file /tmp/odc-usp-2017/user_dcdb5f74-071a-4e5b-a954-e613c5b46e5d.gz
...
INFO:root:For uuid dcdb5f74-071a-4e5b-a954-e613c5b46e5d, deleting entries from the pipeline_state_db
INFO:root:result = {u'ok': 1, u'n': 7}
After this, pipeline states are also lowered by 29 entries, as expected.
In [901]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[901]: 458
$ ./e-mission-py.bash bin/debug/purge_multi_timeline_for_range.py --pipeline-purge /tmp/public_data/dump_
Now, all counts go down by a further ~ 13 entries.
In [904]: edb.get_uuid_db().find().count()
Out[904]: 475
In [905]: edb.get_timeseries_db().find().count()
Out[905]: 36644553
In [906]: edb.get_analysis_timeseries_db().find().count()
Out[906]: 15782219
In [908]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[908]: 446
But we also don't want to delete incoming config entries (like the consent) from the usercache. I think they will be pushed again, but better to avoid running into that issue in the first place. When we did run the consent stuff, we used to save the config using putDocument.
def save_config(user_id, uc, time_query, module_name):
    config_fn = get_configurator("emission.analysis.configs.%s" % module_name)
    logging.debug("for %s, config_fn = %s" % (module_name, config_fn))
    (config, last_processed_ts) = config_fn(user_id, time_query)
    logging.debug("for %s, config is %s" % (user_id, config))
    if config is not None:
        uc.putDocument("config/%s" % module_name, config)
        return last_processed_ts
    else:
        return 0
That stores the value as a document. When we get the incoming entries, they are of type rw-document. So we should be able to delete all documents from the three config types. Let's do some queries to ensure that this is OK.
We should have at least some rw-document entries since we had users sign up after I turned off the pipeline. And we should have at least some document entries since we were running the pipeline before.
Hm. There are 740 rw-documents and 161 documents. Neither of which matches the number of users.
In [895]: edb.get_usercache_db().find({"metadata.key": "config/consent", "metadata.type": "rw-document"}).count()
Out[895]: 740
In [495]: edb.get_usercache_db().find({"metadata.key": "config/consent", "metadata.type": "document"}).count()
Out[495]: 161
The timeseries has way more rw-documents. Basically, it looks like rw-documents are generated over and over.
In [896]: edb.get_timeseries_db().find({"metadata.key": "config/consent", "metadata.type": "rw-document"}).count()
Out[896]: 1564
In [897]: edb.get_timeseries_db().find({"metadata.key": "config/consent", "metadata.type": "document"}).count()
Out[897]: 0
And the counts of the three config objects are different. Wut!!
In [898]: edb.get_usercache_db().find({"metadata.key": "config/sensor_config", "metadata.type": "rw-document"}).count()
Out[898]: 500
In [899]: edb.get_usercache_db().find({"metadata.key": "config/sync_config", "metadata.type": "rw-document"}).count()
Out[899]: 160
No time to investigate this further, so for now, let us just delete all document entries for all three types of config objects - config/sensor_config, config/sync_config, config/consent.
In [903]: edb.get_usercache_db().find({"metadata.type": "document", "metadata.key": {"$in": ['config/consent', 'config/sensor_config', 'config/sync_config']}}).count()
Out[903]: 239
In [910]: edb.get_usercache_db().remove({"metadata.type": "document", "metadata.key": {"$in": ['config/consent', 'config/sensor_config', 'config/sync_config']}})
Out[910]: {u'n': 239, u'ok': 1}
There is a system.indexes.bson that seems to have entries for all the collections in the database, so it is not clear that it is safe to just delete the dumped .bson files; we should really delete the collections themselves before re-dumping.

In [911]: edb.get_alternatives_db().remove()
Out[911]: {u'n': 101146, u'ok': 1}
In [912]: edb.get_client_db().remove()
Out[912]: {u'n': 3, u'ok': 1}
In [913]: edb.get_common_place_db().remove()
Out[913]: {u'n': 0, u'ok': 1}
In [914]: edb.get_common_trip_db().remove()
Out[914]: {u'n': 0, u'ok': 1}
In [915]: edb.get_pending_signup_db().remove()
Out[915]: {u'n': 25, u'ok': 1}
In [916]: edb._get_current_db().Stage_place.remove()
Out[916]: {u'n': 0, u'ok': 1}
In [917]: edb.get_routeCluster_db().remove()
Out[917]: {u'n': 90, u'ok': 1}
In [918]: edb._get_current_db().Stage_routeDistanceMatrix.remove()
Out[918]: {u'n': 7, u'ok': 1}
In [919]: edb._get_current_db().Stage_section_new.remove()
Out[919]: {u'n': 0, u'ok': 1}
In [920]: edb._get_current_db().Stage_stop.remove()
Out[920]: {u'n': 0, u'ok': 1}
In [921]: edb._get_current_db().Stage_trip_new.remove()
Out[921]: {u'n': 0, u'ok': 1}
In [922]: edb._get_current_db().Stage_user_moves_access.remove()
Out[922]: {u'n': 118, u'ok': 1}
In [923]: edb._get_current_db().Stage_utility_models.remove()
Out[923]: {u'n': 36, u'ok': 1}
In [924]: edb._get_current_db().Stage_Worktime.remove()
Out[924]: {u'n': 2662, u'ok': 1}
The script inserts the new stats entries, so I can't just see whether the _id matches. I can try to correlate against ts entries, though.
Didn't seem to have run it for client entries.
In [929]: entry_1 = edb.get_client_stats_db_backup().find_one(); entry_1
Out[929]:
{u'_id': ObjectId('543c8f4688f663587d21cd4e'),
u'client_app_version': u'2.0.1',
u'client_os_version': u'4.4.2',
u'reading': u'4.0',
u'reported_ts': 1413254982.201671,
u'stat': u'battery_level',
u'ts': u'1413254944995',
u'user': UUID('f8fee20c-0f32-359d-ba75-bce97a7ac83b')}
In [934]: edb.get_timeseries_db().find_one({"metadata.key": "background/battery", "metadata.write_ts": entry_1["reported_ts"]})
In [935]: edb.get_timeseries_db().find_one({"metadata.key": "background/battery", "data.ts": int(entry_1["ts"])})
In [936]: edb.get_timeseries_db().find_one({"metadata.key": "background/battery", "data.ts": entry_1["ts"]})
Checking counts.
In [951]: edb.get_client_stats_db_backup().find({"user": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}).count()
Out[951]: 2556
In [952]: edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).count()
Out[952]: 13974
The oldest entry in the stats timeseries was
In [956]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", -1).limit(1))
Out[956]:
[{u'_id': ObjectId('581b4c5588f6630386d0ea58'),
u'data': {u'client_app_version': u'1.2.0',
u'client_os_version': u'4.4.2',
u'name': u'sync_launched',
u'reading': None,
u'ts': 1478184016437.0},
u'metadata': {u'key': u'stats/client_nav_event',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_ts': 1478184016437.0},
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
Unfortunately, the timestamp is wrong because it is stored in ms instead of sec, but after fixing, it is ~ Oct 30, which is consistent with the switch (https://github.com/e-mission/e-mission-server/commit/7487c82578e8933f4da8f9d3fa3522c102906c81)
In [957]: arrow.get(1478184016437.0)
...
ValueError: year is out of range
In [958]: arrow.get(1478184016.437)
Out[958]: <Arrow [2016-11-03T14:40:16.437000+00:00]>
Sorting the other way gives us a valid entry, but one which is still fairly old because of the incorrect ts conversion.
In [959]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", 1).limit(1))
Out[959]:
[{u'_id': ObjectId('581b8771cb17471ac0b16595'),
u'data': {u'client_app_version': u'1.4.0',
u'client_os_version': u'4.4.2',
u'name': u'sync_launched',
u'reading': -1,
u'ts': 1478187644.294},
u'metadata': {u'key': u'stats/client_nav_event',
u'platform': u'android',
u'read_ts': 1478206545.480631,
u'time_zone': u'America/Los_Angeles',
u'type': u'message',
u'write_fmt_time': u'2016-11-03T08:40:44.544000-07:00',
u'write_local_dt': {u'day': 3,
u'hour': 8,
u'minute': 40,
u'month': 11,
u'second': 44,
u'timezone': u'America/Los_Angeles',
u'weekday': 3,
u'year': 2016},
u'write_ts': 1478187644.544},
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [960]: arrow.get(1478187644.294)
Out[960]: <Arrow [2016-11-03T15:40:44.294000+00:00]>
But the entries in the stats DB end around 2016-11-02, so it is fairly clear that the conversion never happened.
In [969]: list(edb.get_client_stats_db_backup().find({"user": UUID("96af3842-d5fb-4f13-aea0-726efaeba6ea")}).sort("ts", -1).limit(3))
In [970]: arrow.get(1478058014.172)
Out[970]: <Arrow [2016-11-02T03:40:14.172000+00:00]>
In [971]: arrow.get(1478058014.172)
Out[971]: <Arrow [2016-11-02T03:40:14.172000+00:00]>
In [972]: arrow.get(1478054410.652)
Out[972]: <Arrow [2016-11-02T02:40:10.652000+00:00]>
So (note: in this query the time filter accidentally ended up in the projection argument of find(), which count() ignores, so this is just the total count):
In [975]: edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}, {"data.ts": {"$gt": now.timestamp}}).count()
Out[975]: 13974
In [984]: invalid_entries = 0
In [985]: for e in edb.get_client_stats_db_backup().find():
   .....:     if type(e['ts']) == str or type(e['ts']) == unicode:
   .....:         invalid_entries = invalid_entries + 1
   .....:
In [986]: invalid_entries
Out[986]: 321986
While checking stat counts, discovered that all my data was deleted since I was part of the open data collection, and purging deletes all data, not just data in the time range specified!! Aborting, restoring and then re-deleting invalid collections.
Let's fix the stats entries before restoring. That allows us to test out the stats conversion in a dry run before making the actual changes.
Fixing stats entries:
In [66]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[66]: 3
In [68]: for e in edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}):
   ....:     edb.get_timeseries_db().update({"_id": e["_id"]},
   ....:         {"$set": {"data.ts": float(e["data"]["ts"])/1000,
   ....:                   "metadata.write_ts": float(e["metadata"]["write_ts"])/1000}})
In [69]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[69]: 0
Before:
[{u'_id': ObjectId('59c787af88f6636e03100808'),
u'data': {u'client_app_version': u'2.1',
u'client_os_version': u'5.0.1',
u'name': u'sync_pull_list_size',
u'reading': 0.0,
u'ts': 1506248617116.0},
u'metadata': {u'key': u'stats/client_time',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_ts': 1506248617116.0},
u'user_id': UUID('850d2923-cc02-4e6c-939e-c762dd3ba189')},
{u'_id': ObjectId('59c787af88f6636e0310080a'),
u'data': {u'client_app_version': u'2.1',
u'client_os_version': u'5.0.1',
u'name': u'sync_push_list_size',
u'reading': 0.0,
u'ts': 1506248617116.0},
u'metadata': {u'key': u'stats/client_time',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_ts': 1506248617116.0},
u'user_id': UUID('850d2923-cc02-4e6c-939e-c762dd3ba189')},
{u'_id': ObjectId('59c787af88f6636e0310080c'),
u'data': {u'client_app_version': u'2.1',
u'client_os_version': u'5.0.1',
u'name': u'sync_launched',
u'reading': None,
u'ts': 1506248617116.0},
u'metadata': {u'key': u'stats/client_nav_event',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_ts': 1506248617116.0},
u'user_id': UUID('850d2923-cc02-4e6c-939e-c762dd3ba189')}]
After:
[{'_id': ObjectId('59c787af88f6636e03100808'),
'data': {'client_app_version': '2.1',
'client_os_version': '5.0.1',
'name': 'sync_pull_list_size',
'reading': 0.0,
'ts': 1506248617.116},
'metadata': {'key': 'stats/client_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1506248617.116},
'user_id': UUID('850d2923-cc02-4e6c-939e-c762dd3ba189')},
{'_id': ObjectId('59c787af88f6636e0310080a'),
'data': {'client_app_version': '2.1',
'client_os_version': '5.0.1',
'name': 'sync_push_list_size',
'reading': 0.0,
'ts': 1506248617.116},
'metadata': {'key': 'stats/client_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1506248617.116},
'user_id': UUID('850d2923-cc02-4e6c-939e-c762dd3ba189')},
{'_id': ObjectId('59c787af88f6636e0310080c'),
'data': {'client_app_version': '2.1',
'client_os_version': '5.0.1',
'name': 'sync_launched',
'reading': None,
'ts': 1506248617.116},
'metadata': {'key': 'stats/client_nav_event',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1506248617.116},
'user_id': UUID('850d2923-cc02-4e6c-939e-c762dd3ba189')}]
Timestamp check:
In [72]: arrow.get(1506248617116.0)
ValueError: year 49701 is out of range
In [73]: arrow.get(1506248617.116)
Out[73]: <Arrow [2017-09-24T10:23:37.116000+00:00]>
Applying this fix to the real entries....
In [515]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[515]: 1717191
In [516]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/server_api_error", "stats/server_api_time"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[516]: 0
In [517]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/pipeline_error", "stats/pipeline_time"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[517]: 0
In [518]: for e in edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}):
   .....:     edb.get_timeseries_db().update({"_id": e["_id"]},
   .....:         {"$set": {"data.ts": float(e["data"]["ts"])/1000,
   .....:                   "metadata.write_ts": float(e["metadata"]["write_ts"])/1000}})
   .....:
In [519]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[519]: 0
verification on client data
In [80]: list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}).limit(3))
Out[80]: []
In [81]: list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).limit(3))
Out[81]:
[{'_id': ObjectId('5a4bacbbf6858f499c29ac94'),
'data': {'client_app_version': '2.1',
'client_os_version': '8.3',
'name': 'result_display_failed',
'reading': None,
'ts': 1435258760.849617},
'metadata': {'key': 'stats/client_error',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1435259103.039456},
'user_id': UUID('b0d937d0-70ef-305e-9563-440369012b39')},
{'_id': ObjectId('5a4bacbbf6858f499c29ac95'),
'data': {'client_app_version': '2.1',
'client_os_version': '8.3',
'name': 'result_display_failed',
'reading': None,
'ts': 1435258798.327425},
'metadata': {'key': 'stats/client_error',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1435259103.039456},
'user_id': UUID('b0d937d0-70ef-305e-9563-440369012b39')},
{'_id': ObjectId('5a4baaa1f6858f499c26508f'),
'data': {'client_app_version': '2.0.1',
'client_os_version': '4.4.4',
'name': 'sync_pull_list_size',
'reading': 7.0,
'ts': 1413466406.571},
'metadata': {'key': 'stats/client_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1413466453.32879},
'user_id': UUID('8fea63f7-1f4a-382f-b710-118568652329')}]
In [83]: [(arrow.get(e["data"]["ts"]), arrow.get(e["metadata"]["write_ts"])) for e in sample_client_entries]
Out[83]:
[(<Arrow [2015-06-25T18:59:20.849617+00:00]>,
<Arrow [2015-06-25T19:05:03.039456+00:00]>),
(<Arrow [2015-06-25T18:59:58.327425+00:00]>,
<Arrow [2015-06-25T19:05:03.039456+00:00]>),
(<Arrow [2014-10-16T13:33:26.571000+00:00]>,
<Arrow [2014-10-16T13:34:13.328790+00:00]>)]
verification on server data
In [84]: sample_server_entries = list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/server_api_time", "stats/server_api_error"]}}).limit(3))
In [85]: sample_server_entries
Out[85]:
[{'_id': ObjectId('5a4bb12df6858f499c303de5'),
'data': {'name': 'POST_/tripManager/getUnclassifiedSections',
'reading': 0.1927809715270996,
'ts': 1417725020.540355},
'metadata': {'key': 'stats/server_api_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1417725020.540355},
'user_id': UUID('3a307244-ecf1-3e6e-a9a7-3aaf101b40fa')},
{'_id': ObjectId('5a4bb12df6858f499c303de6'),
'data': {'name': 'POST_/tripManager/getUnclassifiedSections',
'reading': 0.20390009880065918,
'ts': 1417725072.799086},
'metadata': {'key': 'stats/server_api_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1417725072.799086},
'user_id': UUID('3a307244-ecf1-3e6e-a9a7-3aaf101b40fa')},
{'_id': ObjectId('5a4bb12df6858f499c303de7'),
'data': {'name': 'POST_/tripManager/getUnclassifiedSections',
'reading': 0.21010494232177734,
'ts': 1417725104.509851},
'metadata': {'key': 'stats/server_api_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1417725104.509851},
'user_id': UUID('3a307244-ecf1-3e6e-a9a7-3aaf101b40fa')}]
In [86]: [(arrow.get(e["data"]["ts"]), arrow.get(e["metadata"]["write_ts"])) for e in sample_server_entries]
Out[86]:
[(<Arrow [2014-12-04T20:30:20.540355+00:00]>,
<Arrow [2014-12-04T20:30:20.540355+00:00]>),
(<Arrow [2014-12-04T20:31:12.799086+00:00]>,
<Arrow [2014-12-04T20:31:12.799086+00:00]>),
(<Arrow [2014-12-04T20:31:44.509851+00:00]>,
<Arrow [2014-12-04T20:31:44.509851+00:00]>)]
Ran on server, seemed to work
In [999]: entry_1 = edb.get_client_stats_db_backup().find_one()
In [1000]: edb.get_timeseries_db().find_one({"metadata.key": "background/battery", "data.ts": float(entry_1["ts"])/1000})
Out[1000]:
{u'_id': ObjectId('5a4bbc7188f663417cbd343b'),
u'data': {u'battery_level_pct': 4.0, u'ts': 1413254944.995},
u'metadata': {u'key': u'background/battery',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_ts': 1413254982.201671},
u'user_id': UUID('f8fee20c-0f32-359d-ba75-bce97a7ac83b')}
With all the fixes, does the sorting work?
In [1002]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", -1).limit(1))
Out[1002]:
[{u'_id': ObjectId('59a0c57dcb17471ac08bc0c6'),
u'data': {u'client_app_version': u'2.3.0',
u'client_os_version': u'7.0',
u'name': u'sync_duration',
u'reading': 6.459,
u'ts': 1503704747.497},
u'metadata': {u'key': u'stats/client_time',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'message',
u'write_fmt_time': u'2017-08-25T16:45:47.499000-07:00',
u'write_ts': 1503704747.499},
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [1003]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", 1).limit(1))
Out[1003]:
[{u'_id': ObjectId('5a4bc0f688f663417cc52d32'),
u'data': {u'client_app_version': u'1.0.0',
u'client_os_version': u'4.4.2',
u'name': u'sync_duration',
u'reading': 5638.0,
u'ts': 1474414.807961},
u'metadata': {u'key': u'stats/client_time',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_ts': 1474422045.947868},
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [1004]: arrow.get(1474422045.947868)
Out[1004]: <Arrow [2016-09-21T01:40:45.947868+00:00]>
Let's cross-check with the client stats
In [1005]: list(edb.get_client_stats_db_backup().find({"user": UUID("96af3842-d5fb-4f13-aea0-726efaeba6ea")}).sort("ts", -1).limit(1))
Out[1005]:
[{u'_id': ObjectId('5819602288f6630624d4b02a'),
u'client_app_version': u'1.2.0',
u'client_os_version': u'4.4.2',
u'reading': u'55.0',
u'reported_ts': 1478058018.196038,
u'stat': u'battery_level',
u'ts': 1478058014.172,
u'user': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [1007]: arrow.get(1478058014.172)
Out[1007]: <Arrow [2016-11-02T03:40:14.172000+00:00]>
In [1009]: list(edb.get_client_stats_db_backup().find({"user": UUID("96af3842-d5fb-4f13-aea0-726efaeba6ea")}).sort("ts", 1).limit(1))
Out[1009]:
[{u'_id': ObjectId('57e1e51d88f663307e959ca5'),
u'client_app_version': u'1.0.0',
u'client_os_version': u'4.4.2',
u'reading': u'5638',
u'reported_ts': 1474422045.947868,
u'stat': u'sync_duration',
u'ts': 1474414807.961,
u'user': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [1014]: arrow.get(1474422045.947868)
Out[1014]: <Arrow [2016-09-21T01:40:45.947868+00:00]>
So far, so good. The write_ts works just fine. However, the ts is broken because in this case, the ts is already in seconds even in the client_stats_db. It looks like the ts is incorrect only when it is stored as a string.
This code seems to work.
In [91]: for e in edb.get_client_stats_db_backup().find().limit(100):
   ....:     e_ts = e['ts']
   ....:     if type(e_ts) == str:
   ....:         print("string, after converting ms -> sec %s" % arrow.get(float(e_ts)/1000))
   ....:     else:
   ....:         print("float, no real need to convert %s" % arrow.get(float(e_ts)))
string, after converting ms -> sec 2014-10-14T02:49:04.995000+00:00
string, after converting ms -> sec 2014-10-14T02:49:06.390000+00:00
string, after converting ms -> sec 2014-10-14T02:49:06.390000+00:00
string, after converting ms -> sec 2014-10-14T02:49:06.390000+00:00
....
float, no real need to convert 2016-09-22T11:40:11.586000+00:00
float, no real need to convert 2016-09-22T12:40:13.430000+00:00
float, no real need to convert 2016-09-22T13:40:12.354000+00:00
float, no real need to convert 2016-09-22T14:40:13.595000+00:00
float, no real need to convert 2016-09-22T15:40:12.583000+00:00
float, no real need to convert 2016-09-22T16:40:14.230000+00:00
float, no real need to convert 2016-09-22T17:40:13.844000+00:00
Let's also store the metadata.fmt_time and metadata.local_date fields to be consistent with new data.
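A rough sketch of deriving those fields with arrow, mirroring the write_fmt_time/write_local_dt structure in the entries above (the helper name is made up; the actual migration script may differ):

```python
import arrow

def expand_write_ts(write_ts, tz="America/Los_Angeles"):
    # Expand a bare unix timestamp into the formatted fields stored alongside
    # it, matching the shape of the entries shown earlier in this thread.
    a = arrow.get(write_ts).to(tz)
    return {"write_ts": write_ts,
            "write_fmt_time": a.isoformat(),
            "write_local_dt": {"year": a.year, "month": a.month, "day": a.day,
                               "hour": a.hour, "minute": a.minute, "second": a.second,
                               "weekday": a.weekday(), "timezone": tz}}
```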
Re-ran locally - let's re-confirm. Works!
In [98]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", -1).limit(1))
Out[98]:
[{'_id': ObjectId('5a4bef34f6858f4aeaa1cfde'),
'data': {'client_app_version': '1.2.0',
'client_os_version': '4.4.2',
'name': 'sync_launched',
'reading': None,
'ts': 1478058014.172},
'metadata': {'key': 'stats/client_nav_event',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-11-01T20:40:18.196038-07:00',
'write_local_dt': {'day': 1,
'hour': 20,
'minute': 40,
'month': 11,
'second': 18,
'timezone': 'America/Los_Angeles',
'weekday': 1,
'year': 2016},
'write_ts': 1478058018.196038},
'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [99]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", 1).limit(1))
Out[99]:
[{'_id': ObjectId('5a4becb8f6858f4aea9fda54'),
'data': {'client_app_version': '1.0.0',
'client_os_version': '4.4.2',
'name': 'sync_duration',
'reading': 5638.0,
'ts': 1474414807.961},
'metadata': {'key': 'stats/client_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-09-20T18:40:45.947868-07:00',
'write_local_dt': {'day': 20,
'hour': 18,
'minute': 40,
'month': 9,
'second': 45,
'timezone': 'America/Los_Angeles',
'weekday': 1,
'year': 2016},
'write_ts': 1474422045.947868},
'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [100]: arrow.get(1474414807.961)
Out[100]: <Arrow [2016-09-20T23:40:07.961000+00:00]>
In [101]: arrow.get(1478058014.172)
Out[101]: <Arrow [2016-11-02T03:40:14.172000+00:00]>
Note that the oldest entry is from 2016-09-20. I suspect that is because I re-created the UUIDs at that time and didn't change the uuids in the stats databases, only the timeseries. Let's confirm.
The oldest entries overall are from 2015.
In [107]: first_10_entries = list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", 1).limit(10))
In [109]: [e["metadata"]["write_fmt_time"] for e in first_10_entries]
Out[109]:
['2015-06-03T09:57:27.061417-07:00',
'2015-06-03T09:57:27.061417-07:00',
'2015-06-09T09:42:59.953313-07:00',
'2015-06-09T09:42:59.953313-07:00',
'2015-06-11T07:19:21.271020-07:00',
'2015-06-11T07:19:21.271020-07:00',
'2015-06-09T09:42:59.953313-07:00',
'2015-06-11T07:19:21.271020-07:00',
'2015-06-03T09:57:27.061417-07:00',
'2015-06-03T09:57:27.061417-07:00']
The corresponding UUID is
In [114]: [e["user_id"] for e in first_10_entries]
Out[114]:
[UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee'),
...
which doesn't exist in the uuid_db
In [1019]: edb.get_uuid_db().find_one({"uuid": UUID("077bbe67-7693-3a09-9b3c-c8ba935e46ee")})
In [1020]:
In contrast, if we try the last 10 entries
In [115]: [e["user_id"] for e in last_10_entries]
Out[115]:
[UUID('3d37573b-2e74-496d-a09d-a0a3f05c2467'),
UUID('6560950c-4ddb-41fc-8801-b9197d30f54d'),
UUID('365fc61a-033c-49ec-8f47-6fc5fbc36151'),
UUID('4cd90e57-a116-448f-8cba-6a48f68fb554'),
UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'),
UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'),
UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'),
UUID('8f1a6737-e8ee-4769-a1a4-da1948d886b6'),
UUID('1ad0a2d9-10c3-47d4-9b5e-36cded2d563d'),
UUID('1d2c74be-3480-4137-b708-9789f1489d1b')]
They are very much there.
In [1020]: edb.get_uuid_db().find_one({"uuid": UUID("3d37573b-2e74-496d-a09d-a0a3f05c2467")})
Out[1020]:
{u'_id': ObjectId('543c8a2239e59673fd9fb9dd'),
u'update_ts': datetime.datetime(2016, 7, 27, 16, 32, 52, 494000),
u'user_email': u'<redacted>',
u'uuid': UUID('3d37573b-2e74-496d-a09d-a0a3f05c2467')}
ok so the first n entries will not exist in the uuid db and cannot be retrieved without a bunch of additional work. It is not clear that we need that mapping anyway since we will primarily use this data to evaluate performance. And we can defer the additional work to if/when we actually need it. This has already taken up a bunch of time!
ok so let's restart mongodb with the new data directory.
$ grep -i dbpath /etc/mongod.conf
# dbpath=/mnt/e-mission-primary-db/mongodb
dbpath=/mnt/e-mission-primary-db/mongodb.restore
$ sudo service mongod restart
As expected, there is no data.
In [1023]: edb.get_uuid_db().find().count()
Out[1023]: 0
In [1024]: edb.get_timeseries_db().find().count()
Out[1024]: 0
In [1025]: edb.get_analysis_timeseries_db().find().count()
Out[1025]: 0
Now start restoring. Took much longer than the dump.
ubuntu@ip-10-203-173-119:~$ mongorestore /transfer/original-jan-1
connected to: 127.0.0.1
2018-01-03T00:20:32.239+0000 /transfer/original-jan-1/Backup_database/Stage_Sections.bson
2018-01-03T05:00:46.118+0000 Creating index: { key: { _id: 1 }, name: "_id_", ns: "Stage_database.Stage_section_new" }
Now, we dump client and server stats values from the stats database for the test phones. First we need to get the binary (BinData) values for the uuids.
> db.Stage_uuids.find({"user_email": "ucb.sdb.android.1@gmail.com"})
{ "_id" : ObjectId("565872797d65cb39ee9e89aa"), "update_ts" : ISODate("2017-08-30T00:36:29.359Z"), "uuid" : BinData(3,"5HFxHr0UPb6Atpx9kuzClg=="), "user_email" : "ucb.sdb.android.1@gmail.com" }
> db.Stage_uuids.find({"user_email": "ucb.sdb.android.2@gmail.com"})
{ "_id" : ObjectId("565872827d65cb39ee9e89ac"), "update_ts" : ISODate("2017-08-04T20:31:27.378Z"), "uuid" : BinData(3,"/XtMLiyLO/qU8NHj7L1ftw=="), "user_email" : "ucb.sdb.android.2@gmail.com" }
> db.Stage_uuids.find({"user_email": "ucb.sdb.android.3@gmail.com"})
{ "_id" : ObjectId("565872877d65cb39ee9e89ae"), "update_ts" : ISODate("2017-08-28T21:50:47.581Z"), "uuid" : BinData(3,"hoQsNdooMu2pDi2mZjxccw=="), "user_email" : "ucb.sdb.android.3@gmail.com" }
> db.Stage_uuids.find({"user_email": "ucb.sdb.androi.4@gmail.com"})
{ "_id" : ObjectId("56fad3dd1268798022665a00"), "update_ts" : ISODate("2017-08-30T00:15:17.402Z"), "uuid" : BinData(3,"O8D5H3ZgNKKwBVw5lZijaQ=="), "user_email" : "ucb.sdb.androi.4@gmail.com" }
> db.Stage_uuids.find({"user_email": "ucb.sdb.iphone.1@gmail.com"})
{ "_id" : ObjectId("5530d20e077c9e50eb0aa61b"), "update_ts" : ISODate("2017-08-25T22:31:33.765Z"), "uuid" : BinData(3,"B54PGsRAPXyw594WD3SONQ=="), "user_email" : "ucb.sdb.iphone.1@gmail.com" }
> db.Stage_uuids.find({"user_email": "ucb.sdb.iphone.2@gmail.com"})
{ "_id" : ObjectId("56555db67d65cb39ee9e1299"), "update_ts" : ISODate("2017-10-01T04:21:49.675Z"), "uuid" : BinData(3,"x2oEh35aOxekSUe+Zms29g=="), "user_email" : "ucb.sdb.iphone.2@gmail.com" }
> db.Stage_uuids.find({"user_email": "cub.sdb.iphone.3@gmail.com"})
{ "_id" : ObjectId("565942347d65cb39ee9eab29"), "update_ts" : ISODate("2017-10-01T04:26:21.266Z"), "uuid" : BinData(3,"xSi80qiLPoK+Yu9PI5aWeg=="), "user_email" : "cub.sdb.iphone.3@gmail.com" }
> db.Stage_uuids.find({"user_email": "ucb.sdb.iphone.4@gmail.com"})
{ "_id" : ObjectId("56faec5a126879802266adc5"), "update_ts" : ISODate("2017-10-01T04:24:58.701Z"), "uuid" : BinData(3,"lecHJ6BOPjO3/jSrGRlPiw=="), "user_email" : "ucb.sdb.iphone.4@gmail.com" }
> db.Stage_uuids.find({"user_email": "nexus7itu01@gmail.com"})
{ "_id" : ObjectId("57e4e958cb17471ac05d7bef"), "update_ts" : ISODate("2016-09-23T08:35:36.545Z"), "uuid" : BinData(3,"cJaAaNulQGyOJgm1SNoOSw=="), "user_email" : "nexus7itu01@gmail.com" }
> db.Stage_uuids.find({"user_email": "nexus7itu02@gmail.com"})
{ "_id" : ObjectId("57e4e888cb17471ac05d7bed"), "update_ts" : ISODate("2016-09-23T08:32:08.502Z"), "uuid" : BinData(3,"ZWFDH9TBTg+UiasRkDQftw=="), "user_email" : "nexus7itu02@gmail.com" }
> db.Stage_uuids.find({"user_email": "motoeitu01@gmail.com"})
{ "_id" : ObjectId("57e11e84cb17471ac05a01e0"), "update_ts" : ISODate("2016-09-20T11:33:24.528Z"), "uuid" : BinData(3,"ks9YQK9ZQAyrcrqz3N94GA=="), "user_email" : "motoeitu01@gmail.com" }
> db.Stage_uuids.find({"user_email": "motoeitu02@gmail.com"})
{ "_id" : ObjectId("57e14471cb17471ac05a0690"), "update_ts" : ISODate("2016-09-20T14:15:13.443Z"), "uuid" : BinData(3,"k+ihzDIfT6mMPEaShmjkXQ=="), "user_email" : "motoeitu02@gmail.com" }
Next, we mongodump based on the bin values.
client:
$ mongodump -d Stage_database -c Stage_client_stats -o /transfer/public_phone_stats -q '{"user": {"$in": [{"$binary": "5HFxHr0UPb6Atpx9kuzClg==", "$type":"03"}, {"$binary": "/XtMLiyLO/qU8NHj7L1ftw==", "$type":"03"}, {"$binary": "hoQsNdooMu2pDi2mZjxccw==", "$type":"03"}, {"$binary": "O8D5H3ZgNKKwBVw5lZijaQ==", "$type":"03"}, {"$binary": "B54PGsRAPXyw594WD3SONQ==", "$type":"03"}, {"$binary": "x2oEh35aOxekSUe+Zms29g==", "$type":"03"}, {"$binary": "xSi80qiLPoK+Yu9PI5aWeg==", "$type":"03"}, {"$binary": "lecHJ6BOPjO3/jSrGRlPiw==", "$type":"03"}, {"$binary": "cJaAaNulQGyOJgm1SNoOSw==", "$type":"03"}, {"$binary": "ZWFDH9TBTg+UiasRkDQftw==", "$type":"03"}, {"$binary": "ks9YQK9ZQAyrcrqz3N94GA==", "$type":"03"}, {"$binary": "k+ihzDIfT6mMPEaShmjkXQ==", "$type":"03"}]}}'
connected to: 127.0.0.1
2018-01-03T06:06:52.363+0000 DATABASE: Stage_database to /transfer/public_phone_stats/Stage_database
2018-01-03T06:06:52.363+0000 Stage_database.Stage_client_stats to /transfer/public_phone_stats/Stage_database/Stage_client_stats.bson
2018-01-03T06:06:52.782+0000 72599 documents
2018-01-03T06:06:52.782+0000 Metadata for Stage_database.Stage_client_stats to /transfer/public_phone_stats/Stage_database/Stage_client_stats.metadata.json
server:
$ mongodump -d Stage_database -c Stage_server_stats -o /transfer/public_phone_stats -q '{"user": {"$in": [{"$binary": "5HFxHr0UPb6Atpx9kuzClg==", "$type":"03"}, {"$binary": "/XtMLiyLO/qU8NHj7L1ftw==", "$type":"03"}, {"$binary": "hoQsNdooMu2pDi2mZjxccw==", "$type":"03"}, {"$binary": "O8D5H3ZgNKKwBVw5lZijaQ==", "$type":"03"}, {"$binary": "B54PGsRAPXyw594WD3SONQ==", "$type":"03"}, {"$binary": "x2oEh35aOxekSUe+Zms29g==", "$type":"03"}, {"$binary": "xSi80qiLPoK+Yu9PI5aWeg==", "$type":"03"}, {"$binary": "lecHJ6BOPjO3/jSrGRlPiw==", "$type":"03"}, {"$binary": "cJaAaNulQGyOJgm1SNoOSw==", "$type":"03"}, {"$binary": "ZWFDH9TBTg+UiasRkDQftw==", "$type":"03"}, {"$binary": "ks9YQK9ZQAyrcrqz3N94GA==", "$type":"03"}, {"$binary": "k+ihzDIfT6mMPEaShmjkXQ==", "$type":"03"}]}}'
connected to: 127.0.0.1
2018-01-03T06:08:11.594+0000 DATABASE: Stage_database to /transfer/public_phone_stats/Stage_database
2018-01-03T06:08:11.595+0000 Stage_database.Stage_server_stats to /transfer/public_phone_stats/Stage_database/Stage_server_stats.bson
2018-01-03T06:08:11.889+0000 52593 documents
2018-01-03T06:08:11.889+0000 Metadata for Stage_database.Stage_server_stats to /transfer/public_phone_stats/Stage_database/Stage_server_stats.metadata.json
Pull the data locally and load it to confirm that we have the relevant data. Confirmed.
In [3]: len(edb.get_client_stats_db_backup().find().distinct("user"))
Out[3]: 12
In [4]: len(edb.get_server_stats_db_backup().find().distinct("user"))
Out[4]: 12
Ok so now back to cleaning up data (based on https://github.com/e-mission/e-mission-server/issues/530#issuecomment-354711521).
$ ./e-mission-py.bash bin/historical/migrations/stats_from_db_to_ts.py
# `now` is presumably an arrow object for the current time; any data.ts in
# the future must have been written in milliseconds, so convert both the
# data and metadata timestamps back to seconds
for e in edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}):
    edb.get_timeseries_db().update({"_id": e["_id"]},
                                   {"$set": {"data.ts": float(e["data"]["ts"])/1000,
                                             "metadata.write_ts": float(e["metadata"]["write_ts"])/1000}})
Validate: the messed-up entries have been fixed.
In [1039]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$gt": now.timestamp}}).count()
Out[1039]: 0
Correlation between stats db and timeseries db exists and timestamp is valid
In [1040]: entry_1 = edb.get_client_stats_db_backup().find_one()
In [1041]: edb.get_timeseries_db().find_one({"metadata.key": "background/battery", "data.ts": float(entry_1["ts"])/1000})
Out[1041]:
{u'_id': ObjectId('5a4c7b5c88f663668630d290'),
u'data': {u'battery_level_pct': 4.0, u'ts': 1413254944.995},
u'metadata': {u'key': u'background/battery',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_fmt_time': u'2014-10-13T19:49:42.201671-07:00',
u'write_ts': 1413254982.201671},
u'user_id': UUID('f8fee20c-0f32-359d-ba75-bce97a7ac83b')}
In [1044]: arrow.get(1413254944.995)
Out[1044]: <Arrow [2014-10-14T02:49:04.995000+00:00]>
In [1042]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", -1).limit(1))
Out[1042]:
[{u'_id': ObjectId('59a0c57dcb17471ac08bc0c6'),
u'data': {u'client_app_version': u'2.3.0',
u'client_os_version': u'7.0',
u'name': u'sync_duration',
u'reading': 6.459,
u'ts': 1503704747.497},
u'metadata': {u'key': u'stats/client_time',
u'platform': u'android',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'message',
u'write_fmt_time': u'2017-08-25T16:45:47.499000-07:00',
u'write_ts': 1503704747.499},
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [1045]: arrow.get(1503704747.497)
Out[1045]: <Arrow [2017-08-25T23:45:47.497000+00:00]>
In [1047]: list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", -1).limit(1))
Out[1047]:
[{u'_id': ObjectId('59efe782cb17471ac0cecb5a'),
u'data': {u'client_app_version': u'2.4.0',
u'client_os_version': u'10.3.3',
u'name': u'sync_launched',
u'reading': -1,
u'ts': 1508894590.151824},
u'metadata': {u'key': u'stats/client_nav_event',
u'platform': u'ios',
u'plugin': u'none',
u'read_ts': 0,
u'time_zone': u'America/Los_Angeles',
u'type': u'message',
u'write_fmt_time': u'2017-10-24T18:23:10.152289-07:00',
u'write_local_dt': {u'day': 24,
u'hour': 18,
u'minute': 23,
u'month': 10,
u'second': 10,
u'timezone': u'America/Los_Angeles',
u'weekday': 1,
u'year': 2017},
u'write_ts': 1508894590.152289},
u'user_id': UUID('7161343e-551e-4213-be75-3b82e1ce2448')}]
In [1048]: arrow.get(1508894590.151824)
Out[1048]: <Arrow [2017-10-25T01:23:10.151824+00:00]>
In [1043]: list(edb.get_timeseries_db().find({"user_id": UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'), "metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", 1).limit(1))
Out[1043]:
[{u'_id': ObjectId('5a4c829288f663668638cb87'),
u'data': {u'client_app_version': u'1.0.0',
u'client_os_version': u'4.4.2',
u'name': u'sync_duration',
u'reading': 5638.0,
u'ts': 1474414807.961},
u'metadata': {u'key': u'stats/client_time',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_fmt_time': u'2016-09-20T18:40:45.947868-07:00',
u'write_local_dt': {u'day': 20,
u'hour': 18,
u'minute': 40,
u'month': 9,
u'second': 45,
u'timezone': u'America/Los_Angeles',
u'weekday': 1,
u'year': 2016},
u'write_ts': 1474422045.947868},
u'user_id': UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea')}]
In [1046]: arrow.get(1474414807.961)
Out[1046]: <Arrow [2016-09-20T23:40:07.961000+00:00]>
In [1049]: list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}}).sort("data.ts", 1).limit(1))
Out[1049]:
[{u'_id': ObjectId('5a4c7e6b88f663668633e5dd'),
u'data': {u'client_app_version': u'2.1',
u'client_os_version': u'4.4.2',
u'name': u'confirmlist_auth_not_done',
u'reading': None,
u'ts': 315965026.452},
u'metadata': {u'key': u'stats/client_nav_event',
u'platform': u'server',
u'time_zone': u'America/Los_Angeles',
u'write_fmt_time': u'2015-06-03T09:57:27.061417-07:00',
u'write_local_dt': {u'day': 3,
u'hour': 9,
u'minute': 57,
u'month': 6,
u'second': 27,
u'timezone': u'America/Los_Angeles',
u'weekday': 2,
u'year': 2015},
u'write_ts': 1433350647.061417},
u'user_id': UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')}]
In [1050]: arrow.get(315965026.452)
Out[1050]: <Arrow [1980-01-06T00:03:46.452000+00:00]>
This is not just an invalid conversion, though: if the original millisecond value had actually been in seconds, it would be an impossible date (year out of range), so the ms -> s division was right and the device clock really was at 1980.
```
In [1051]: arrow.get(315965026452)
ValueError: year is out of range
```
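For what it's worth, 1980-01-06 is the GPS epoch, and these broken timestamps land only a few minutes after it, so this looks like a device clock that never got set past its GPS-epoch default. That is speculation, but the arithmetic is suggestive:
```
import arrow

GPS_EPOCH = arrow.get("1980-01-06T00:00:00+00:00")
print(arrow.get(315965026.452) - GPS_EPOCH)
# -> 0:03:46.452000, i.e. ~226 s after the GPS epoch
```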
There are 13 such entries and they are all from user UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')
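(`ts_2000` in the queries below is not defined in this excerpt; presumably it is a sanity bound at the start of 2000, something like:)
```
import arrow

ts_2000 = arrow.get("2000-01-01")  # anything before this is clearly bogus
```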
In [1067]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$lt": ts_2000.timestamp}}).count()
Out[1067]: 13
In [1070]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error"]}, "data.ts": {"$lt": ts_2000.timestamp}}).distinct("user_id")
Out[1070]: [UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')]
There are 151 entries for this user in the client stats DB.
In [1071]: edb.get_client_stats_db_backup().find({"user": UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')}).count()
Out[1071]: 151
Let's try to match based on the reported_ts. Bingo! The entry does indeed have an invalid ts.
In [1073]: edb.get_client_stats_db_backup().find_one({"user": UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee'), "reported_ts": 1433350647.061417})
Out[1073]:
{u'_id': ObjectId('556f31f788f6636f49a1b05a'),
u'client_app_version': u'2.1',
u'client_os_version': u'4.4.2',
u'reading': u'0.0',
u'reported_ts': 1433350647.061417,
u'stat': u'battery_level',
u'ts': u'315965055221',
u'user': UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')}
In [1075]: arrow.get(315965055221)
ValueError: year is out of range
There are a bunch of other entries with the same user and reported_ts: more than the 13 broken stats entries we found in the timeseries.
In [1074]: edb.get_client_stats_db_backup().find({"user": UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee'), "reported_ts": 1433350647.061417}).count()
Out[1074]: 37
I bet the others are battery_level, similar to the above.
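A quick way to check that bet would be to tally the `stat` field over the batch (a sketch, not from the original session):
```
from collections import Counter
from uuid import UUID

import emission.core.get_database as edb

broken_user = UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')
same_batch = edb.get_client_stats_db_backup().find(
    {"user": broken_user, "reported_ts": 1433350647.061417})
# tally the stat types in the 37-entry batch; the bet is that everything
# beyond the 13 broken client stats is battery_level
print(Counter(e["stat"] for e in same_batch))
```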
In [1076]: edb.get_timeseries_db().find({"data.ts": {"$lt": ts_2000.timestamp}}).count()
Out[1076]: 762
In [1077]: edb.get_timeseries_db().find({"data.ts": {"$lt": ts_2000.timestamp}}).distinct("user_id")
Out[1077]:
[UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee'),
UUID('99b29da2-989d-4e54-a211-284bde1d362d'),
UUID('693a42f1-6f00-497e-8b40-a8339fd5af8d'),
UUID('3d37573b-2e74-496d-a09d-a0a3f05c2467'),
UUID('ee6dadba-53b6-421f-94d1-27bc96e023cf'),
UUID('0109c47b-e640-411e-8d19-e481c52d7130'),
UUID('6560950c-4ddb-41fc-8801-b9197d30f54d'),
UUID('b82804b8-4e49-43a0-99d1-9d1da20ec1d3'),
UUID('9d906275-8072-42d4-8dd2-3670e63e0f6e'),
UUID('6af5afdf-b1d9-4ea7-9f10-2bddb8a0ecb3'),
UUID('6a415e67-9025-4f29-b520-f0c5a43c8bb6'),
UUID('a61349c6-0cc9-4902-9f13-d4236a630ad5'),
UUID('cfbf03dc-6e3e-40bd-90de-d19d14613e47'),
UUID('be47f46a-ce3a-4ad8-b81f-d3daa7955e95'),
UUID('5109e62d-2152-481b-8d26-2cb8d8cc1f23'),
UUID('f14272fe-1433-4430-b1ec-3f37dfdde5bf'),
UUID('abf4ed3a-a018-4f40-90c5-39b592b8569b'),
UUID('de23cac9-1996-4af5-8554-4f6d017b3459'),
UUID('96af3842-d5fb-4f13-aea0-726efaeba6ea'),
UUID('08b31565-f990-4d15-a4a7-89b3ba6b1340'),
UUID('5a6a2711-c574-42f0-9940-ea1fd0cc2f09'),
UUID('29277dd4-dc78-40c0-806c-f88a4f902436'),
UUID('6ed1b36d-08a9-403d-b247-e426228c0492'),
UUID('d2b923b9-68b9-4e88-9b8c-29416694efb1'),
UUID('e82b1c5a-7c07-46b7-afd7-b53ac9db1f42'),
UUID('dcdb5f74-071a-4e5b-a954-e613c5b46e5d'),
UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0'),
UUID('cd6482fe-56a2-4bf8-b8a8-d74f6e3c22c8'),
UUID('3ca88f7c-fb1a-467e-9e29-99909d92c904')]
In [1078]: edb.get_timeseries_db().find({"data.ts": {"$lt": ts_2000.timestamp}}).distinct("metadata.key")
Out[1078]:
[u'stats/client_time',
u'background/battery',
u'stats/client_nav_event',
u'statemachine/transition',
u'background/location',
u'background/filtered_location']
In [1080]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["statemachine/transition", "background/location", "background/filtered_location"]}, "data.ts": {"$lt": ts_2000.timestamp}}).count()
Out[1080]: 744
In [1082]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["statemachine/transition", "background/location", "background/filtered_location"]}, "data.ts": {"$lt": ts_2000.timestamp}}).distinct("metadata.platform")
Out[1082]: [u'android']
Actually, no. There are a lot more, including a lot of non-stat entries, and they are all on android. Going to verify some of this manually and then move on.
So they are an interesting mix of some kind of weird timestamp that is broken for both `ts` and `write_ts`, and entries where the location apparently had `ts = 0`.
>>> list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["statemachine/transition", "background/location", "background/filtered_location"]}, "data.ts": {"$lt": ts_2000.timestamp}}, {"user_id":1, "data.ts": 1, "data.fmt_time": 1, "metadata.write_ts": 1, "metadata.write_fmt_time": 1}).limit(10))
[{u'_id': ObjectId('56c495afeaedff78c762a711'),
u'data': {u'fmt_time': u'1970-03-12T00:27:13+08:00', u'ts': 6020833},
u'metadata': {u'write_fmt_time': u'1970-03-12T00:27:13+08:00',
u'write_ts': 6020833},
u'user_id': UUID('99b29da2-989d-4e54-a211-284bde1d362d')},
{u'_id': ObjectId('56c495afeaedff78c762a710'),
u'data': {u'fmt_time': u'1970-03-11T14:25:57+08:00', u'ts': 5984757},
u'metadata': {u'write_fmt_time': u'1970-03-11T14:25:57+08:00',
u'write_ts': 5984757},
u'user_id': UUID('99b29da2-989d-4e54-a211-284bde1d362d')},
{u'_id': ObjectId('575a6423383999ecb7a5e183'),
u'data': {u'fmt_time': u'1969-12-31T16:00:00-08:00', u'ts': 0},
u'metadata': {u'write_fmt_time': u'2016-06-09T23:43:11.391000-07:00',
u'write_ts': 1465540991.391},
u'user_id': UUID('693a42f1-6f00-497e-8b40-a8339fd5af8d')},
{u'_id': ObjectId('5760c4dc383999ecb7a98155'),
u'data': {u'fmt_time': u'1969-12-31T16:00:00-08:00', u'ts': 0},
u'metadata': {u'write_fmt_time': u'2016-06-14T16:23:03.521000-07:00',
u'write_ts': 1465946583.521},
At some point, I should go through and throw out all this data. But it is a small amount of data and can wait.
And to complete the exploration, all the broken stats are from the same user.
In [1089]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["stats/client_time", "stats/client_nav_event", "stats/client_error", "background/battery"]}, "data.ts": {"$lt": ts_2000.timestamp}}).distinct("user_id")
Out[1089]: [UUID('077bbe67-7693-3a09-9b3c-c8ba935e46ee')]
Next, remove all test phone data, including pipeline states. Do NOT remove USP open data because there is data outside the open range too, at least for one user (me!)
In [1053]: edb.get_uuid_db().find().count()
Out[1053]: 516
In [1054]: edb.get_timeseries_db().find().count()
Out[1054]: 54818879
In [1055]: edb.get_analysis_timeseries_db().find().count()
Out[1055]: 16573823
In [1056]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[1056]: 487
$ ./e-mission-py.bash bin/debug/purge_multi_timeline_for_range.py --pipeline-purge /tmp/public_data/dump_
INFO:root:Loading file or prefix /tmp/public_data/dump_
INFO:root:Found 12 matching files for prefix /tmp/public_data/dump_
INFO:root:files are ['/tmp/public_data/dump_fd7b4c2e-2c8b-3bfa-94f0-d1e3ecbd5fb7.gz', '/tmp/public_data/dump_6561431f-d4c1-4e0f-9489-ab1190341fb7.gz', '/tmp/public_data/dump_079e0f1a-c440-3d7c-b0e7-de160f748e35.gz', '/tmp/public_data/dump_92cf5840-af59-400c-ab72-bab3dcdf7818.gz', '/tmp/public_data/dump_3bc0f91f-7660-34a2-b005-5c399598a369.gz'] ... ['/tmp/public_data/dump_95e70727-a04e-3e33-b7fe-34ab19194f8b.gz', '/tmp/public_data/dump_c528bcd2-a88b-3e82-be62-ef4f2396967a.gz', '/tmp/public_data/dump_70968068-dba5-406c-8e26-09b548da0e4b.gz', '/tmp/public_data/dump_93e8a1cc-321f-4fa9-8c3c-46928668e45d.gz']
INFO:root:==================================================
INFO:root:Deleting data from file /tmp/public_data/dump_fd7b4c2e-2c8b-3bfa-94f0-d1e3ecbd5fb7.gz
...
INFO:root:For uuid = e471711e-bd14-3dbe-80b6-9c7d92ecc296, deleting entries from the timeseries
INFO:root:result = {u'ok': 1, u'n': 3923919}
INFO:root:For uuid = e471711e-bd14-3dbe-80b6-9c7d92ecc296, deleting entries from the analysis_timeseries
INFO:root:result = {u'ok': 1, u'n': 39361}
INFO:root:For uuid e471711e-bd14-3dbe-80b6-9c7d92ecc296, deleting entries from the user_db
INFO:root:result = {u'ok': 1, u'n': 1}
INFO:root:For uuid e471711e-bd14-3dbe-80b6-9c7d92ecc296, deleting entries from the pipeline_state_db
INFO:root:result = {u'ok': 1, u'n': 12}
In [1057]: edb.get_uuid_db().find().count()
Out[1057]: 504
In [1058]: edb.get_timeseries_db().find().count()
Out[1058]: 39797029
In [1059]: edb.get_analysis_timeseries_db().find().count()
Out[1059]: 16190498
In [1060]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[1060]: 475
Next, remove all config documents from the usercache
In [1061]: edb.get_usercache_db().find({"metadata.type": "document", "metadata.key": {"$in": ['config/consent', 'config/sensor_config', 'config/sync_config']}}).count()
Out[1061]: 239
In [1062]: edb.get_usercache_db().remove({"metadata.type": "document", "metadata.key": {"$in": ['config/consent', 'config/sensor_config', 'config/sync_config']}})
Out[1062]: {u'n': 239, u'ok': 1}
Next, re-remove unused collections. This time, since we have migrated all stats, we can remove those databases as well.
In [1090]: edb.get_alternatives_db().remove()
Out[1090]: {u'n': 101146, u'ok': 1}
In [1091]: edb.get_client_db().remove()
Out[1091]: {u'n': 3, u'ok': 1}
In [1092]: edb.get_common_place_db().remove()
Out[1092]: {u'n': 0, u'ok': 1}
In [1093]: edb.get_common_trip_db().remove()
Out[1093]: {u'n': 0, u'ok': 1}
In [1094]: edb.get_pending_signup_db().remove()
Out[1094]: {u'n': 25, u'ok': 1}
In [1095]: edb._get_current_db().Stage_place.remove()
Out[1095]: {u'n': 0, u'ok': 1}
In [1096]: edb.get_routeCluster_db().remove()
Out[1096]: {u'n': 90, u'ok': 1}
In [1097]: edb._get_current_db().Stage_routeDistanceMatrix.remove()
Out[1097]: {u'n': 7, u'ok': 1}
In [1098]: edb._get_current_db().Stage_section_new.remove()
Out[1098]: {u'n': 0, u'ok': 1}
In [1099]: edb._get_current_db().Stage_stop.remove()
Out[1099]: {u'n': 0, u'ok': 1}
In [1100]: edb._get_current_db().Stage_trip_new.remove()
Out[1100]: {u'n': 0, u'ok': 1}
In [1101]: edb._get_current_db().Stage_user_moves_access.remove()
Out[1101]: {u'n': 118, u'ok': 1}
In [1102]: edb._get_current_db().Stage_utility_models.remove()
Out[1102]: {u'n': 36, u'ok': 1}
In [1103]: edb._get_current_db().Stage_Worktime.remove()
Out[1103]: {u'n': 2662, u'ok': 1}
In [1104]: edb.get_client_stats_db_backup().remove()
Out[1104]: {u'n': 650961, u'ok': 1}
In [1105]: edb.get_server_stats_db_backup().remove()
Out[1105]: {u'n': 449523, u'ok': 1}
Ok! I think we are done! There's plenty of room on the transfer disk, so let's just create a new dump and keep the old dump as backup.
/dev/xvdg 296G 53G 228G 19% /transfer
$ mongodump --out /transfer/cleanedup-jan-3
2018-01-03T18:44:29.016+0000 Test_database.Test_Set to /transfer/cleanedup-jan-3/Test_database/Test_Set.bson
2018-01-03T18:44:29.018+0000 1 documents
2018-01-03T18:44:29.019+0000 Metadata for Test_database.Test_Set to /transfer/cleanedup-jan-3/Test_database/Test_Set.metadata.json
2018-01-03T18:44:29.019+0000 DATABASE: admin to /transfer/cleanedup-jan-3/admin
Dump is done!
Remaining steps:
- Unmount from current stack:
  $ sudo umount /transfer
- Snapshot
- Create new volume in the correct region
- Attach new volume to the database from the new stack
- Mount the new volume:
  $ sudo mkdir -p /transfer
  $ sudo chown ec2-user:ec2-user /transfer/
  $ sudo mount /dev/xvdi /transfer/
  $ ls /transfer
  cleanedup-jan-3 lost+found odc-usp-2017 original-jan-1 public_phone_stats
- Restore
- Validate
On old server
In [521]: edb.get_uuid_db().find().count()
Out[521]: 504
In [522]: edb.get_timeseries_db().find().count()
Out[522]: 39797029
In [523]: edb.get_analysis_timeseries_db().find().count()
Out[523]: 16190498
In [524]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[524]: 475
In [525]: edb.get_usercache_db().find().count()
Out[525]: 10011104
On new server
In [2]: edb.get_uuid_db().find().count()
Out[2]: 504
In [3]: edb.get_timeseries_db().find().count()
Out[3]: 39797135
In [4]: edb.get_analysis_timeseries_db().find().count()
Out[4]: 16190498
In [5]: len(edb.get_pipeline_state_db().find().distinct('user_id'))
Out[5]: 475
In [6]: edb.get_usercache_db().find().count()
Out[6]: 10011104
Check our favourite users
Me
In [7]: edb.get_uuid_db().find_one({"user_email": "shankari@berkeley.edu"})
Out[7]:
{'_id': ObjectId('54a6bdfd39e59673fd9fba5b'),
'update_ts': datetime.datetime(2017, 8, 20, 2, 29, 50, 275000),
'user_email': '<shankari_email>',
'uuid': UUID('<shankari_uuid>')}
In [10]: edb.get_uuid_db().find({"user_id": UUID('<shankari_uuid>')}).count()
Out[10]: 0
In [11]: edb.get_timeseries_db().find({"user_id": UUID('<shankari_uuid>')}).count()
Out[11]: 735156
In [12]: edb.get_analysis_timeseries_db().find({"user_id": UUID('<shankari_uuid>')}).count()
Out[12]: 166571
In [23]: list(edb.get_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", 1).limit(1))
Out[23]:
[{'_id': ObjectId('5614ee7d88f663584fa03131'),
'data': {'_id': ObjectId('5614ee7d88f663584fa03131'),
'exit_fmt_time': '2015-08-21T18:06:16.905000-07:00',
'exit_ts': 1440205576.905,
'location': {'coordinates': [-122.4426899, 37.7280596], 'type': 'Point'},
'starting_trip': ObjectId('5614ee7d88f663584fa03132'),
'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954')},
'metadata': {'key': 'segmentation/raw_place',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-04-25T06:32:01.332099-07:00',
'write_ts': 1461591121.332099},
'user_id': UUID('<shankari_uuid>')}]
In [24]: list(edb.get_analysis_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", 1).limit(1))
Out[24]:
[{'_id': ObjectId('57e962fa88f66347503059e7'),
'data': {'exit_fmt_time': '2015-07-13T15:25:56.852000-07:00',
'exit_ts': 1436826356.852,
'location': {'coordinates': [-122.0879696, 37.3885529], 'type': 'Point'},
'source': 'DwellSegmentationTimeFilter',
'starting_trip': ObjectId('57e962fa88f66347503059e8')},
'metadata': {'key': 'segmentation/raw_place',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-09-26T11:03:38.105793-07:00',
'write_ts': 1474913018.105793},
'user_id': UUID('<shankari_uuid>')}]
In [25]: list(edb.get_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", -1).limit(1))
Out[25]:
[{'_id': ObjectId('581becd188f6630386d15ac5'),
'data': {'battery_level_pct': 98.0, 'ts': 1478225037729.0},
'metadata': {'key': 'background/battery',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1478225037729.0},
'user_id': UUID('<shankari_uuid>')}]
In [26]: list(edb.get_analysis_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", -1).limit(1))
Out[26]:
[{'_id': ObjectId('59eed4b188f66334694bfcb2'),
'data': {'altitude': 1.0,
'distance': 2.8965257248272045,
'fmt_time': '2017-10-23T17:05:41-07:00',
'heading': 31.494464698609224,
'idx': 27,
'latitude': 37.3909994,
'loc': {'coordinates': [-122.0864596, 37.3909994], 'type': 'Point'},
'longitude': -122.0864596,
'mode': 0,
'section': ObjectId('59eed4b188f66334694bfc96'),
'speed': 0.12344985580418566,
'ts': 1508803541.0},
'metadata': {'key': 'analysis/recreated_location',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2017-10-23T22:50:41.916285-07:00',
'write_ts': 1508824241.916285},
'user_id': UUID('<shankari_uuid>')}]
Tom
```
In [16]: tom_entry = edb.get_uuid_db().find_one({"user_email": "tamtom2000@gmail.com"})
In [17]: tom_entry
Out[17]:
{'_id': ObjectId('543c8a2239e59673fd9fb9dc'),
'update_ts': datetime.datetime(2017, 5, 6, 21, 29, 3, 780000),
'user_email': '<tom_email>',
'uuid': UUID('<tom_uuid>')}
In [18]: edb.get_timeseries_db().find({"user_id": tom_entry["uuid"]}).count()
Out[18]: 592479
In [19]: edb.get_analysis_timeseries_db().find({"user_id": tom_entry["uuid"]}).count()
Out[19]: 139926
In [27]: list(edb.get_timeseries_db().find({"user_id": tom_entry["uuid"]}).sort("data.ts", 1).limit(1))
Out[27]:
[{'_id': ObjectId('564ed10488f66311474836bd'),
'data': {'_id': ObjectId('564ed10488f66311474836bd'),
'exit_fmt_time': '2015-07-21T00:56:30.414000-07:00',
'exit_ts': 1437465390.414,
'location': {'coordinates': [-122.0862835, 37.3909556], 'type': 'Point'},
'starting_trip': ObjectId('564ed10488f66311474836be'),
'user_id': UUID('b0d937d0-70ef-305e-9563-440369012b39')},
'metadata': {'key': 'segmentation/raw_place',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-04-25T06:32:01.391767-07:00',
'write_ts': 1461591121.391767},
'user_id': UUID('<tom_uuid>')}]
In [41]: list(edb.get_analysis_timeseries_db().find({"user_id": tom_entry["uuid"]}).sort("data.ts", 1).limit(1))
Out[41]:
[{'_id': ObjectId('57e9906b88f66347503240e7'),
'data': {'exit_fmt_time': '2015-07-21T00:56:30.414000-07:00',
'exit_ts': 1437465390.414,
'location': {'coordinates': [-122.0862835, 37.3909556], 'type': 'Point'},
'source': 'DwellSegmentationTimeFilter',
'starting_trip': ObjectId('57e9906b88f66347503240e8')},
'metadata': {'key': 'segmentation/raw_place',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-09-26T14:17:31.475597-07:00',
'write_ts': 1474924651.475597},
'user_id': UUID('<tom_uuid>')}]
In [29]: list(edb.get_timeseries_db().find({"user_id": tom_entry["uuid"]}).sort("data.ts", -1).limit(1))
Out[29]:
[{'_id': ObjectId('581b4c5588f6630386d0ea56'),
'data': {'battery_level_pct': 85.0, 'ts': 1478184016437.0},
'metadata': {'key': 'background/battery',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_ts': 1478184016437.0},
'user_id': UUID('<tom_uuid>')}]
In [30]: list(edb.get_analysis_timeseries_db().find({"user_id": tom_entry["uuid"]}).sort("data.ts", -1).limit(1))
Out[30]:
[{'_id': ObjectId('59a23ae188f6632233d07f2f'),
'data': {'altitude': 0.0,
'distance': 1.0747850121465141,
'fmt_time': '2017-08-25T16:21:37.192000-07:00',
'heading': 134.91688221066724,
'idx': 81,
'latitude': 37.3910415,
'loc': {'coordinates': [-122.0864408, 37.3910415], 'type': 'Point'},
'longitude': -122.0864408,
'mode': 4,
'section': ObjectId('59a23ae188f6632233d07edd'),
'speed': 0.10598412720427686,
'ts': 1503703297.192},
'metadata': {'key': 'analysis/recreated_location',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2017-08-26T20:22:09.988126-07:00',
'write_ts': 1503804129.988126},
'user_id': UUID('<tom_uuid>')}]
```
I can see a couple of things that we should clean up: one is easy, and the other should be done later.
First, we need to adjust the timestamps on the stats objects again. We fixed all the client entries but not the battery entries.
In [33]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["background/battery"]},"data.ts": {"$gt": now.timestamp}}).count()
Out[33]: 625387
# same ms -> s fix as before, this time for the battery entries
for e in edb.get_timeseries_db().find({"metadata.key": {"$in": ["background/battery"]}, "data.ts": {"$gt": now.timestamp}}):
    edb.get_timeseries_db().update({"_id": e["_id"]},
                                   {"$set": {"data.ts": float(e["data"]["ts"])/1000,
                                             "metadata.write_ts": float(e["metadata"]["write_ts"])/1000}})
In [35]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["background/battery"]},"data.ts": {"$gt": now.timestamp}}).count()
Out[35]: 0
Now the most recent entries in the timeseries should be fixed.
In [36]: list(edb.get_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", -1).limit(1))
Out[36]:
[{'_id': ObjectId('5a4aaf3088f6636e03141d7a'),
'data': {'name': 'POST_/usercache/get',
'reading': 0.45238590240478516,
'ts': 1514843952.94657},
'metadata': {'key': 'stats/server_api_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2018-01-01T13:59:12.947125-08:00',
'write_ts': 1514843952.947125},
'user_id': UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0')}]
In [37]: list(edb.get_timeseries_db().find({"user_id": tom_entry["uuid"]}).sort("data.ts", -1).limit(1))
Out[37]:
[{'_id': ObjectId('5a499a4688f6636e031416a9'),
'data': {'name': 'POST_/datastreams/find_entries/timestamp',
'reading': 0.06373810768127441,
'ts': 1514773062.554465},
'metadata': {'key': 'stats/server_api_time',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2017-12-31T18:17:42.554857-08:00',
'write_ts': 1514773062.554857},
'user_id': UUID('<tom_uuid>')}]
Second, we need to move all the `segmentation/raw_*` entries from the timeseries to the analysis_timeseries. They are not doing anything bad there - we assume trips and sections are only in the analysis database, and read them only from there:
    "segmentation/raw_trip": self.analysis_timeseries_db,
However, it seems like a bad idea to have weird data sitting around. Is it actually duplicated in the analysis database? If so, can we just delete them from the timeseries_db?
Are they duplicated? For Tom, yes; for me, pretty close.
For me: the times are different, although within ~1 month of each other, and the locations are different.
[{'_id': ObjectId('5614ee7d88f663584fa03131'),
'data': {'_id': ObjectId('5614ee7d88f663584fa03131'),
'exit_fmt_time': '2015-08-21T18:06:16.905000-07:00',
'location': {'coordinates': [-122.4426899, 37.7280596], 'type': 'Point'},
[{'_id': ObjectId('57e962fa88f66347503059e7'),
'data': {'exit_fmt_time': '2015-07-13T15:25:56.852000-07:00',
'location': {'coordinates': [-122.0879696, 37.3885529], 'type': 'Point'},
For Tom: only the `_id`s are different. Everything else is identical.
[{'_id': ObjectId('564ed10488f66311474836bd'),
'data': {'_id': ObjectId('564ed10488f66311474836bd'),
'exit_fmt_time': '2015-07-21T00:56:30.414000-07:00',
'location': {'coordinates': [-122.0862835, 37.3909556], 'type': 'Point'},
[{'_id': ObjectId('57e9906b88f66347503240e7'),
'data': {'exit_fmt_time': '2015-07-21T00:56:30.414000-07:00',
'location': {'coordinates': [-122.0862835, 37.3909556], 'type': 'Point'},
How many of them are there?
In [50]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).count()
Out[50]: 29023
Is there really overlap?
First timeseries entry
In [66]: list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).sort([("data.enter_ts", 1), ("data.exit_ts", 1), ("data.start_ts", 1), ("data.end_ts", 1)]).limit(1))
Out[66]:
[{'_id': ObjectId('5674839788f66340b2fb12b9'),
'data': {'_id': ObjectId('5674839788f66340b2fb12b9'),
'duration': 269.63700008392334,
'end_fmt_time': '2015-07-13T15:30:26.489000-07:00',
'end_loc': {'coordinates': [-122.0824345, 37.3790636], 'type': 'Point'},
'end_stop': ObjectId('5674839788f66340b2fb12bb'),
'end_ts': 1436826626.489,
'sensed_mode': 0,
'source': 'SmoothedHighConfidenceMotion',
'start_fmt_time': '2015-07-13T15:25:56.852000-07:00',
'start_loc': {'coordinates': [-122.0879696, 37.3885529], 'type': 'Point'},
'start_ts': 1436826356.852,
'trip_id': ObjectId('5674838188f66340b2fb0c9c'),
'user_id': UUID('0763de67-f61e-3f5d-90e7-518e69793954')},
'metadata': {'key': 'segmentation/raw_section',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-04-25T06:34:22.352027-07:00',
'write_ts': 1461591262.352027},
'user_id': UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0')}]
Last timeseries entry
In [65]: list(edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).sort([("data.enter_ts", -1), ("data.exit_ts", -1), ("data.start_ts", -1), ("data.end_ts", -1)]).limit(1))
Out[65]:
[{'_id': ObjectId('571dfacb88f66333657dafb5'),
'data': {'_id': ObjectId('571dfacb88f66333657dafb5'),
'ending_trip': ObjectId('571dfacb88f66333657dafb4'),
'enter_fmt_time': '2016-04-25T01:54:55.062190-07:00',
'enter_ts': 1461574495.06219,
'location': {'coordinates': [-122.2528321669644, 37.86827700681786],
'type': 'Point'},
'source': 'DwellSegmentationDistFilter',
'user_id': UUID('788f46af-9e6d-300b-93e1-981ba9b3390b')},
'metadata': {'key': 'segmentation/raw_place',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-04-25T06:32:12.557781-07:00',
'write_ts': 1461591132.557781},
'user_id': UUID('43f9361e-1cb6-4026-99ba-458be357d245')}]
First analysis entry
In [63]: list(edb.get_analysis_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).sort([("data.enter_ts", 1), ("data.exit_ts", 1), ("data.start_ts", 1), ("data.end_ts", 1)]).limit(1))
Out[63]:
[{'_id': ObjectId('57e9648a88f6634750306dc5'),
'data': {'duration': 269.63700008392334,
'end_fmt_time': '2015-07-13T15:30:26.489000-07:00',
'end_loc': {'coordinates': [-122.0824345, 37.3790636], 'type': 'Point'},
'end_stop': ObjectId('57e9648a88f6634750306dc7'),
'end_ts': 1436826626.489,
'sensed_mode': 0,
'source': 'SmoothedHighConfidenceMotion',
'start_fmt_time': '2015-07-13T15:25:56.852000-07:00',
'start_loc': {'coordinates': [-122.0879696, 37.3885529], 'type': 'Point'},
'start_ts': 1436826356.852,
'trip_id': ObjectId('57e962fa88f66347503059e8')},
'metadata': {'key': 'segmentation/raw_section',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2016-09-26T11:10:18.444634-07:00',
'write_ts': 1474913418.444634},
'user_id': UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0')}]
Last analysis entry
In [64]: list(edb.get_analysis_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).sort([("data.enter_ts", -1), ("data.exit_ts", -1), ("data.start_ts", -1), ("data.end_ts", -1)]).limit(1))
Out[64]:
[{'_id': ObjectId('59db353088f6636450bb268e'),
'data': {'duration': -1490206.003000021,
'ending_trip': ObjectId('59db353088f6636450bb268d'),
'enter_fmt_time': '2017-10-25T22:36:49.223000-07:00',
'enter_ts': 1508996209.223,
'exit_fmt_time': '2017-10-08T16:40:03.220000-07:00',
'exit_ts': 1507506003.22,
'location': {'coordinates': [-122.2579113, 37.873973], 'type': 'Point'},
'source': 'DwellSegmentationTimeFilter',
'starting_trip': ObjectId('59db353088f6636450bb268f')},
'metadata': {'key': 'segmentation/raw_place',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2017-10-09T01:37:04.036013-07:00',
'write_ts': 1507538224.036013},
'user_id': UUID('06f82876-4090-482f-a7be-91345df47bb2')}]
So it looks like we ran the pipeline in April, back when we were still storing entries to the timeseries. Then we split and re-ran the pipeline but did not delete the old entries. So the timeseries entries are from 2015-07-13T15:25:56.852000-07:00 to 2016-04-25T01:54:55.062190-07:00, while the analysis timeseries entries are from 2015-07-13T15:25:56.852000-07:00 to 2017-10-08T16:40:03.220000-07:00.
So there is a clear overlap and we can delete the entries from the timeseries.
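If we wanted something stricter than eyeballing the date ranges, a per-entry check could look like this (a sketch with a hypothetical helper; it assumes an entry counts as duplicated when the same user/key pair has an analysis entry with the same enter/start timestamp, a test that my own April-era entries would fail):
```
import emission.core.get_database as edb

RAW_KEYS = ["segmentation/raw_place", "segmentation/raw_trip",
            "segmentation/raw_section", "segmentation/raw_stop",
            "segmentation/raw_untracked"]

def has_analysis_twin(entry):
    # places carry enter_ts, trips/sections/stops carry start_ts
    data = entry["data"]
    field, value = (("data.enter_ts", data["enter_ts"])
                    if "enter_ts" in data else
                    ("data.start_ts", data.get("start_ts")))
    return edb.get_analysis_timeseries_db().find_one(
        {"user_id": entry["user_id"],
         "metadata.key": entry["metadata"]["key"],
         field: value}) is not None

unmatched = [e["_id"] for e in
             edb.get_timeseries_db().find({"metadata.key": {"$in": RAW_KEYS}})
             if not has_analysis_twin(e)]
print("%d entries with no exact-timestamp twin" % len(unmatched))
```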
Delete now or delete later?
Let's just delete now, while we still have backups sitting around.
In [67]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).count()
Out[67]: 29023
In [68]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["analysis/cleaned_place", "analysis/cleaned_trip", "analysis/cleaned_section", "analysis/cleaned_stop", "analysis/cleaned_untracked"]}}).count()
Out[68]: 0
In [69]: edb.get_analysis_timeseries_db().find({"metadata.key": {"$in": ["analysis/cleaned_place", "analysis/cleaned_trip", "analysis/cleaned_section", "analysis/cleaned_stop", "analysis/cleaned_untracked"]}}).count()
Out[69]: 505455
It looks like the first run predated cleaned trips, so we only have to delete the segmentation/raw_* entries.
In [76]: edb.get_timeseries_db().delete_many({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).raw_result
Out[76]: {'n': 29023, 'ok': 1.0}
In [77]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_place", "segmentation/raw_place", "segmentation/raw_section", "segmentation/raw_stop", "segmentation/raw_untracked"]}}).count()
Out[77]: 0
Ok, so now the oldest entries in the timeseries should be different.
In [78]: list(edb.get_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", 1).limit(1))
Out[78]:
[{'_id': ObjectId('564f7d6388f66343e476e832'),
'data': {'deleted_points': [],
'filtering_algo': 'SmoothZigzag',
'outlier_algo': 'BoxplotOutlier',
'section': ObjectId('564f7d3888f66343e476e518')},
'metadata': {'key': 'analysis/smoothing',
'platform': 'server',
'time_zone': 'America/Los_Angeles',
'write_fmt_time': '2015-11-20T12:06:59.574516-08:00',
'write_local_dt': datetime.datetime(2015, 11, 20, 20, 6, 59, 574000),
'write_ts': 1448050019.574516},
'user_id': UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0')}]
Oops. It is now another generated result. Let's query and delete these as well.
In [79]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["analysis/smoothing"]}}).count()
Out[79]: 9033
In [82]: edb.get_timeseries_db().delete_many({"metadata.key": {"$in": ["analysis/smoothing"]}}).raw_result
Out[82]: {'n': 9033, 'ok': 1.0}
In [83]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["analysis/smoothing"]}}).count()
Out[83]: 0
Ok, so now the oldest entries in the timeseries should be different. Argh, we missed `segmentation/raw_trip` in the first query.
In [85]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_trip"]}}).count()
Out[85]: 7815
In [86]: edb.get_timeseries_db().delete_many({"metadata.key": {"$in": ["segmentation/raw_trip"]}}).raw_result
Out[86]: {'n': 7815, 'ok': 1.0}
In [87]: edb.get_timeseries_db().find({"metadata.key": {"$in": ["segmentation/raw_trip"]}}).count()
Out[87]: 0
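In hindsight, counting and deleting all the stale analysis-era keys in a single pass would have avoided these piecemeal misses. A sketch, with the key list assembled from the queries above:
```
import emission.core.get_database as edb

STALE_KEYS = ["segmentation/raw_trip", "segmentation/raw_place",
              "segmentation/raw_section", "segmentation/raw_stop",
              "segmentation/raw_untracked", "analysis/smoothing"]

def purge_keys_from_timeseries(keys):
    query = {"metadata.key": {"$in": keys}}
    print("deleting %d entries" % edb.get_timeseries_db().find(query).count())
    print(edb.get_timeseries_db().delete_many(query).raw_result)
    # double-check that nothing with these keys is left behind
    assert edb.get_timeseries_db().find(query).count() == 0
```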
Ok, so do we now have the correct oldest entries in the timeseries? Yes, finally. Although the oldest entry doesn't really have a `ts`.
In [88]: list(edb.get_timeseries_db().find({"user_id": shankari_entry["uuid"]}).sort("data.ts", 1).limit(1))
Out[88]:
[{'_id': ObjectId('579fb85988f66357dde496ae'),
'data': {'approval_date': '2016-07-14',
'category': 'emSensorDataCollectionProtocol',
'protocol_id': '2014-04-6267'},
'metadata': {'key': 'config/consent',
'platform': 'android',
'read_ts': 0,
'time_zone': 'Pacific/Honolulu',
'type': 'rw-document',
'write_fmt_time': '2016-08-01T06:02:09.845000-10:00',
'write_ts': 1470067329.845},
'user_id': UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0')}]
In [89]: list(edb.get_timeseries_db().find({"user_id": shankari_entry["uuid"], "data.ts": {"$exists": True}}).sort("data.ts", 1).limit(1))
Out[89]:
[{'_id': ObjectId('59822c3fcb17471ac0667b86'),
'data': {'accuracy': 1086.116,
'altitude': 0,
'elapsedRealtimeNanos': 7479231065135,
'filter': 'time',
'fmt_time': '1969-12-31T16:00:00-08:00',
'heading': 0,
'latitude': -23.56214940547943,
'loc': {'coordinates': [-46.72179579734802, -23.56214940547943],
'type': 'Point'},
'longitude': -46.72179579734802,
'sensed_speed': 0,
'ts': 0},
'metadata': {'key': 'background/location',
'platform': 'android',
'read_ts': 0,
'time_zone': 'America/Los_Angeles',
'type': 'sensor-data',
'write_fmt_time': '2017-08-02T12:01:48.909000-07:00',
'write_ts': 1501700508.909},
'user_id': UUID('ea59084e-11d4-4076-9252-3b9a29ce35e0')}]
Ok, so now that we believe the database is fine, we can run the pipeline again for the first time in forever.
$ sudo umount /transfer
Note that we now have entries in the timeseries that are client stats only and have no uuid entry or any real data. These are zombie entries from before the uuid change. Are we ignoring these correctly?
A quick check shows that we just read from the UUID database.
def get_all_uuids():
    # reads only from the UUID db, so the zombie timeseries entries are ignored
    all_uuids = [e["uuid"] for e in edb.get_uuid_db().find()]
    return all_uuids
There are other methods that still use `distinct`, notably `aggregate_timeseries.get_distinct_users` and `builtin_timeseries.get_uuid_list`, but they don't seem to be used anywhere in the code. Let's remove them as part of cleanup so people are not tempted to use them. Alternatively, we can move zombie entries into a separate timeseries where they won't pollute anything.
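A sketch of how we could quantify those zombie entries if we ever decide to move them out (`distinct` is slow, but fine for a one-off check like this):
```
import emission.core.get_database as edb

# user_ids that appear in the timeseries but have no UUID db entry
registered = set(e["uuid"] for e in edb.get_uuid_db().find())
timeseries_users = set(edb.get_timeseries_db().distinct("user_id"))
print("%d zombie user_ids" % len(timeseries_users - registered))
```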
Ok, so let's run this script!
$ ./e-mission-ipy.bash bin/intake_multiprocess.py 4 > /log/intake.stdinout.log 2>&1
$ date
Thu Jan 4 08:17:24 UTC 2018
It's taking a long time to even just get started. I wonder if we are using `distinct` somewhere...
Looking at the launcher logs, it is still iterating through the users and querying for the number of entries in the usercache. I don't even think we use that functionality and can probably get rid of it in the next release.
Ah, so now the processes have been launched. Hopefully this first run will be done by tomorrow morning.
We should also remove the `filter_accuracy` step from the real server since it is only applicable for test phones/open data.
2018-01-04T12:02:11.764406+00:00**********UUID <shankari_uuid>: moving to long term**********
2018-01-04T12:04:29.687814+00:00**********UUID <shankari_uuid>: filter accuracy if needed**********
2018-01-04T12:26:01.563703+00:00**********UUID <shankari_uuid>: segmenting into trips**********
2018-01-04T12:33:06.101569+00:00**********UUID <shankari_uuid>: segmenting into sections**********
2018-01-04T12:33:33.701075+00:00**********UUID <shankari_uuid>: smoothing sections**********
2018-01-04T12:33:54.020444+00:00**********UUID <shankari_uuid>: cleaning and resampling timeline**********
2018-01-04T12:40:12.439267+00:00**********UUID <shankari_uuid>: checking active mode trips to autocheck habits**********
I'm also seeing some errors with saving data, need to make a pass through the errors.
Got error None while saving entry AttrDict({'_id': ObjectId('59f11f15cb17471ac0cfc059'), 'metadata': {'write_ts': 1508533079.587702, 'plugin': 'none', 'time_zone': 'America/Montreal', 'platform': 'ios', 'key': 'statemachine/transition', 'read_ts': 0, 'type': 'message'}, 'user_id': UUID('e95bfd0b-1cfc-4ea3-af01-c41dc2fad0ed'), 'data': {'transition': None, 'ts': 1508533079.587526, 'currState': 'STATE_ONGOING_TRIP'}}) -> None
AMPLab is using too many resources, so I have to trim my consumption. Let's finally finish cleaning up the old servers.
Copied `dump_team_trajectories.py` off the old server.
There shouldn't be anything else.
One caveat is that the original data is ~150 GB:
143G /mnt/e-mission-primary-db/mongodb
But the new dataset is only 20 GB:
/dev/mapper/xvdf 3.0T 20G 3.0T 1% /data
/dev/mapper/xvdg 25G 633M 25G 3% /journal
/dev/mapper/xvdh 10G 638M 9.4G 7% /log
So what is missing?
It doesn't appear to be the data. These are the only differences in collections between the old and new databases; there is nothing missing except for `system.indexes`, which should be an auto-generated collection.
Yup! `system.indexes` is now deprecated:
https://docs.mongodb.com/manual/reference/system-collections/#%3Cdatabase%3E.system.indexes
"Deprecated since version 3.0: Access this data using listIndexes."
And `listIndexes` does have the data.
> db.Stage_timeseries.getIndexes()
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "Stage_database.Stage_timeseries"
},
{
"v" : 2,
"key" : {
"user_id" : "hashed"
},
"name" : "user_id_hashed",
"ns" : "Stage_database.Stage_timeseries"
},
{
"v" : 2,
"key" : {
"metadata.key" : "hashed"
},
"name" : "metadata.key_hashed",
"ns" : "Stage_database.Stage_timeseries"
},
{
"v" : 2,
"key" : {
"metadata.write_ts" : -1
},
"name" : "metadata.write_ts_-1",
"ns" : "Stage_database.Stage_timeseries"
},
{
"v" : 2,
"key" : {
"data.ts" : -1
},
"name" : "data.ts_-1",
"ns" : "Stage_database.Stage_timeseries",
"sparse" : true
},
...
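The same index data is also visible from pymongo, where index_information() is the driver-side counterpart of getIndexes():
```
import emission.core.get_database as edb

# pymongo counterpart of the mongo shell's db.Stage_timeseries.getIndexes()
for name, spec in edb.get_timeseries_db().index_information().items():
    print(name, spec["key"])
```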
Shutting down the old server now. RIP! You were a faithful friend and will be missed.
In the past 4 days, the compute has increased by $50. The storage has increased by $200. We need to turn off some storage. Wah! Wah! What if I lose something important?! I guess you just have to deal with it...
Deleted all related storage.
Even with all the deletions, we spent ~$50/day. This is a problem, because at that rate we would end up spending an additional $1040 for the rest of the month, and we have already spent ~$1500. This also means that we won't be under $1000 for next month.
Since our reserved instances already cost $507 and the m3/m4 legacy servers cost ~$639, we have to keep our storage budget under $500 to stay at my preferred 50% of my available budget.
The storage is mostly going towards the provisioned IOPS storage. I don't think I actually need 3TB.
Current storage is
Filesystem Size Used Avail Use% Mounted on
devtmpfs 30G 84K 30G 1% /dev
tmpfs 30G 0 30G 0% /dev/shm
/dev/xvda1 7.8G 2.3G 5.5G 30% /
/dev/mapper/xvdf 3.0T 21G 3.0T 1% /data
/dev/mapper/xvdg 25G 633M 25G 3% /journal
/dev/mapper/xvdh 10G 638M 9.4G 7% /log
We should be able to drop to:
- /data: 200G

If we still need to reduce after that, we can change to:
- /journal: 10G
- /log: 5G

Let's see how easy it is to resize EBS volumes.
Not too hard, you just have to copy the data around. https://matt.berther.io/2015/02/03/how-to-resize-aws-ec2-ebs-volumes/ Let's turn off everything and start copying the data tomorrow morning
This was a bit trickier than one would expect because xfs does not support resize2fs and, in fact, does not support shrinking the file system at all. So we had to follow the instructions to use the temp instance, as in the link above.
Note that our data is also on an encrypted filesystem, so our steps were:
- include the `/` at the end of the mounted filename
- `chmod` new_data to the same ids as old_data

At this point, the only diff between them is
--- /tmp/old_data_list 2018-07-11 10:07:36.038063450 +0000
+++ /tmp/new_data_list 2018-07-11 10:07:24.266143934 +0000
@@ -140,7 +140,7 @@
-rw-r--r-- 1 498 497 4.0K Jul 11 09:23 index-94-2297001609533616747.wt
-rw-r--r-- 1 498 497 4.0K Jul 11 09:23 index-96-2297001609533616747.wt
-rw-r--r-- 1 498 497 4.0K Jul 11 09:23 index-98-2297001609533616747.wt
-lrwxrwxrwx 1 root root 8 Jan 1 2018 journal -> /journal
+lrwxrwxrwx 1 498 497 8 Jul 11 09:53 journal -> /journal
-rw-r--r-- 1 498 497 48K Jul 11 09:23 _mdb_catalog.wt
-rw-r--r-- 1 498 497 0 Jul 11 09:23 mongod.lock
-rw-r--r-- 1 498 497 36K Jul 11 09:23 sizeStorer.wt
which makes sense. So now it's time to reverse the steps and attach new_data back to the server.
Reversed steps, restarted server. No errors so far. Deleting old disk and migration instance.
Done. Closing this issue for now.
Burn rate is now ~$33/day (1419.11 - 1385.76 = 33.35).
It should go down after we turn off the air quality server, but 33 * 15 ~ $500, so we are on track for $2000 for the month, not $1500 as originally planned.
Burn rate is now roughly $13/day (1497 - 1419 = 78 over 6 days = $13/day). So this month should be $1497 + $156 = $1653. Next month should be 524 (reserved instances) + 13 * 30 = 524 + 390 = 914 (< $1000).
The server scalability had deteriorated to the point where we were unable to run the pipeline even once per day. While part of this is probably just the way we are using mongodb, part of it is also that the server resources are running out.
So I turned off the pipeline around a month ago (last run was on 2017-10-24 21:41:18). Now, I want to re-provision with a better, split architecture, and reserved instances for lower costs.