e-mission / e-mission-docs

Repository for docs and issues. If you need help, please file an issue here. Public conversations are better for open source projects than private email.
https://e-mission.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License

Handle location points with missing elapsedRealtimeNanos field #127

Closed shankari closed 5 years ago

shankari commented 9 years ago

Most of the location points reported by Android (10723/16447, about 65%) appear to have the elapsedRealtimeNanos field unset. The incidence is uneven: for the three users in our test set, the percentage of points missing the field ranges from 93% (9372 missing vs. 686 present) to about 24% (1351 missing vs. 4344 present).

Normally, this would not be a problem, because we don't use the field in our analysis now.

However, when we generate the result diary to send to the phone, we read the location points as a dataframe and then reconstruct the location entries from them. If elapsedRealtimeNanos is not present in an entry, the dataframe fills it in with nan. Since nan is not a valid JSON value as per the standard, it causes an error during deserialization on the phone.

https://docs.python.org/2/library/json.html#infinite-and-nan-number-values

The RFC does not permit the representation of infinite or NaN number values. Despite that, by default, this module accepts and outputs Infinity, -Infinity, and NaN as if they were valid JSON number literal values:
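To make the failure mode concrete, here is a minimal sketch using only the standard library. The entry and its field values are made up for illustration; the point is just that the default encoder emits a bare NaN token, while `allow_nan=False` makes the server refuse to produce it.

```python
import json

# A location entry as it might look after a round trip through a dataframe:
# the missing field has been filled in with NaN. Values are illustrative only.
entry = {
    "latitude": 37.39,
    "longitude": -122.08,
    "elapsedRealtimeNanos": float("nan"),
}

# Default behaviour: NaN is written as a bare token, which strict JSON
# parsers (such as the one on the phone) are entitled to reject.
lenient = json.dumps(entry, sort_keys=True)
print(lenient)

# With allow_nan=False the encoder raises instead of emitting invalid JSON,
# so the problem surfaces on the server rather than on the phone.
try:
    json.dumps(entry, allow_nan=False)
    strict_error = None
except ValueError as e:
    strict_error = str(e)
print(strict_error)
```

Turning on `allow_nan=False` does not fix anything by itself, but it would turn a silent client-side failure into a visible server-side one.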

We need to fix this.

shankari commented 9 years ago

There are several possible fixes:

  1. fill in the elapsedRealtimeNanos field with a default value
  2. drop the elapsedRealtimeNanos field from the dataframe in the result diary code
  3. stop using a dataframe to read the entries and read them as full documents instead
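As a rough sketch of what options 1 and 2 amount to, the helpers below operate on plain dicts standing in for rows reconstructed from the dataframe. The helper names (`fill_default`, `drop_field`) and the sample record are hypothetical, not part of the e-mission code.

```python
import math


def is_missing(value):
    """True only for float NaN, which is what the dataframe uses for gaps."""
    return isinstance(value, float) and math.isnan(value)


def fill_default(record, field, default):
    """Option 1: replace a NaN value for `field` with a default."""
    out = dict(record)  # copy so the caller's record is left untouched
    if is_missing(out.get(field)):
        out[field] = default
    return out


def drop_field(record, field):
    """Option 2: remove `field` from the record entirely."""
    return {k: v for k, v in record.items() if k != field}


# Hypothetical record with the field filled in as NaN.
rec = {"ts": 1.0, "elapsedRealtimeNanos": float("nan")}
print(fill_default(rec, "elapsedRealtimeNanos", 0))
print(drop_field(rec, "elapsedRealtimeNanos"))
```

Option 1 still requires choosing a default that cannot be mistaken for a real reading, whereas option 2 also discards the field for the points that do have it.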

shankari commented 9 years ago

Another option is to fill in the missing values using fillna: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

The fillna function can “fill in” NA values with non-null data in a couple of ways, which we illustrate:

But then we would need to figure out what the default value should be. Can we fill it in with None?

shankari commented 9 years ago

Can we fill it in with None?

In [233]: t = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 2, 'b': 3}, {'a': 4}])

In [234]: t.fillna(None)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-234-4f7131f17fa5> in <module>()
----> 1 t.fillna(None)

/Users/shankari/OSS/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in fillna(self, value, method, axis, inplace, limit, downcast)
   2337         if value is None:
   2338             if method is None:
-> 2339                 raise ValueError('must specify a fill method or value')
   2340             if self._is_mixed_type and axis == 1:
   2341                 if inplace:

ValueError: must specify a fill method or value

In [235]: t.fillna(0)
Out[235]: 
   a  b
0  1  2
1  2  3
2  4  0
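
So fillna(None) is rejected, and fillna(0) works but substitutes a value that looks like a real reading. One possible alternative, sketched here under the assumption that the rows are converted back to dicts before serialization, is to map NaN to None at that point so it is encoded as JSON null. The helper name `nan_to_none` is hypothetical, and the rows mirror the toy dataframe above.

```python
import json
import math


def nan_to_none(record):
    """Return a copy of `record` with float NaN values replaced by None."""
    return {
        k: (None if isinstance(v, float) and math.isnan(v) else v)
        for k, v in record.items()
    }


# Rows as they might come out of the toy dataframe: the missing 'b'
# in the last row has become NaN.
rows = [
    {"a": 1, "b": 2.0},
    {"a": 2, "b": 3.0},
    {"a": 4, "b": float("nan")},
]

cleaned = [nan_to_none(r) for r in rows]

# allow_nan=False guards against any NaN that slipped through.
payload = json.dumps(cleaned, allow_nan=False)
print(payload)
```

Whether null is acceptable depends on how the phone deserializes the field, which this sketch does not address.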