CUTR-at-USF / muser-firebase-export

Exports data collected with the MUSER Android app (https://github.com/CUTR-at-USF/MUSER) from Firebase
Apache License 2.0
0 stars 3 forks source link

Timestamp in the exported file is not formatted in UTC as labeled in the csv columns #14

Closed barbeau closed 3 years ago

barbeau commented 3 years ago

Same issue as https://github.com/CUTR-at-USF/onebusaway-firebase-export/issues/31.

Given that much of the export code in this project was borrowed from the above project, the above issue likely exists in this project too. It needs to be tested and fixed.

I mentioned considering timezones in https://github.com/CUTR-at-USF/muser-firebase-export/pull/13#discussion_r573787960 as well.

m-y-u commented 3 years ago

About the timezone issue, just to confirm, we want to export it as UTC right? Although here you mention that sticking to a single timezone would work for now, but as per convention, we should convert it to UTC so that we can use it however we want at separate places, correct?

barbeau commented 3 years ago

@antcoder007 I think we need to come to an agreement with the data analysis team on what the output data should look like for ease in processing. My preference would be to output everything in UTC for consistency - we're already seeing value in that in the original project that this project was patterned after because we're collecting data across timezones, with analysis team in multiple timezones. It gets confusing if we do data conversion to a particular timezone in the export stage.

m-y-u commented 3 years ago

Yes, I prefer to move forward with UTC as well.

barbeau commented 3 years ago

FYI, here's the analysis project if you want to take a look and see what it expects as input currently: https://github.com/CUTR-at-USF/muser-data-analysis

There is also an outstanding PR there that we should get updated/merged: https://github.com/CUTR-at-USF/muser-data-analysis/pull/5

m-y-u commented 3 years ago

From what I understand after looking at this line in the muser-data-analysis repo and similar other instances where time is used is: time() in python is returning them the UNIX time since the epoch which is actually OS dependent as per what the epoch is for a particular OS.

But there is a gmtime() function as well through which we can convert time() in UTC where dst flag is zero (not applied to it). So if possible, the data-analysis team could use that to process time in UTC.

barbeau commented 3 years ago

@antcoder007 Are you sure epoch time is OS dependent in Python? I believe it's the same across all platforms, as with Java (Jan 1, 1970).

Sounds like the analysis tool is expecting the input times to be in epoch time, then?

If so we should definitely export epoch times in this project.

You can see the fix for this issue in the other original project here: https://github.com/CUTR-at-USF/onebusaway-firebase-export/pull/32

Could you apply that fix here, and add the unit tests from that PR as well? You can put it in a new PR for this project.

m-y-u commented 3 years ago

So I did some reading and it seems like as per the documentation it says that it is consistent throughout all UNIX systems (Jan 1, 1970). But it says to check it using time.gmtime(0) for a specific system. It can get a bit confusing sometimes with python but according to this: https://stackoverflow.com/questions/14186179/in-python-is-epoch-time-returned-by-time-always-measured-from-jan-1-1970 I believe yes it should be consistent and we can use it.

Also, I will apply the fix and start adding the unit tests.

barbeau commented 3 years ago

Great, thanks! If there is any question re: UTC we should test by running it on Mac and Windows machines to confirm behavior

welozano commented 3 years ago

Hello @barbeau and @antcoder007. I did take a look at the code in python, since pandas is being used, you can take a look at the Timestamp ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.now.html , this let you assign a tz while creating the start_date variable.) Then if required, https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.timestamp.html may be useful to obtain the timestamp in POSIX format (unix based or starting at 01-01-1970).

barbeau commented 3 years ago

Thanks @welozano!

@antcoder007 If you think changes need to be made on the analysis project based on the above could you please open an issue there? Thanks!

m-y-u commented 3 years ago

@barbeau Are we expecting a specific date format from the user as they use muser-firebase-export? Ideally, if we receive the date in "yyyy-MM-dd", we can parse it in UNIX milliseconds as follows: return new SimpleDateFormat("yyyy-MM-dd").parse(programOptions.getStartDate()).getTime(); But then there would be a need to handle parsing exceptions obviously.

m-y-u commented 3 years ago

@barbeau Are we expecting a specific date format from the user as they use muser-firebase-export? Ideally, if we receive the date in "yyyy-MM-dd", we can parse it in UNIX milliseconds as follows: return new SimpleDateFormat("yyyy-MM-dd").parse(programOptions.getStartDate()).getTime(); But then there would be a need to handle parsing exceptions obviously.

Please ignore this, I just found it in the code that we do expect "yyyy-MM-dd".

barbeau commented 3 years ago

Yes, IIRC we don't support exporting date range of records in this project (which receives dates from the user on the command line) but we do in the OneBusAway project. If we add that feature in this project we'd use the same format.

To be clear, though, this bug/issue is about the columns/data that are exported from Firebase into CSV file.

m-y-u commented 3 years ago

@barbeau So, I started writing the test for our string date to long UNIX timestamp conversion method. I did reference the test in the PR: https://github.com/CUTR-at-USF/onebusaway-firebase-export/pull/32#issue-654064236 but the method we'd test in muser-firebase-export is different from the one in the above PR.

The test in the above PR is a unit test that is testing the input parameter data and return data logic. Whereas our function doesn't use any parameter and rather interacts with programOptions class function getStartDate() to get input data through CLI.

Thus, I was thinking that we'd be doing more of an integration test here by testing the data access through CLI and the logic of our conversion function separately (basically splitting this into two parts). I wanted to know your thoughts on this.

barbeau commented 3 years ago

@antcoder007 let's discuss this more in the meeting tomorrow - I don't think we're talking about the same thing.

This issue is regarding the date and times that are output to the CSV file. All the conversion from epoch time to string time should already exist, we just need to make sure the output timezone is set to UTC time instead of Eastern time. Maybe this is already done in this project and no changes are required? I haven't looked at the code myself. I assumed because this issue existed in the original project it probably existed here too.

barbeau commented 3 years ago

Per today's discussion it looks like this project is exporting raw numeric UTC epoch values (i.e., from currentTimeMillis) so this issue doesn't exist in this project.

We will open a separate issue for adding another column for human-readable event times.