OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
95 stars 73 forks source link

Consistent time coordinate naming within echopype #656

Closed leewujung closed 2 years ago

leewujung commented 2 years ago

To be compliant with the convention but also recognizing different sonar instruments produce different datagrams that contain the same types of information, we (@b-reyes @imranmaj @lsetiawan @leewujung) spent some time discussing what we should do in each of the instruments echopype supports at the moment. This discussion was spurred by #649.

Below are our conclusions:

Overarching

Time coordinates in Platform group

Time coordinates in Environment group

leewujung commented 2 years ago

@emiliom : assigned you to this for first round of check/feedback and let's see who handles the implementation.

emiliom commented 2 years ago

This looks good

b-reyes commented 2 years ago

While looking at some tests, I noticed that in echopype/tests/convert/test_convert_source_target_locs.py::test_convert_time_encodings we are testing the encodings for the time. After further review I noticed that the dictionary DEFAULT_ENCODINGS in echopype/utils/coding.py does not have a key for time3. Since we are talking about time3 in this issue, should we add "time3": DEFAULT_TIME_ENCODING to this dictionary?

leewujung commented 2 years ago

Since we are talking about time3 in this issue, should we add "time3": DEFAULT_TIME_ENCODING to this dictionary?

Yes please add it.

Also I think for things like this you could take the same approach as what @emiliom mentioned for pushing to others' PR: now you have a sense of how things should be, you could just do it and mention it in your PR comments to call others' attention. 🤓

b-reyes commented 2 years ago

the source of the time base will be noted in the comment attribute of these time coordinate variables.

@leewujung or @emiliom did you have a generic statement that you have in mind for the comment attribute for each of these times? Below is my initial attempt:

leewujung commented 2 years ago

Maybe we can iterate on this at PR review stage? The only comment I have now is to shorten the first (or the only) sentence above to a form like: "Time coordinate corresponding to XYZ."

b-reyes commented 2 years ago

Maybe we can iterate on this at PR review stage? The only comment I have now is to shorten the first (or the only) sentence above to a form like: "Time coordinate corresponding to XYZ."

Ok, sounds good. I will use your suggested form and we can discuss this further at the PR stage.

b-reyes commented 2 years ago

For EK60/EK80 Platform.time1 and Platform/NMEA.time1 seem to correspond to the same thing, but it looks like they can be different lengths. Should we also add a comment for Platform/NMEA.time1 and if so, what should it be?

b-reyes commented 2 years ago

Time coordinates in Environment group

  • we will call the time base time1 across the board, and we don't currently see a use case for time2 yet

After reviewing the Environment group for EK80, I see that we have the time variables ping_time and environment_time. Do we want to leave ping_time alone or do we want to change it to time2? It seems like maybe we incorrectly characterized the times here (please see my comment directly below).

  • other variables, such as temperature, acidity, etc, dimension: (time1)
    • EK80: source environment_time (not yet implemented)

From the code, this does not seem to be correct. These variables have the dimension ping_time.

leewujung commented 2 years ago

@b-reyes : for EK80 under set_env, if you read from the top, you will see that what's called ping_time is actually identical as the later part called environment_time -- always check where things come from in the parser is helpful. I think this is an error from way back and then became a redundancy when other variables were added recently to this group (#616), so please give it a fix. The designation we had above is current.

Note that we do not support the env_only flag anymore. I think removing this and other similar things, such as the one about NMEA (see #580) should be on the TODO list in v0.6.1.

leewujung commented 2 years ago

For EK60/EK80 Platform.time1 and Platform/NMEA.time1 seem to correspond to the same thing, but it looks like they can be different lengths. Should we also add a comment for Platform/NMEA.time1 and if so, what should it be?

I think tracing the code back to the upstream and try a few test files will help resolve this. Please let us know what you find in terms of what operations make them different (if they are) and we can decide on the comment from there!

b-reyes commented 2 years ago

@b-reyes : for EK80 under set_env, if you read from the top, you will see that what's called ping_time is actually identical as the later part called environment_time -- always check where things come from in the parser is helpful. I think this is an error from way back and then became a redundancy when other variables were added recently to this group (#616), so please give it a fix. The designation we had above is current.

Note that we do not support the env_only flag anymore. I think removing this and other similar things, such as the one about NMEA (see #580) should be on the TODO list in v0.6.1.

@leewujung Now that you point to it, I do see that in the parser ping_time and environment_time do use the same values. However, my concern was when we actually set the variable in set_env. In this function, the lines

https://github.com/OSOceanAcoustics/echopype/blob/38c3de4e94ead1f7b2f9c7a0f8c822f80dfa2963/echopype/convert/set_groups_ek80.py#L123-L127

made it seem like there was a scenario where we could have ping_time = some value and environment_time = NaT. Based on your comments, it seems like you are very confident that this will not happen. So, I will happily remove it.

To be absolutely clear, you are suggesting that I remove environment_time as a coordinate and replace it with ping_time wherever it occurs and we will remove the env_only flag later (not in v0.6.0), correct?

leewujung commented 2 years ago

@b-reyes : what I meant was that the ping_time you saw in there is from self.parser_obj.environment["timestamp"], which is what environment_time is set to in the code section you highlighted. Line 125-127 handles the case when this timestamp does not exist (I think this is when there's no environment XML datagram in the file -- @imranmaj could you confirm? I don't immediately see a test file corresponding to this, but maybe I missed something).

As I mentioned, I think the ping_time section in set_env was buggy. I recommend completely ignoring it and implementing what needs to be there: time1 that contains the timestamp(s) from the environment datagram(s) -- there can be one or multiple such datagrams.

And yes, removing env_only does not need to implemented now.

b-reyes commented 2 years ago

For EK60/EK80 Platform.time1 and Platform/NMEA.time1 seem to correspond to the same thing, but it looks like they can be different lengths. Should we also add a comment for Platform/NMEA.time1 and if so, what should it be?

I think tracing the code back to the upstream and try a few test files will help resolve this. Please let us know what you find in terms of what operations make them different (if they are) and we can decide on the comment from there!

@leewujung looking into this, it seems like Platform.time1 and Platform/NMEA.time1 use the same parser object (self.parser_obj.nmea["timestamp"]). However, Platform/NMEA.time1 seems to be a subset of Platform.time1:

https://github.com/OSOceanAcoustics/echopype/blob/38c3de4e94ead1f7b2f9c7a0f8c822f80dfa2963/echopype/convert/set_groups_base.py#L148

https://github.com/OSOceanAcoustics/echopype/blob/38c3de4e94ead1f7b2f9c7a0f8c822f80dfa2963/echopype/convert/set_groups_base.py#L177-L179

It seems like this is a subset because we only want to select times that correspond to certain messages (or NMEA sentences). These allowable sentences are set here

https://github.com/OSOceanAcoustics/echopype/blob/38c3de4e94ead1f7b2f9c7a0f8c822f80dfa2963/echopype/convert/api.py#L254

From the multiple files that I have tested, it seems like Platform/NMEA.time1 is often a subset of Platform.time1. For this reason, I think we need to add a slightly different comment for Platform/NMEA.time1. Perhaps Platform/NMEA.time1.comment = "Time coordinate corresponding to GPS location and allowable NMEA sentences."

Edit:

changed "Time coordinate corresponding to environmental variables and allowable NMEA sentences." to "Time coordinate corresponding to GPS location and allowable NMEA sentences."

leewujung commented 2 years ago

@b-reyes : thanks for the investigative work! I think this messages selection thing is what should be removed (#580 linked above). What do you think if we remove this messages selection here, and in a post-v0.6.0 PR completely remove all related function calls and mechanism? This way we would have identical NMEA and Platform GPS time coordinates, seems cleaner this way (as level 0 product)? and users or our downstream functions can have that selection for level>1 product.

b-reyes commented 2 years ago

@b-reyes : what I meant was that the ping_time you saw in there is from self.parser_obj.environment["timestamp"], which is what environment_time is set to in the code section you highlighted. Line 125-127 handles the case when this timestamp does not exist (I think this is when there's no environment XML datagram in the file -- @imranmaj could you confirm? I don't immediately see a test file corresponding to this, but maybe I missed something).

Thinking out loud ... If no environment XML datagram exists in the file, then we will not have the variables sound_velocity_profile, sound_velocity_source, transducer_name, transducer_sound_speed (for EK80). In the case where an environment XML datagram does exist, then ping_time and environment_time will be the same. Since we do an if statement to check whether or not those variables can be set, it seems like in either case (we don't have an environment XML or we do have an environment XML) it is completely safe to replace environment_time with ping_time. @leewujung is this how you are thinking about it?

As I mentioned, I think the ping_time section in set_env was buggy. I recommend completely ignoring it and implementing what needs to be there: time1 that contains the timestamp(s) from the environment datagram(s) -- there can be one or multiple such datagrams.

Since it is possible that no environment XML datagram exists in the file, how do we want to set time1 in this case? I was thinking the following:

        if "timestamp" in self.parser_obj.environment:
            time1 = self.parser_obj.environment["timestamp"]
        else:
            time1 = np.datetime64("NaT")
b-reyes commented 2 years ago

@b-reyes : thanks for the investigative work! I think this messages selection thing is what should be removed (#580 linked above). What do you think if we remove this messages selection here, and in a post-v0.6.0 PR completely remove all related function calls and mechanism? This way we would have identical NMEA and Platform GPS time coordinates, seems cleaner this way (as level 0 product)? and users or our downstream functions can have that selection for level>1 product.

@leewujung reading up on #580, it seems like we definitely plan to remove this messages section. I think we should create a new issue for this and handle it in a different PR (as I am not sure how many downstream impacts there will be). For this issue, I suggest that we just add Platform/NMEA.time1.comment = "Time coordinate corresponding to GPS location."

leewujung commented 2 years ago

@b-reyes I agree with your suggestions!

emiliom commented 2 years ago

Finally catching up here ... I think I agree with everything I'm seeing, including removing the messages filtering in the NMEA data. That data should be as close to the raw, unfiltered datagram as possible.

leewujung commented 2 years ago

I think we can close this now!