Closed ctuguinay closed 6 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 88.68%. Comparing base (
95cb3e1
) to head (fbd97b7
). Report is 1 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@leewujung Thank you for the comments! I'll respond to your longer comments below:
"Dictionary for transect values" does not make sense to me. Do you mean annotation labels of "log" type? Could you re-word this to be more precise?
I wasn't quite sure what to call them since they denote what part of the transect we are on, and log
is already a designated class name. Maybe I instead call them transect_sequence_type
or something to denote where the EVR region is in the context of the transect map?
is very hard-coded, so I am not sure why the function wouldn't just take transect_strs and transect_next_allowable as inputs directly. Doing that may help users understand what these strings are used for.
Yeah, I can use that convention instead.
could be factored out to be a function to produce warnings (or not) so that the main flow of the transect_mask function is cleaner.
Maybe I separate the transect masking and the checking of the transect sequences?
The 2 cases (start_str followed by break_str or end_str; and resume_str followed by break_str or end_str) are exactly identical to what you have in the transect_next_allowable dictionary, under the keys start_str and resume_str -- and this is obviously should be the case, because that's why these are allowed combinations. How about just using the transect_next_allowable dict to set the within_transect values?
Yup, that sounds cleaner.
Since transect_df["within_transect"] is already a clean column of where in transect_df should be within or outside of a transect segment, and here we only want the "within_transect=True" case, I think you can take out the loop completely.
By the above, do you mean remove this?
for transect_querying_file_name in transect_querying_list:
within_transect_df = transect_df.query(
f'file_name == "{transect_querying_file_name}" and within_transect == True'
)
So instead I just iterate through transect_df within_transect True values only?
I added back the numbers so that I can reference these more easily:
"Dictionary for transect values" does not make sense to me. Do you mean annotation labels of "log" type? Could you re-word this to be more precise?
I wasn't quite sure what to call them since they denote what part of the transect we are on, and
log
is already a designated class name. Maybe I instead call themtransect_sequence_type
or something to denote where the EVR region is in the context of the transect map?
Sounds good
is very hard-coded, so I am not sure why the function wouldn't just take transect_strs and transect_next_allowable as inputs directly. Doing that may help users understand what these strings are used for.
Yeah, I can use that convention instead.
Sounds good.
could be factored out to be a function to produce warnings (or not) so that the main flow of the transect_mask function is cleaner.
Maybe I separate the transect masking and the checking of the transect sequences?
I think you can have a new function for checking the transect sequences to produce warnings (or not), but call this function only within the current transect_mask
. The checking is very specific (and so is transect_mask
) that I don't see a lot of values in separating these as 2 functions that can be used independently in general.
The 2 cases (start_str followed by break_str or end_str; and resume_str followed by break_str or end_str) are exactly identical to what you have in the transect_next_allowable dictionary, under the keys start_str and resume_str -- and this is obviously should be the case, because that's why these are allowed combinations. How about just using the transect_next_allowable dict to set the within_transect values?
Yup, that sounds cleaner.
👍
Since transect_df["within_transect"] is already a clean column of where in transect_df should be within or outside of a transect segment, and here we only want the "within_transect=True" case, I think you can take out the loop completely.
By the above, do you mean remove this?
for transect_querying_file_name in transect_querying_list: within_transect_df = transect_df.query( f'file_name == "{transect_querying_file_name}" and within_transect == True' )
So instead I just iterate through transect_df within_transect True values only?
Yeah, that's what I think. And for completeness I added back the second half of this part of comment below:
We've been operating under the assumption that it is expected that
region_df
may contain timestamp ranges that are larger than what's inda_Sv
, and whatever withinregion_df
that fits the "within transect" criteria would be 1s in the output mask. To make this more efficient, we can first filtertransect_df
bracketed by:
region_bbox_left
that is immediately before the first timestamp inda_Sv
region_bbox_right
that is immediately after the last timestamp inda_Sv
and then iterating all rows in
transect_df
.
@leewujung I incorporated all your previous suggestions. This is ready to be reviewed again.
Also, please get to #126 also once this one is merged, so that we can release v0.1.1 soon. It'll be better for the label organization stuff to be able to install from main
, rather than specific branches.
@leewujung Made the corrections and added a few tests.
And you were right, there was a bug where
transect_df.loc[ transect_df["region_bbox_right"] > max_timestamp, "region_bbox_right" ]
and its partner code is empty and so idx min/max cannot be run, and I dealt with this case by taking the min/max idx of the original region bbox values.
This addresses #123.
It no longer checks that ST and ET exist. I made this change in the case where an EVR file doesn't contain all the information for a transect, like if it only contains RT-BT-RT-BT-ET or ST-BT-RT-BT-RT segments.