TimeViewers / signalworks

MIT License
3 stars 2 forks source link

tracking.Partition.create() returns bogus first initial time value #21

Open j9ac9k opened 4 years ago

j9ac9k commented 4 years ago

Description

When creating a partition object wtih a non-zero initial value, an incorrectly time-marked partition object.

What I Did

>>> data = [(1.155, 1.515, 'call')]
>>> partition = tracking.Partition.create(data, fs=48000)
>>> partition.time
array([    0, 72720])
>>> partition.value
array(['call'], dtype='<U16')
j9ac9k commented 4 years ago

Problem is in these lines of code

https://github.com/lxkain/signalworks/blob/master/signalworks/tracking/partition.py#L246-L248

        if time[0] != 0:
            logger.warning("moving first label boundary to zero")
            time[0] = 0

Correct behavior here is probably to insert an empty partition to the first label.

lxkain commented 4 years ago

First of all, I think that the create() should be merged with the read() method - they are both essentially doing the same thing.

Second, I personally think there should be an error if the first time is not zero. Most likely the user is confusing track.Label with track.Partition.

@tuanad121 is the author of create() - what do you think? What was your use-case for creating this in the first place?

j9ac9k commented 4 years ago

Forgot to close this issue when I merged my PR.

Our need came from wanting to create a partition track, but our source data was not encompassed by a single file, but was bundled in part of the file.

We effectively have a python data structure of

data: List[Tuple[float, float, str, float]] = [(startTime, endTime, label, score)]

The problem is that our data-source, does not always have the first entry in the list have a start time of 0, and silently moving the first start time to 0 (what the behavior was in signalworks =< 0.2.1) caused an unintentional behavior in our case.

Also in our case, a Partition track is definitely what we want (as opposed to a label), but it's just that I could not guarantee first entry would start at 0. I certainly could have extra checks in our application to create an empty segment if my first start time is not 0, but silently making the first time be 0 seemed very-much like non-ideal behavior, so I decided to make the change upstream.

In terms of Partition.create vs. Partition.read, I'm good with trying to merge them, I always thought of .read() as reading in a file, converting the serialized data into List[Tuple[float, float, str, <whatever>)], and then calling the create method... I'm certainly open to another suggestion.

I'll leave this issue open to keep the conversation going, but with signalworks 0.2.2 I already made the change.

lxkain commented 4 years ago

I think your changes are fine, but how about calling it Partition.from_list() instead of Partition.create()? Seems more descriptive.

j9ac9k commented 4 years ago

I'm good with from_list() instead of create().

Alternatively we could make the data be part of the __init__ method? That way if you do Partition.read and pass along a file, it returns the result of Partition.__init__() with the List[Tuple[]] as an input argument.

lxkain commented 4 years ago

So __init__ would either take time and value or data in that particular format? That would make it different from all the other initializer methods. I'd prefer to stick with from_list.