deepsound-project / samplernn-pytorch

PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

Why is overlap length in dataloader needed? #25

Open · williamFalcon opened this issue 6 years ago

williamFalcon commented 6 years ago

Again, great project guys, good implementation!

I have some inline questions about the dataloader file; I don't understand why it's doing the segmentation it's doing.

# dataset.py    

# A) what is this??? 
# B) why? 
# C) Is this related to the number of samples per frame for tier 3?
self.overlap_len = 64     

# length of music clip   
n_samples = 128064    

# desired sequence length for each training example
# D) why did you pick 1024? 
self.seq_len = 1024   

# iterate the full song 1024 units at a time    
for seq_begin in range(self.overlap_len, n_samples, self.seq_len):
    # 0 in first loop
    from_index = seq_begin - self.overlap_len  

    # 1088 in first loop.
    # E) Why not 1024? 
    # F) what is the overlap?
    to_index = seq_begin + self.seq_len   

    # (128 x 1088)  
    sequences = batch[:, from_index : to_index]   

    # G) why is this dropping off the last sample??
    input_sequences = sequences[:, : -1]   

    # H) why is the label such an odd subset? 
    target_sequences = sequences[:, self.overlap_len:].contiguous()   

    # I) Is X not trying to predict the next sequence making that missing chunk Y?
    # ie: full_seq = [1,2,3,4,5,6].   X = [1,2,3,4].   Y = [5, 6]?
    # currently this is not how the data are laid out.   
    yield (input_sequences, reset, target_sequences)    

Thanks! @koz4k
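
For reference, here is a minimal trace of the index arithmetic in the snippet above, using the same values (overlap_len=64, seq_len=1024, n_samples=128064). The slicing is copied from the question; only the print is added for illustration:

# trace of the slicing loop with the values from the snippet above
overlap_len = 64
seq_len = 1024
n_samples = 128064

for seq_begin in list(range(overlap_len, n_samples, seq_len))[:3]:
    from_index = seq_begin - overlap_len
    to_index = seq_begin + seq_len
    print(from_index, to_index, to_index - from_index)

# prints:
# 0 1088 1088
# 1024 2112 1088
# 2048 3136 1088
# Every slice is seq_len + overlap_len = 1088 samples long, and each slice
# shares its first overlap_len = 64 samples with the end of the previous one.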

rohan1561 commented 4 years ago

C) Yes, it is equal to that.

E) They're probably trying to generate the entire song, so they use a dummy input for the first time step in all RNN tiers. This also answers H: every part of the actual song is in the target sequence, and the very first frame for every tier is generated from dummy inputs.

G) Because the last sample is generated by the sample-level MLP.
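
To make that concrete, here is a small sketch of how the input and target line up under that slicing (names and shapes are illustrative, not the repo's exact code):

import torch

overlap_len, seq_len = 64, 1024
# one slice of shape (batch, overlap_len + seq_len), e.g. quantized 8-bit samples
sequences = torch.randint(0, 256, (128, overlap_len + seq_len))

input_sequences = sequences[:, :-1]            # 1087 samples: left context + all but the last step
target_sequences = sequences[:, overlap_len:]  # 1024 samples: every "new" sample in this slice

# Roughly, target sample t (absolute index overlap_len + t in the slice) is
# predicted from the input samples that precede it, via the frame-level RNNs
# and the sample-level MLP. The first overlap_len input samples only provide
# left context for the first frame; the very last sample of the slice is kept
# as a target but dropped from the input, because the sample it would help
# predict falls in the next slice.

Note that under this scheme the target windows of consecutive slices tile the song without gaps or repeats, while their inputs overlap by overlap_len samples, which is presumably the point of the overlap (A/B): it supplies the left context the model needs before it can predict the first frame of each chunk.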