earthspecies / cornell-birdcall-competition-starter-pack

Cornell Birdcall Identification (a Kaggle competition) starter pack
Apache License 2.0

01c_melspectrogram_dataset_for_pool_shortish.ipynb references data/shifted before it's created #2

Open bfeeny opened 4 years ago

bfeeny commented 4 years ago

The last two cells of the notebook are:

mkdir -p data/npy/shifted

for recording in Path('data/shifted/').iterdir():
    x = sf.read(recording)[0]
    x = audio_to_melspec(x).astype(np.float32)
    np.save(f'data/npy/shifted/{recording.stem}.npy', x)

Yet data/shifted is never populated if you run notebooks 00, 01, 01a, 01b, and 01c in order.

In 01b, MelspecPoolWithShiftedDataset is defined and is able to return files read from data/shifted, but that directory has not been created or used at that point. Then, at the end of 01c, the notebook tries to iterate through data/shifted to produce shifted spectrograms in data/npy/shifted.
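As a rough illustration of how the final cell could fail more clearly, here is a minimal sketch of a guard that checks the source directory before the conversion loop runs. The helper name `list_shifted_recordings` is hypothetical and not part of the repository. It only lists files; the actual conversion would still use the notebook's own `sf.read` and `audio_to_melspec` calls.

```python
from pathlib import Path


def list_shifted_recordings(src_dir="data/shifted"):
    """Return the files in src_dir, sorted by path, or fail with a clear message.

    Hypothetical helper for illustration only; not part of the repository.
    """
    src = Path(src_dir)
    if not src.is_dir():
        # The directory is not produced by the earlier notebooks,
        # so it has to be populated before this cell runs.
        raise FileNotFoundError(
            f"{src} does not exist; populate it before running this cell."
        )
    recordings = sorted(p for p in src.iterdir() if p.is_file())
    if not recordings:
        raise FileNotFoundError(f"{src} exists but contains no recordings.")
    return recordings
```

The loop in the last cell could then iterate over `list_shifted_recordings()` instead of calling `Path('data/shifted/').iterdir()` directly, so a missing or empty directory is reported up front with a clear message.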

radekosmulski commented 4 years ago

hi @bfeeny!

First of all, thank you very much for your willingness to familiarize yourself with this work. I am very sorry that the repo is in a bit of disarray. Things got a bit out of hand when I was working on it: I tried too many things and should have put more energy into making it all cleaner and clearer.

The shifted recordings are not ones that I created myself from the competition data. They are soundscape recordings shared by one of the organizers, but they come from a different competition. There is a discussion about this on Kaggle, along with the download link.

Could you please see if this data will work for you?

This is what the directory looks like for me: [image]

Apologies again for the trouble.

radekosmulski commented 4 years ago

Just to add to the above: by "shifted" data I meant domain-shifted data, which may explain the naming.