Hi Danuta, I don't know your data structure, but I have a guess about what might have gone wrong. ANIMAL-SPOT internally expects the following filename structure: "label_id_year_tape_startlabeltime_endlabeltime". From the year and tape information it builds a set of "recording tapes"; a recording tape is always the combination of year and tape name. When ANIMAL-SPOT splits the data (automatically), it makes sure that NONE of the tapes are shared across partitions, in order to avoid "cheating": if audio from the same tape is distributed across training and test, the task becomes easier for the model, because it has already seen that recording during training. I think this is your problem. Very likely you have only a few distinct tapes, so ANIMAL-SPOT puts all of them into some of the partitions and nothing is left for the remaining one(s). If you don't have more distinct tapes, e.g. everything comes from one recording, you can also "fool" ANIMAL-SPOT by setting the "year_tape" part of the filenames to artificial, random values to simulate different recording tapes. That should solve your problem.
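To illustrate why a small number of tapes can leave a partition empty, here is a minimal sketch of a tape-disjoint split. It is not ANIMAL-SPOT's actual code: the helper names `tape_key` and `split_by_tape`, the 70/15/15 fractions and the greedy "fill the emptiest partition" rule are assumptions for illustration. Only the filename layout and the rule that no tape is shared across partitions come from the reply above.

```python
import random
from collections import defaultdict
from pathlib import Path


def tape_key(filename: str) -> tuple[str, str]:
    """Return (year, tape) from a name like label_id_year_tape_start_end.wav.

    Hypothetical helper; assumes no underscores inside id, year, tape or the times.
    """
    # rsplit from the right so underscores inside the label itself are tolerated
    _label, _file_id, year, tape, _start, _end = Path(filename).stem.rsplit("_", 5)
    return year, tape


def split_by_tape(filenames, fractions=(0.7, 0.15, 0.15), seed=None):
    """Hypothetical tape-disjoint split: all files of one (year, tape) share a partition."""
    tapes = defaultdict(list)
    for filename in filenames:
        tapes[tape_key(filename)].append(filename)

    keys = sorted(tapes)
    random.Random(seed).shuffle(keys)  # random tape order, reproducible via seed

    total = len(filenames)
    splits = {"train": [], "val": [], "test": []}
    targets = dict(zip(splits, fractions))
    for key in keys:
        # the whole tape goes to the partition furthest below its target size
        # (ties go to the first partition in train/val/test order)
        part = max(splits, key=lambda p: targets[p] * total - len(splits[p]))
        splits[part].extend(tapes[key])
    return splits


one_tape = [f"call_{i}_2019_tapeA_{i}_{i + 1}.wav" for i in range(20)]
print({p: len(f) for p, f in split_by_tape(one_tape, seed=0).items()})
# → {'train': 20, 'val': 0, 'test': 0}
```

With a single tape every file has to land in one partition, so the other two stay empty no matter how the split is tuned; with, say, ten equally sized tapes, all three partitions receive files.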
Hi Christian,
Thank you for getting back to me so quickly. My data are indeed from long deployments of acoustic tags on penguins. Even though the recordings are saved in shorter chunks, I name them with a single deployment ID and the time relative to the start of the deployment, so that the audio can be synchronized with data from the tag's other sensors. I will try renaming the files and giving the training another go.
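The renaming can be scripted. The sketch below follows the suggestion in the reply of replacing the tape part of each filename with a random value. The helper names `pseudo_tape_name` and `randomize_tapes`, the `tapeNN` naming, the `.wav` extension and the default of 10 pseudo-tapes are assumptions, not ANIMAL-SPOT conventions.

```python
import random
from pathlib import Path


def pseudo_tape_name(filename: str, rng: random.Random, n_tapes: int = 10) -> str:
    """Replace the tape field of label_id_year_tape_start_end.ext with a random pseudo-tape.

    Hypothetical helper; assumes no underscores inside id, year, tape or the times.
    """
    path = Path(filename)
    label, file_id, year, _tape, start, end = path.stem.rsplit("_", 5)
    tape = f"tape{rng.randrange(n_tapes):02d}"  # e.g. tape00 ... tape09
    return str(path.with_name(f"{label}_{file_id}_{year}_{tape}_{start}_{end}{path.suffix}"))


def randomize_tapes(directory: str, n_tapes: int = 10, seed: int = 42, dry_run: bool = True):
    """Rename every .wav file in directory; with dry_run=True only the plan is returned."""
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    plan = []
    for path in sorted(Path(directory).glob("*.wav")):
        new_path = Path(pseudo_tape_name(str(path), rng, n_tapes))
        if new_path != path and new_path.exists():
            raise FileExistsError(new_path)  # never overwrite another clip
        plan.append((path, new_path))
        if not dry_run:
            path.rename(new_path)
    return plan
```

Calling `randomize_tapes("path/to/clips")` first only returns the planned (old, new) pairs; passing `dry_run=False` performs the renames. Because clips from one deployment are scattered at random, neighbouring clips can end up in both training and test, which is the leakage the tape rule is meant to prevent; assigning pseudo-tapes by contiguous time blocks instead would keep such clips together.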
Thanks again for your help, Danuta
(Reply to ChristianBergler's comment of Tue, May 23, 2023 at 6:18 PM.)
Hi again,
The training works with the randomized tape names. Thanks again!
Best, Danuta
Hi,
When setting up training for an ANIMAL-SPOT binary classifier, I get a strange error: the dataset does not seem to be split according to the values specified in `main.py`. As you can see in the error messages below, the training set contains 0 files, whereas the validation and test splits contain the remaining files. When I run the script multiple times, it is random which of the train, val or test sets is left empty, but in every re-run one of them contains 0 files, which results in the error below.
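One way to check whether the tape-based split described in the reply applies is to count the distinct year/tape pairs in the data before training. A minimal sketch, assuming the "label_id_year_tape_startlabeltime_endlabeltime" layout and `.wav` files anywhere below one data directory; `tape_counts` and `can_fill_partitions` are hypothetical helpers, not part of ANIMAL-SPOT.

```python
from collections import Counter
from pathlib import Path


def tape_counts(data_dir: str, pattern: str = "**/*.wav") -> Counter:
    """Count files per (year, tape), assuming label_id_year_tape_start_end names (hypothetical helper)."""
    counts = Counter()
    for path in Path(data_dir).glob(pattern):  # "**" also searches subdirectories
        parts = path.stem.rsplit("_", 5)
        if len(parts) != 6:
            continue  # skip files that do not follow the naming scheme
        counts[(parts[2], parts[3])] += 1
    return counts


def can_fill_partitions(counts: Counter, n_partitions: int = 3) -> bool:
    """A tape-disjoint split needs at least one distinct tape per partition."""
    return len(counts) >= n_partitions
```

If `can_fill_partitions(tape_counts("path/to/data"))` is `False`, at least one of train, val and test must end up empty. A `True` result is necessary but not sufficient, since the outcome still depends on how the files are spread over the tapes.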
What can I do?
Greetings, Danuta