EricArcher / banter

banter is a package for creating hierarchical acoustic event classifiers out of multiple call type detectors.
Blank output when using predict() #4

Status: Closed (sfregosi closed 1 year ago)

sfregosi commented 1 year ago

Hi Eric,

I'm moving my question over from BA Stack Exchange to here because I made some progress but am now stuck again...

My first issue was that I was not loading the rfPermute package. Once I loaded it, I was able to successfully run:

score <- predict(bant.mdl, dets_banter)

but score ends up blank.

I am working with a model that was made from acoustic features extracted with an older version of PAMpal, and I am using it to predict on acoustic features extracted with the latest version of PAMpal. That led to one issue with feature column names not matching perfectly, but I was able to fix that manually. From what I can tell, the to-be-predicted events (exported for banter by PAMpal) are in the right format.

In trying to examine my bant.mdl to make sure it looks ok, I wasn't able to use the banter::summary function. It throws the error below. I suspect this is from the update, but maybe you can confirm.

Error in FUN(X[[i]], ...) : 
  no slot of name "timestamp" for this object of class "banter_detector"

Thank you! Selene

sfregosi commented 1 year ago

Also, I'm emailing you my model and detections as a combined .rda now!

EricArcher commented 1 year ago

The error is due to the fact that the detector models in your banter model object do not have @timestamp slots in them. From the code, I can't work out why this should be. The example code works, so I know that in normal operations, these slots get filled correctly.

Can you show me the code you used to create these detector models? Also, did you happen to create these models with a version of banter prior to when I added the @timestamp slot?
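
One way to check whether a stored object predates a slot is to test for the slot directly. The sketch below is only an illustration: `old_detector` and its `label` slot are hypothetical stand-ins for a detector object saved before the slot existed, not banter's actual class definition.

```r
library(methods)

# Hypothetical stand-in for a detector class defined before a
# timestamp slot was added (not banter's real class).
setClass("old_detector", slots = c(label = "character"))
old_obj <- new("old_detector", label = "whistles")

# .hasSlot() reports whether the object actually carries the named slot.
has_timestamp <- .hasSlot(old_obj, "timestamp")
has_label <- .hasSlot(old_obj, "label")
```

If the check comes back FALSE for a detector inside an older model, refitting that model with the current package version is the likely fix.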

sfregosi commented 1 year ago

Ok, I think I had two separate issues going on and I think I have resolved them.

1) timestamp error - yes! This was an old model created in Apr 2021. I tried with a newer model updated in Jan 2023 and was able to display a full summary with plots, details etc. But I still had the empty score output with the newer model (I had been hitting that initial issue with several models).

2) empty score output error - My dets_banter input object, exported after using PAMpal::processPGDetections with mode = 'time', had NA for all the species IDs. If I manually updated all of these to 'UO', 'XX', or any other actual character string (I used dets_banter$events$species = 'xx'), then I get scores!

I did receive a warning at the PAMpal::export_banter step that says "Events 1-6 do not have a species ID. Data can only be used for prediction, not model training.", but I ignored that since I was only trying to do prediction.

I did initially have manual species guesses for my events, but they were 'Pm', 'UO', and 'UBW'. If I left the species column in my event .csv as I originally had it, the 'Pm' and 'UBW' events would get automatically removed with the following warning: "Species UBW do not have enough events to train a banter model (min 2), these will be removed". So I removed my 'guesses' so that all events would go through to the prediction step.

I'm looping @TaikiSan21 in now because maybe this is something that could/should be updated in PAMpal? If there is an easy way to allow for the species with just a single event to not be removed (if you know you are only doing prediction), or to change the 'NA' output to a different output type, that would be super helpful. But also, it is very easy to just add a column of X's for all my species in that scenario :)

TaikiSan21 commented 1 year ago

export_banter has a training argument, default is TRUE. If you instead do export_banter(data, training=FALSE) it should export all your data without complaining about the "not enough events" stuff.

The prediction score thing not working unless you put in a dummy variable is weird, but I think I figured it out. Since the PAMpal default is to put a placeholder NA as the species ID until it gets set, this is what gets exported. This is causing problems in banter::predict here because any row with any NA value is getting dropped. @EricArcher, do you think this line should get modified to exclude the species column from the NA checking? I could imagine a scenario where a user only has labels for some of their data, so the species column would be a mix of NA and actual labels.
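
To illustrate the row-dropping behavior described above, here is a minimal base R sketch. The `events` data frame and its column names are made up for illustration and are not PAMpal's actual export format; the point is only that `na.omit()` removes every row containing any NA, so an all-NA species column empties the table.

```r
# Hypothetical event table with a placeholder NA species ID on every row.
events <- data.frame(
  event.id = c("ev1", "ev2", "ev3"),
  species  = NA_character_,
  duration = c(1.2, 3.4, 2.1),
  stringsAsFactors = FALSE
)

# na.omit() drops any row that has an NA in any column,
# so every row is removed even though the features are complete.
dropped <- na.omit(events)
n_left <- nrow(dropped)
```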

sfregosi commented 1 year ago

aha! Of course the training argument is already an option. Thanks for pointing me to that and I'll incorporate that into my regular 'predict-only' flow.

EricArcher commented 1 year ago

It looks like it would suffice to have banter::predict drop the species column altogether at the beginning of the function. If that sounds good to y'all, I'll do it and push up a new version.

EricArcher commented 1 year ago

Belay that last. That won't work. Taiki's solution of having the na.omit line ignore the species column is better.
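
A rough sketch of what "ignore the species column during the NA check" could look like in base R follows. This is not the code that was pushed to banter; the `events` table and column names are hypothetical, and `complete.cases()` is swapped in for `na.omit()` so the check can be restricted to the non-species columns.

```r
# Hypothetical event table: partially labeled, with one incomplete feature row.
events <- data.frame(
  event.id = c("ev1", "ev2", "ev3"),
  species  = c("Pm", NA, NA),
  duration = c(1.2, NA, 2.1),
  stringsAsFactors = FALSE
)

# Check for NAs in every column except species.
feature_cols <- setdiff(names(events), "species")
keep <- complete.cases(events[, feature_cols, drop = FALSE])

# ev2 is dropped (missing feature); ev3 is kept despite its NA species.
kept <- events[keep, , drop = FALSE]
```

This keeps unlabeled events available for prediction while still removing rows whose features are incomplete.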

EricArcher commented 1 year ago

There's a secondary problem: banter::predict assumes that if the species column is present, you want to do a comparison between assumed species and predicted species IDs, adding a column to the $predict.df element of the output identifying whether that comparison is correct. If species is NA, the output of that is weird. I'll edit it to make the results of all of that NA.
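
For reference, the intended outcome (an NA result wherever the original label is NA) matches how element-wise comparison behaves in base R. The vectors below are made-up examples, and this is only a sketch of the desired result, not banter's internal code.

```r
# Hypothetical original labels and predicted labels for three events.
original  <- c("Pm", NA, "UBW")
predicted <- c("Pm", "UO", "UO")

# Comparing against NA yields NA, so unlabeled events get an NA "correct" value.
correct <- original == predicted
```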

EricArcher commented 1 year ago

Edits have been pushed. @sfregosi would you give it a shot?

sfregosi commented 1 year ago

Thank you, both! All looks to be working great.

Tried with no manual 'species' column in my detection events csv (export_banter gives NAs for all events) - got prediction scores and the original and correct columns both are populated with NAs (as expected!)

Tried with manually ID'd species column in detection events (but exported for banter with training = FALSE) - got prediction scores and original and correct columns with the original entries and TRUE/FALSE as expected!

EricArcher commented 1 year ago

Excellent! Let me know if you run into anything else.