bloodraven66 / CaMLsort

MIT License

updates #1

Open bloodraven66 opened 2 years ago

bloodraven66 commented 2 years ago

Discussions

Todo

varmaalok22 commented 2 years ago

If given a time series of timestamps, is there a way to get the timestamps at which the predictions were made, matching the network type used? This would help a lot when comparing against ground truth.

bloodraven66 commented 2 years ago

Do you mean specifying a segment from the full sequence?

varmaalok22 commented 2 years ago

Yes, I mean can we get the subset of indices that match the neural network's subsampling?

bloodraven66 commented 2 years ago

Sure. The sequence is at 30 Hz, right?

varmaalok22 commented 2 years ago

Yes, but I was wondering if we can have that as an input. Can the experimenter provide a sequence of timestamps along with the data? If the data are not sampled at 30 Hz, an interpolation is performed, the data are then fed to the neural network, and finally the network returns both the final timestamps and the predictions + posterior scores at those timestamps.
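A minimal sketch of the kind of helper I mean, assuming plain numpy arrays (the function name and constant are illustrative, not part of CaMLsort):

```python
import numpy as np

TARGET_HZ = 30.0  # rate the pretrained networks expect

def resample_to_30hz(timestamps, trace):
    """Linearly interpolate an irregularly sampled trace onto a uniform 30 Hz grid.

    Hypothetical helper (not part of the CaMLsort API): returns both the new
    timestamps and the resampled values, so predictions made on the resampled
    trace can be aligned back to the experimenter's clock.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    trace = np.asarray(trace, dtype=float)
    new_t = np.arange(timestamps[0], timestamps[-1], 1.0 / TARGET_HZ)
    new_trace = np.interp(new_t, timestamps, trace)
    return new_t, new_trace

# example: a ~20 Hz recording with timing jitter, resampled to 30 Hz
t_raw = np.cumsum(np.random.uniform(0.04, 0.06, size=200))
f_raw = np.sin(2 * np.pi * 0.5 * t_raw)
t30, f30 = resample_to_30hz(t_raw, f_raw)
```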

bloodraven66 commented 2 years ago

Alright. What would be the best way to interpolate the data?

varmaalok22 commented 2 years ago

Linear interpolation is perfectly fine - there's no need to get splines and stuff into this. This is what I have done and it seems to work, in any case.

varmaalok22 commented 2 years ago

Is there a way to suppress this kind of output when making predictions? It is interfering when I'm trying to make the interactive plot.

`0%| | 0/11 [00:00<?, ?it/s] 9%|▉ | 1/11 [00:00<00:01, 9.63it/s] 100%|██████████| 11/11 [00:00<00:00, 58.70it/s]`

bloodraven66 commented 2 years ago

> Is there a way to suppress this kind of output when making predictions? It is interfering when I'm trying to make the interactive plot.
>
> `0%| | 0/11 [00:00<?, ?it/s] 9%|▉ | 1/11 [00:00<00:01, 9.63it/s] 100%|██████████| 11/11 [00:00<00:00, 58.70it/s]`

There is a parameter called `progressbar`: `tvb_handler = TVB_handler("3DNN", progressbar=False)`

If you want to disable the logging output as well: `tvb_handler = TVB_handler("3DNN", progressbar=False, use_logger=False)`

bloodraven66 commented 2 years ago

If you are trying out changes yourself, the best way is to uninstall the package first (`pip uninstall tvb`), then install from inside the cloned repository with `pip install -e .`. Pushing to GitHub will take some time to reflect the changes.

varmaalok22 commented 2 years ago

Thanks for telling me the necessary kwargs. I'm doing all the work in Google Colab for now, because it's easy to share with others, if needed.

bloodraven66 commented 2 years ago

> Linear interpolation is perfectly fine - there's no need to get splines and stuff into this. This is what I have done and it seems to work, in any case.

I've added this. Can you please check that it is working as intended?

Also, for training, how should the data splits be defined?

  1. the user provides 2D sequence data & 1D labels for it, and the code automatically creates balanced splits through a controllable ratio
  2. the user provides train, dev, test splits explicitly
  3. both

Should it automatically do k-fold?

I'm also having fine-tuning as a separate module (so I can have some additional configurations). Should the same options from training apply here?

varmaalok22 commented 2 years ago

> I've added this. Can you please check that it is working as intended?

Give me some time - I'll check out the linear interpolation. How is it implemented?

> Also, for training, how should the data splits be defined?

> 1. the user provides 2D sequence data & 1D labels for it, and the code automatically creates balanced splits through a controllable ratio
> 2. the user provides train, dev, test splits explicitly
> 3. both
>
> Should it automatically do k-fold?

I think it is best to have something that automatically creates balanced splits of k-folds. If needed sometime later in the future, option 2 can be done. But I can't imagine any scenario in which someone would want to explicitly specify train and test splits. For minimum bias, I think option 1 is best.
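For concreteness, option 1 with automatic k-fold could look something like this sketch, assuming each sequence is reduced to a single class label for stratification (the helper name is illustrative; it uses scikit-learn's `StratifiedKFold`):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def balanced_kfold_splits(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) index pairs with class-balanced folds.

    `labels` holds one class label per sequence; if labels are per-timestep,
    they would first need to be reduced (e.g. by majority vote) before stratifying.
    """
    labels = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    dummy_X = np.zeros((len(labels), 1))  # StratifiedKFold only looks at the labels
    for train_idx, test_idx in skf.split(dummy_X, labels):
        yield train_idx, test_idx

# example: 20 sequences, binary labels
y = np.array([0, 1] * 10)
for fold, (tr, te) in enumerate(balanced_kfold_splits(y)):
    print(f"fold {fold}: {len(tr)} train / {len(te)} test")
```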

> I'm also having fine-tuning as a separate module (so I can have some additional configurations). Should the same options from training apply here?

I'm not sure I understand what you mean by this. Could you please clarify?

bloodraven66 commented 2 years ago

> I'm not sure I understand what you mean by this. Could you please clarify?

By fine-tuning I mean: we first start with a pretrained model (one of the defaults, or a user-trained one) and then continue training from its weights on the user's data. So just check whether k-fold is okay for this training as well?
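Roughly along these lines, sketched here in PyTorch with a placeholder model class and checkpoint path (the actual model architectures and file names in the package may differ):

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Placeholder 1D conv classifier standing in for the packaged models."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2),
        )

    def forward(self, x):
        return self.net(x)

model = TinyCNN()
# start from pretrained weights (default or user-trained); the path is hypothetical
# model.load_state_dict(torch.load("pretrained_model.pt"))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # small LR for fine-tuning
criterion = nn.CrossEntropyLoss()

# one fine-tuning step on a dummy batch of user data, shaped (batch, channel, time)
x = torch.randn(4, 1, 300)
y = torch.randint(0, 2, (4,))
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# the same loop would simply run inside each k-fold split, as for training from scratch
```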

varmaalok22 commented 2 years ago

Yes, I think k-fold for this training is okay, too. It's best to have a somewhat automated way of balancing out the classes, to reduce bias in any of the training folds.

bloodraven66 commented 2 years ago

I have added sample data inference.

I have also changed how data is stored internally, from a 2D array to a nested dict of the form `{'filename1': [data, label], 'filename2': [data, label], ...}`.

The input remains the same for the user. Right now it only supports dataset_name1 with the exp_name param. The input data will be 2D for the sequence as well as the label. A filename param is also present; it should be == len(data), containing filename1, filename2, etc. If nothing is provided, it names the entries 1, 2, 3, ... For the same data, it uses the 3 sets as dataset_name1, dataset_name2, dataset_name3.

I don't have proper support for metrics from the sample dataset yet, since it is a label sequence. I'm working on adding training; it should be easier now with the internal dtype changes.
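As a sketch of how the user-facing input maps onto that internal dict (the helper name is illustrative, not the package's actual internal code):

```python
import numpy as np

def build_internal_store(data, labels, filenames=None):
    """Map user-facing 2D data/label arrays to a {filename: [data, label]} dict.

    Illustrative helper only. If `filenames` is not given, entries are named
    "1", "2", "3", ... as described above.
    """
    data = np.asarray(data)
    labels = np.asarray(labels)
    assert len(data) == len(labels), "one label sequence per data sequence"
    if filenames is None:
        filenames = [str(i + 1) for i in range(len(data))]
    assert len(filenames) == len(data), "filenames must be == len(data)"
    return {name: [trace, label] for name, trace, label in zip(filenames, data, labels)}

# example: 3 recordings of 100 samples each, with explicit filenames
store = build_internal_store(
    np.random.randn(3, 100),
    np.zeros((3, 100), dtype=int),
    filenames=["cell_a", "cell_b", "cell_c"],
)
```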