UMAP+HDBSCAN model Index error when preprocessing csv during training

gongziyida commented 4 years ago

The following error occurred in the middle of training, while preprocessing one of the csv files. According to the printout, 1/6 of the preprocessing was done before the error occurred. Do you have any suggestions on how I should fix it? Thanks!

Traceback (most recent call last):
  File "run.py", line 6, in <module>
    bsoid_umap.main.build(TRAIN_FOLDERS)
  File "/ihome/nurban/zig9/B-SOID/bsoid_umap/main.py", line 28, in build
    nn_assignments = bsoid_umap.train.main(train_folders)
  File "/ihome/nurban/zig9/B-SOID/bsoid_umap/train.py", line 231, in main
    filenames, training_data, perc_rect = bsoid_umap.utils.likelihoodprocessing.main(train_folders)
  File "/ihome/nurban/zig9/B-SOID/bsoid_umap/utils/likelihoodprocessing.py", line 139, in main
    filenames, data, perc_rect = import_folders(folders)
  File "/ihome/nurban/zig9/B-SOID/bsoid_umap/utils/likelihoodprocessing.py", line 72, in import_folders
    curr_df_filt, perc_rect = adp_filt(curr_df)
  File "/ihome/nurban/zig9/B-SOID/bsoid_umap/utils/likelihoodprocessing.py", line 116, in adp_filt
    if rise_a[0][0] > 1:
IndexError: index 0 is out of bounds for axis 0 with size 0

runninghsus commented 4 years ago

Hi @gongziyida

Are you on Slack? It might be easier to discuss issues with individual data files there.

https://join.slack.com/t/b-soid/shared_invite/zt-dksalgqu-Eix8ZVYYFVVFULUhMJfvlw

runninghsus commented 4 years ago

This is a high-pass filter problem caused by low likelihood throughout that file (array([162, 144, 101, 39, 31, 17, 12, 6, 2, 1])). The likelihood counts never rise, so the filter cannot determine a threshold. Solved by changing lines 72-76 in likelihoodprocessing.py to:

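            try:
                curr_df_filt, perc_rect = adp_filt(curr_df)
            except IndexError:
                # adp_filt could not find a likelihood threshold for this file; skip it.
                logging.warning('Skipped file {}, folder {}: no likelihood threshold found.'.format(j + 1, i + 1))
            else:
                logging.info('Done preprocessing (x,y) from file {}, folder {}.'.format(j + 1, i + 1))
                rawdata_li.append(curr_df)
                perc_rect_li.append(perc_rect)
                data_li.append(curr_df_filt)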
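
For anyone hitting the same error, here is a rough sketch of why it happens. It assumes the threshold search looks for the first bin where the likelihood counts stop falling, which is what the traceback (`rise_a[0][0]`) and the counts above suggest; it is not the actual `adp_filt` code. The helper `first_rise` and the second example list are made up for illustration.

```python
# Likelihood counts quoted in the comment above: strictly decreasing.
COUNTS = [162, 144, 101, 39, 31, 17, 12, 6, 2, 1]


def first_rise(counts):
    """Return the index of the first non-decreasing step in counts.

    Hypothetical stand-in for the threshold search (assumption, not
    the B-SOID implementation). Returns None when the counts never
    rise, instead of indexing into an empty result.
    """
    # Differences between neighbouring bins.
    diffs = [later - earlier for earlier, later in zip(counts, counts[1:])]
    # Positions where the count stays level or goes up.
    rises = [idx for idx, d in enumerate(diffs) if d >= 0]
    # Taking rises[0] without this check is what raises IndexError
    # when the list is empty.
    return rises[0] if rises else None


if __name__ == "__main__":
    # Never rises, so there is no threshold to pick.
    print(first_rise(COUNTS))
    # Made-up counts that dip and then rise again.
    print(first_rise([5, 3, 4, 6]))
```

With the counts from the problem file there is no rising step at all, so any code that takes the first element of the result without checking will fail the way the traceback shows. Skipping such files, as in the change above, avoids the crash, but the skipped files contribute nothing to training.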

YttriLab / B-SOID

UMAP+HDBSCAN model Index error when preprocessing csv during training #10