SkywardAI / paper_gallery

Papers gallery for using LLMs ability over dataset
MIT License

Implement interp-net data preprocessing #7

Closed Aisuko closed 1 month ago

Aisuko commented 1 month ago

We want to implement the data preprocessing step of interp-net. The original code was inefficient, so let's process only 100 records as a demo on Kaggle and link that notebook to this issue. @Micost

@wangyuweikiwi will use that notebook as an example and continue the work from there.

There are two files we need to handle carefully inside the load_data function. Please check the notebook I prepared.

The first one, adm_type_los_mortality.p, is available here.

However, vitals_records.p is very large, and there is no reason to load all of it, given how much inefficient code we already inherit from that project. So I split it into checkpoints of 5000 records each.
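For reference, splitting into checkpoints could look roughly like the sketch below. This is only an illustration, not the code from the notebook: it assumes the records are already in memory as a list, and the function name `split_into_checkpoints` and the `vitals_records_part_<idx>.p` naming pattern are hypothetical.

```python
import pickle
from pathlib import Path


def split_into_checkpoints(records, out_dir, chunk_size=5000,
                           prefix="vitals_records_part"):
    """Write `records` to disk as consecutive pickle files of `chunk_size` items.

    Assumption: `records` is a list-like object that supports len() and slicing.
    The file naming pattern is hypothetical and may differ from the real files.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for batch_idx, start in enumerate(range(0, len(records), chunk_size)):
        # The last chunk may hold fewer than chunk_size records.
        chunk = records[start:start + chunk_size]
        path = out_dir / f"{prefix}_{batch_idx}.p"
        with open(path, "wb") as f:
            pickle.dump(chunk, f)
        paths.append(path)
    return paths
```

With 12000 records and the default chunk size, this would write three files, the last one holding the remaining 2000 records.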

Note: vitals_records_1000.p and vitals_records_2000.p are smaller batch examples that I used to test the multiprocessing code. Please ignore them.

Micost commented 1 month ago


Please take a look at this note

I updated the loading process. Since the vitals file has already been sliced into several files, please increment batch_idx if you want to load the split files one by one.
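A minimal sketch of that batch-by-batch loading idea, in case it helps. It is not the exact code from the note: the helper names and the `vitals_records_part_<idx>.p` file pattern are assumptions, and only `batch_idx` comes from the actual loading process.

```python
import pickle
from pathlib import Path


def load_vitals_batch(data_dir, batch_idx, prefix="vitals_records_part"):
    """Load a single split file, selected by batch_idx.

    Assumption: split files are named `<prefix>_<batch_idx>.p` (hypothetical).
    """
    path = Path(data_dir) / f"{prefix}_{batch_idx}.p"
    with open(path, "rb") as f:
        return pickle.load(f)


def iter_vitals_batches(data_dir, prefix="vitals_records_part"):
    """Yield (batch_idx, records) pairs, incrementing batch_idx until a file is missing."""
    batch_idx = 0
    while (Path(data_dir) / f"{prefix}_{batch_idx}.p").exists():
        yield batch_idx, load_vitals_batch(data_dir, batch_idx, prefix)
        batch_idx += 1
```

Loading one file at a time keeps only a single 5000-record chunk in memory, so for the 100-record demo it should be enough to call `load_vitals_batch` with `batch_idx=0` and slice the result.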