clinicalml / TabLLM

MIT License

How can I apply it to my own data set? #23

Open ANERIZ opened 4 months ago

ANERIZ commented 4 months ago

Hi @stefanhgm, my current dataset contains test data and diagnostic reports for 3,000 patients. The number of rows of test data is not uniform. Each test recording is a three-column CSV file: the first column is the time, starting from 0 and incrementing every 0.1 seconds, and the other two columns are the test content. The diagnostic report is a TXT file that contains the patient's examination information and the doctor's diagnostic opinion. Each CSV file corresponds to one TXT file. How should I process my dataset to train your model?

stefanhgm commented 3 months ago

Hello @ANERIZ,

thanks for using TabLLM!

Basically, you need a serialization of your data into text to use it with TabLLM. We provide some serialization methods for tabular data in our paper and offer them in our code. It sounds like your data has some tabular structure, so you could probably try one of the default serializations.
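To make the idea of a serialization concrete, here is a minimal sketch in Python for data shaped like the recordings described in the question. It is not code from the TabLLM repository. It assumes that each variable-length recording is first reduced to a fixed set of summary features (that reduction is an assumption of this sketch, not something proposed in this thread), and then writes each feature as a sentence of the form "The <name> is <value>.", in the style of the text template serialization. The function names `summarize_recording` and `serialize` and the feature names "signal A" and "signal B" are hypothetical.

```python
import csv
import io
import statistics


def summarize_recording(csv_text):
    """Reduce a variable-length (time, a, b) recording to fixed-size features.

    Assumption: the CSV has no header row and exactly three numeric columns.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    times = [float(r[0]) for r in rows]
    features = {
        "recording duration in seconds": round(times[-1] - times[0], 1),
    }
    # Hypothetical names for the two measurement columns.
    for idx, name in ((1, "signal A"), (2, "signal B")):
        values = [float(r[idx]) for r in rows]
        features[f"mean of {name}"] = round(statistics.fmean(values), 3)
        features[f"minimum of {name}"] = min(values)
        features[f"maximum of {name}"] = max(values)
    return features


def serialize(features):
    """Write each feature as one short sentence and join them into a text."""
    return " ".join(f"The {name} is {value}." for name, value in features.items())


# Tiny made-up recording, only to show the shape of the output.
example_csv = "0.0,1.0,10.0\n0.1,2.0,20.0\n0.2,3.0,30.0\n"
example_features = summarize_recording(example_csv)
example_text = serialize(example_features)
print(example_text)
```

The label for each patient would still have to be derived from the matching TXT report, which this sketch does not attempt. Which summary features are meaningful depends on what the two measurement columns contain.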

I think the easiest way to start with your own dataset is to run the project for one of the provided datasets and then replace all files with your own dataset.

I hope that helps!