Baselines and Evaluation Scripts

baeseongsu / ehrxqa

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

MIT License


Baselines and Evaluation Scripts #4

Closed (wjhou closed this issue 3 months ago)

wjhou commented 5 months ago

Hi,

Thanks for your contributions. This is impressive work.

I am trying to reproduce the results mentioned in the paper. Could you provide some baseline implementations and the scripts for evaluation in the paper?

Best, Ethan

baeseongsu commented 5 months ago

Hello, @wjhou!

Thanks for your interest in our work! We're currently in the process of preparing the baseline implementations and the corresponding evaluation scripts. The code needs some cleaning to remove hard-coded parts and to ensure it's user-friendly. We'll let you know as soon as it's available.

Best, Seongsu

wjhou commented 5 months ago

Hi Seongsu,

Appreciate your efforts in constructing the dataset. I realized there was a lot of dirty work in it when I was doing the same thing. :joy:

I successfully generated the JSON data following the guidelines.

I encountered an error related to the pandas version (AttributeError: 'BlockManager' object has no attribute 'arrays') and managed to fix it by updating to the latest pandas.
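For anyone hitting the same error, here is a minimal sketch of a version guard that could run before preprocessing. The minimum version below is a placeholder assumption, not a value from the repository; the only thing known from this thread is that updating to the latest pandas resolved the error. The helper names are hypothetical as well.

```python
import re

# ASSUMPTION: placeholder minimum version. The thread only says that
# "the latest pandas" fixed the error, not which version is required.
MIN_PANDAS = "1.3.0"


def version_tuple(version: str) -> tuple:
    """Turn '1.5.3' or '2.1.0rc1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '2.0'
        parts.append(0)
    return tuple(parts)


def is_too_old(installed: str, minimum: str = MIN_PANDAS) -> bool:
    """Return True when the installed version is below the chosen minimum."""
    return version_tuple(installed) < version_tuple(minimum)


if __name__ == "__main__":
    try:
        import pandas as pd
    except ImportError:
        print("pandas is not installed")
    else:
        if is_too_old(pd.__version__):
            print(f"pandas {pd.__version__} is older than {MIN_PANDAS}; consider upgrading")
        else:
            print(f"pandas {pd.__version__} looks recent enough")
```

Failing early with a clear message is easier to diagnose than an AttributeError raised deep inside preprocessing.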

Best, Ethan

baeseongsu commented 5 months ago

Hi, Ethan (@wjhou),

Thank you for troubleshooting the pandas error. Our current Python environment may be fragile during preprocessing, as it does not work with all package versions. I'll check the environment settings and make the setup more robust and easier to maintain (e.g., using Docker). Also, to avoid manual data preprocessing, we plan to upload our dataset and database to the PhysioNet platform, but I expect it will take some time due to the review process.

I definitely believe there are still some errors in the dataset even though we spent a lot of time on data processing. Let me know if there's anything else we need to modify or if you have any further insights from your work on the dataset.

Have a great day 👍

Best, Seongsu

baeseongsu commented 4 months ago

Hi Ethan (@wjhou),

We have refined our baseline and evaluation scripts to reproduce our experimental results. While the results are similar, they are not exactly the same due to variations in the prompts used.

However, I wanted to let you know that we are currently holding an internal challenge using this dataset. As a result, we plan to release the NeuralSQL baseline after June. To allow you to check our performance immediately, I will send you our code through a separate email.

Thank you for your patience and understanding.

Best, Seongsu

wjhou commented 4 months ago

Hi Seongsu,

Appreciate it. I will test the code once I receive it.

Best, Ethan