MIT-LCP / mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
https://mimic.mit.edu
MIT License

Setting up MIMIC-IV in Ubuntu Server 22.04 #1317

Closed krishnavemuri closed 2 years ago

krishnavemuri commented 2 years ago

Description

I am unable to set up the MIMIC-IV database on my personal server, which runs Ubuntu 22.04. I have a 4TB hard drive, and running SELECT pg_size_pretty( pg_database_size('dbname') ); showed that the MIMIC-IV database occupies ~59GB, so storage should not be an issue.

Description of the issue, including:

I have downloaded the zip files for the CORE, ICU and HOSP modules for MIMIC-IV, and am attempting to set up the database following the instructions in the buildmimic/postgres script. When running the line psql -v ON_ERROR_STOP=1 -v mimic_data_dir=<INSERT MIMIC FILE PATH HERE> -f load.sql, I am running into the following error:

COPY 562892
psql:load_gz.sql:71: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
PANIC:  could not write to file "pg_wal/xlogtemp.68479": No space left on device
CONTEXT:  COPY chartevents, line 285508226
psql:load_gz.sql:71: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
PANIC:  could not write to file "pg_wal/xlogtemp.68479": No space left on device
CONTEXT:  COPY chartevents, line 285508226
invalid socket
psql:load_gz.sql:71: fatal: connection to server was lost

I am relatively new to working with Postgres, so any help would be invaluable. I have not come across a similar issue here, and searching on Google did not bring up anything similar either. Please let me know if you need more information on my setup. Thanks!

marymlucas commented 2 years ago

Hi, I have set up MIMIC-IV on both Ubuntu Server and Windows and have not run into this error, so I can't really help with this specific issue. It seems to be related more to the log file getting full. Have you checked the log files to see what errors are being logged?

On Windows I did run into a few weird errors loading the data and my solution was to run the code manually, loading each table one by one, as it helped me understand and debug the errors easily. Perhaps you could try that if you're still having issues? It IS definitely more time consuming and hands on but you only ever have to do it once.

Also, a potentially silly question: have you set the correct path to the data in the following command?

psql -v ON_ERROR_STOP=1 -v mimic_data_dir=<INSERT MIMIC FILE PATH HERE> -f load.sql

krishnavemuri commented 2 years ago

Hi Mary, thanks for your reply. The error turned out to be related to the Logical Volume Manager (LVM) configuration in my server installation. I do not understand the specifics, but broadly: when I installed the server, I accepted the default settings, which limited the space available on my logical volume to 98 GB even though my hard drive has a capacity of 4TB. This is what led to the "PANIC: No space left on device" message while trying to set up the MIMIC-IV database.

I used the advice on this Ask Ubuntu page to extend my logical volume to use the full available storage space: https://askubuntu.com/questions/1106795/ubuntu-server-18-04-lvm-out-of-space-with-improper-default-partitioning

I then tried setting up the MIMIC-IV database again, and that worked without any issues.
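For readers who hit the same wall, the logical-volume fix described above can be sketched roughly as follows. This is not taken from the thread: it assumes the default Ubuntu Server LVM names (ubuntu-vg / ubuntu-lv) and an ext4 root filesystem, and the helper names run and extend_root_lv are made up for illustration. By default the sketch only prints the commands; check the names on your own machine with lvdisplay before applying anything.

```shell
# Sketch: grow the root logical volume to use all free space in its volume group.
# ASSUMPTION: default Ubuntu Server LVM names and an ext4 filesystem.
LV_PATH="${LV_PATH:-/dev/ubuntu-vg/ubuntu-lv}"

# Print each command by default; set APPLY=1 to actually run it (needs sudo).
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    sudo "$@"
  else
    echo "would run: sudo $*"
  fi
}

extend_root_lv() {
  run vgdisplay                          # look at "Free  PE / Size" in the volume group
  run lvextend -l +100%FREE "$LV_PATH"   # hand all free extents to the logical volume
  run resize2fs "$LV_PATH"               # grow the ext4 filesystem to fill the volume
}
```

Calling extend_root_lv prints the three commands as a dry run; APPLY=1 extend_root_lv executes them. Afterwards, df -h / should report the larger size.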

I am not sure if this is helpful, but my one piece of feedback for your team is to provide some information on the storage and computing requirements for setting up MIMIC on a local server instance. When I ran into the issue, I looked for this information on the MIMIC site to see if I was exceeding the requirements, but I could not readily find it. It might also be useful to check system requirements before the database setup begins, for example through a call to du -sh before loading the tables, and to throw an error message if the available space might not be adequate.
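A pre-flight check along these lines might look like the sketch below. It is only an illustration, not part of the mimic-code build scripts. Since du -sh reports how much space a directory uses while df reports how much is free on a filesystem, the sketch uses df. The function names avail_gb and require_space are hypothetical, and any threshold passed in is a placeholder rather than an official MIMIC requirement.

```shell
# avail_gb PATH: print free space, in whole GB, on the filesystem holding PATH.
avail_gb() {
  # -P gives one line per filesystem; column 4 is "Available" in 1024-byte blocks.
  df -Pk "$1" 2>/dev/null | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# require_space PATH MIN_GB: return non-zero (with a message) if too little is free.
require_space() {
  free_gb="$(avail_gb "$1")"
  if [ -z "$free_gb" ]; then
    echo "Could not read free space for $1" >&2
    return 2
  fi
  if [ "$free_gb" -lt "$2" ]; then
    echo "Only ${free_gb} GB free on $1; need at least $2 GB" >&2
    return 1
  fi
}
```

One possible use, assuming the Debian/Ubuntu default data location /var/lib/postgresql and a placeholder threshold of 100 GB, is require_space /var/lib/postgresql 100 && psql -v ON_ERROR_STOP=1 -v mimic_data_dir=<INSERT MIMIC FILE PATH HERE> -f load.sql. The actual data directory can be confirmed with psql -At -c 'SHOW data_directory;'.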

marymlucas commented 2 years ago

Good to hear that you figured it out. I hadn't considered the server installation issue, makes sense.

I'm just an ordinary user like you, not part of the team, but I'm sure they'll welcome your feedback :)

krishnavemuri commented 2 years ago

Sorry, my bad, I thought you were replying on behalf of the team! In any case, I hope they get to see the message and hopefully have a more coherent response.

Regards, Krishna Karthik Vemuri, MPH (Epi) Environmental Medicine and Public Health Icahn School of Medicine at Mount Sinai, 17 E 102nd Street, New York, NY - 10029. Cell: 612-306-8881.


tompollard commented 2 years ago

Thanks Mary and Krishna! We see your messages, though have less time than we would like to provide support, work on documentation issues etc. Most/all of our team now work with the cloud instances of MIMIC, so we rarely need to go through the build process locally, which is perhaps why the documentation in this area has fallen behind a little.

I would highly recommend working with the Google BigQuery instance wherever possible. This is very convenient and avoids the headache of building the database, updating the database, etc. You can find some example notebooks using the cloud instances of our databases at: https://github.com/MIT-LCP/sccm-datathon

Note that you would need to replace the project ID in the documents with your personal ID (which is used for billing purposes). See https://mimic.mit.edu/docs/gettingstarted/cloud/ for more information.

marymlucas commented 2 years ago

@tompollard Thank you for your reply, really appreciate this resource and all the help your team provides. This is an unrelated question but have you done or do you have planned any datathons using MIMIC-IV? I've struggled a fair bit to replicate some of the MIMIC-IV studies with the main issue being the process of building the cohorts. Or do you happen to have any similar notebooks for building cohorts and running studies with MIMIC-IV?

tompollard commented 2 years ago

@marymlucas We have been running significantly fewer workshops and datathons over the past couple of years, but we're beginning to become more active again! If you drop me an email at tpollard {at} mit.edu then I can send across a list of planned events.

If you take a look through the repositories in https://github.com/MIT-LCP/ you'll find several notebooks that may be helpful (though some are likely to be a bit outdated). For example, see:

@alistairewj also has some nice ones on his personal GitHub account (e.g. https://github.com/alistairewj/reproducibility-mimic).

alistairewj commented 2 years ago

Thanks for the praise @tompollard ! I've been working on a few tutorials with MIMIC-IV and hopefully I'll publish those soon, they're just going through a bit of a trial run right now. In the interim you could check out the mimic-iv-aline-study repo which fully reproduces the aline study in MIMIC-IV in a colab notebook, with the help of a few queries.

marymlucas commented 2 years ago

@alistairewj Thank you. I look forward to the tutorials when you publish them! I've just found the mimic-iv-aline-study repo and will work through that in the meantime! Much appreciated.
