MIT-LCP / wfdb-python

Native Python WFDB package
MIT License

Error when downloading records from the mimic3wdb database #452

Closed Favourj-bit closed 9 months ago

Favourj-bit commented 1 year ago

I'm trying to download records from the MIMIC-III Waveform Database (mimic3wdb) using the code given in the demo:

import os
import wfdb

cwd = os.getcwd()
dl_dir = os.path.join(cwd, 'tmp_dl_dir')

wfdb.dl_database('mimic3wdb', dl_dir=dl_dir)
display(os.listdir(dl_dir))

But I keep getting the error shown: image. How can I solve this?

tompollard commented 1 year ago

@Favourj-bit please could you provide code to reproduce the issue?

Favourj-bit commented 1 year ago

Hi @tompollard, I just did that. I get the same error when trying to access 'mimic3wdb-matched', the MIMIC-III Waveform Database Matched Subset.

tompollard commented 1 year ago

Thanks for adding the content @Favourj-bit. The code in your example works okay for me. Please could you show me the output to the following code?

import wfdb
print(wfdb.__version__)

A quick fix might be to upgrade to the latest version (e.g. with pip install wfdb --upgrade)

Favourj-bit commented 1 year ago

Hi @tompollard here is the output from the code: image

Favourj-bit commented 1 year ago

Hi @tompollard, I have tried updating wfdb but I'm still getting the same error.

NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb/1.0/30/3000003/.hea

tompollard commented 1 year ago

Thanks @Favourj-bit, I'll look into this as soon as I have the opportunity. There is a problem with the path that is being generated, which is why you are getting a 404 Not Found error.

Side note, but the MIMIC-III database that you are looking to work with is hosted at: https://physionet.org/content/mimic3wdb/1.0/

Favourj-bit commented 1 year ago

Hi @tompollard , I wanted to find out if you have been able to look into this

tompollard commented 1 year ago

I'm sorry, not yet. I have some other commitments that I need to focus on, but will take a look at this when I can (if someone doesn't get there before me).

briangow commented 1 year ago

@Favourj-bit , are you still having trouble with this? I cannot reproduce your problem. The files are downloading properly for me using this code.

Favourj-bit commented 1 year ago

@briangow Yes, I am. Which code is that?

Favourj-bit commented 1 year ago

@briangow

What app are you using? I used JupyterLab for that code.

briangow commented 1 year ago

I used Jupyter Notebook, could you give it a try?

Favourj-bit commented 1 year ago

ok, i will do that. thanks

Favourj-bit commented 1 year ago

I just tried with Jupyter Notebook and it gives the same error. I wanted to let you know, however, that I'm just copying the code for my use case; I did not clone the notebook. Could that cause an issue? image

briangow commented 1 year ago

You should be able to simply copy the code and have it work, so I don't think that is the problem. To properly debug this we'd need to inspect record_list and nested_records in the dl_database function here: https://github.com/MIT-LCP/wfdb-python/blob/main/wfdb/io/record.py#L2971 . Feel free to give that a try. These lists should point to the files here https://physionet.org/content/mimic3wdb/1.0/ (at the bottom). Your error shows a URL which doesn't have the filename before the .hea, which isn't correct.

I won't be able to help with this again until later next week. If you are anxious to get started using the mimic3wdb files, I'd suggest you download them directly from the physionet.org link above. Keep in mind that these files do take a substantial amount of disk space.

If you don't need to download them locally I'd suggest reading directly from the database (without saving them locally). See this section in the demo.ipynb for an example on how to do this for the matched subset of MIMIC-III waveforms (https://physionet.org/content/mimic3wdb-matched/1.0/):

# Can also read the same files hosted on PhysioNet (takes long to stream the many large files)
signals, fields = wfdb.rdsamp('3269321_0001', pn_dir = 'mimic3wdb/matched/p00/p000878')
wfdb.plot_items(signal=signals, fs=fields['fs'], title='Record p000878/3269321_0001')
display((signals, fields))

Favourj-bit commented 1 year ago

Hi @briangow, thanks so much for your help. I will check out those functions and let you know what I find. Also, I actually do need the data because I want to preprocess it. I'm doing a blood pressure monitoring project at my college, which is why I need access to ECG, PPG and ABP signals. Is it possible to preprocess the data without downloading it outright?

I will give the code a try, and I will also try to preprocess the data if possible. Unfortunately, I could not get the wfdb package working on my system no matter what I tried, which is why I decided to follow the demo provided with the Python package.

I'll also look into other ways to get the data locally. In any case, I don't mind holding on until next week if none of the other methods I try works out. Thanks again for your help.

tompollard commented 1 year ago

@Favourj-bit This also runs fine for me! Please could you:

  1. Add details of your operating system (Windows?)
  2. Post the output of pip freeze to show us which packages you have installed.
  3. Post the commands you are running as text, inside three backticks (```)
  4. Post any output that you see as text, inside three backticks (```)
  5. Post the full error message as text, inside three backticks (```)

Favourj-bit commented 1 year ago

Hi @tompollard

  1. Windows 10 Pro
  2. image
  3. The commands:

'''
import os
import wfdb

cwd = os.getcwd()
dl_dir = os.path.join(cwd, 'tmp_dl_dir')

wfdb.dl_database('mimic3wdb-matched', dl_dir=dl_dir)
display(os.listdir(dl_dir))
'''

  4. The output is very long, so I'm posting only some of it: the beginning and the ending of the output.

'''
Generating record list for: p00/p000020/
Generating record list for: p00/p000030/
Generating record list for: p00/p000033/
Generating record list for: p00/p000052/
Generating record list for: p00/p000079/
Generating record list for: p00/p000085/
Generating record list for: p00/p000107/
Generating record list for: p00/p000109/
Generating record list for: p00/p000123/
Generating record list for: p00/p000124/
Generating record list for: p00/p000125/
Generating record list for: p00/p000135/
Generating record list for: p00/p000138/
Generating record list for: p00/p000145/
Generating record list for: p00/p000154/
'''

'''
Generating record list for: p09/p099836/
Generating record list for: p09/p099863/
Generating record list for: p09/p099865/
Generating record list for: p09/p099873/
Generating record list for: p09/p099880/
Generating record list for: p09/p099883/
Generating record list for: p09/p099894/
Generating record list for: p09/p099897/
Generating record list for: p09/p099913/
Generating record list for: p09/p099922/
Generating record list for: p09/p099946/
Generating record list for: p09/p099955/
Generating record list for: p09/p099982/
Generating record list for: p09/p099983/
Generating record list for: p09/p099992/
Generating record list for: p09/p099999/
Generating list of all files for: p00/p000020/
'''

  5. The full error message:

'''
---------------------------------------------------------------------------
NetFileNotFoundError                      Traceback (most recent call last)
in
      5 dl_dir = os.path.join(cwd, 'tmp_dl_dir')
      6
----> 7 wfdb.dl_database('mimic3wdb-matched', dl_dir=dl_dir)
      8 display(os.listdir(dl_dir))

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\record.py in dl_database(db_dir, dl_dir, records, annotators, keep_subdirs, overwrite)
   3064 dir_name, base_rec_name = os.path.split(rec)
   3065 record = rdheader(
-> 3066 base_rec_name, pn_dir=posixpath.join(db_dir, dir_name)
   3067 )
   3068

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\record.py in rdheader(record_name, pn_dir, rd_segments)
   1845 header_content = f.read()
   1846 else:
-> 1847 header_content = download._stream_header(file_name, pn_dir)
   1848
   1849 # Separate comment and non-comment lines

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\download.py in _stream_header(file_name, pn_dir)
    107 # Get the content of the remote file
    108 with _url.openurl(url, "rb") as f:
--> 109 content = f.read()
    110
    111 return content.decode("iso-8859-1")

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in read(self, size)
    579 raise ValueError("invalid size: %r" % (size,))
    580
--> 581 result = b"".join(self._read_range(start, end))
    582 self._pos += len(result)
    583 return result

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in _read_range(self, start, end)
    472 buffer_store = True
    473
--> 474 with RangeTransfer(self._current_url, req_start, req_end) as xfer:
    475 # Update current file URL.
    476 self._current_url = xfer.response_url

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in __init__(self, url, start, end)
    166 self._content_iter = self._response.iter_content(4096)
    167 try:
--> 168 self._parse_headers(method, self._response)
    169 except Exception:
    170 self.close()

~\AppData\Roaming\Python\Python37\site-packages\wfdb\io\_url.py in _parse_headers(self, method, response)
    216 % (response.status_code, response.reason, response.url),
    217 url=response.url,
--> 218 status_code=response.status_code,
    219 )
    220

NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb-matched/1.0/p00/p000020/.hea
'''

briangow commented 1 year ago

@Favourj-bit , can you run this from Jupyter Notebook:

import wfdb

signals, fields = wfdb.rdsamp('3000003', pn_dir = 'mimic3wdb/1.0/30/3000003')
wfdb.plot_items(signal=signals, fs=fields['fs'], title='30/3000003/3000003')
display((signals, fields))

please post your output with any error messages.

Favourj-bit commented 1 year ago

@briangow , this code works perfectly. image

Favourj-bit commented 1 year ago

I also noticed that when I click on the link in the error message (image), it takes me to a page showing a 404 error, as shown below: image

Favourj-bit commented 1 year ago

Hi @briangow, I was going through the function you suggested and noticed something: image. It seems to append .hea directly to every entry in the record list, though I'm not sure.

The records in the database I'm trying to access are not just p00/p000020/.hea, so I'm guessing that may be where the issue is coming from. This is the directory and the files present there: image

briangow commented 1 year ago

@Favourj-bit , yes, the path you are seeing ending in /.hea isn't pointing to an actual file which is causing your problem.
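One plausible reading of the traceback above, sketched here rather than confirmed against the library source: the record list contains entries that end in a slash (the log prints `p00/p000020/`), and the frame in `dl_database` splits each entry into a directory and a base record name. Splitting a path with a trailing slash yields an empty base name, which would leave only `.hea` as the file name. The helper names and the base URL layout below are assumptions made for illustration; `posixpath` is used so the result is the same on every operating system.

```python
import posixpath


def split_record(rec):
    """Split a record-list entry the way the traceback frame does:
    returns (dir_name, base_rec_name)."""
    return posixpath.split(rec)


def header_url(db_dir, rec, base="https://physionet.org/files"):
    """Hypothetical helper: build the header URL from a record-list entry.
    The URL layout is inferred from the error message, not from wfdb itself."""
    dir_name, base_rec_name = split_record(rec)
    return posixpath.join(base, db_dir, dir_name, base_rec_name + ".hea")


# An entry with a trailing slash has an empty base name ...
print(split_record("30/3000003/"))
# ... so the generated URL ends in "/.hea", as in the reported error.
print(header_url("mimic3wdb/1.0", "30/3000003/"))
# An entry that names the record itself gives a usable URL.
print(header_url("mimic3wdb/1.0", "30/3000003/3000003"))
```

If this reading is right, the 404 comes from how directory-style entries are handled, not from the notebook environment.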

Given what you need to do I'd suggest the following:

  1. Use wfdb.io.get_record_list to get a list of all of the records / files you're interested in processing
  2. Loop through the output from that and pass the relevant information to wfdb.io.rdsamp to read the record into your local memory
  3. Pre-process the data as needed
  4. Save the result to your local computer if needed by using wfdb.io.wrsamp
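The four steps above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not a tested pipeline: `split_entry` and `process_records` are hypothetical helper names, the entries are assumed to be full record paths such as `30/3000003/3000003` (directory-style entries ending in `/` are skipped), the `preprocess` callable is a placeholder for your own filtering, and the `'16'` storage format is an arbitrary choice. Records with missing samples may need extra handling before they are written.

```python
import os
import posixpath


def split_entry(db_dir, entry):
    """Split a record-list entry into (record_name, pn_dir) for wfdb.rdsamp."""
    dir_name, record_name = posixpath.split(entry)
    return record_name, posixpath.join(db_dir, dir_name)


def process_records(db_dir, entries, out_dir, preprocess=lambda sig: sig, sampto=None):
    """Stream each record, preprocess it in memory, and save the result locally."""
    import wfdb  # imported lazily so split_entry can be used without wfdb installed

    os.makedirs(out_dir, exist_ok=True)
    for entry in entries:
        record_name, pn_dir = split_entry(db_dir, entry)
        if not record_name:
            # The entry ends in '/', so it names a directory, not a record.
            continue
        # Step 2: read the record from PhysioNet into memory.
        signals, fields = wfdb.rdsamp(record_name, pn_dir=pn_dir, sampto=sampto)
        # Step 3: apply the caller's preprocessing (identity by default).
        cleaned = preprocess(signals)
        # Step 4: write the processed signals to the local output directory.
        wfdb.wrsamp(
            record_name,
            fs=fields["fs"],
            units=fields["units"],
            sig_name=fields["sig_name"],
            p_signal=cleaned,
            fmt=["16"] * cleaned.shape[1],
            write_dir=out_dir,
        )


# Example call (needs network access and the wfdb package), reading only the
# first 1000 samples of the record that was read successfully in this thread:
# process_records("mimic3wdb/1.0", ["30/3000003/3000003"], "processed", sampto=1000)
```

Step 1 would supply `entries`, for example from `wfdb.get_record_list`. Passing `sampto` keeps each read small while you check that the loop behaves as expected.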

Details about these functions are available at https://wfdb.readthedocs.io/en/latest/index.html . Hopefully wfdb.io.get_record_list will produce valid paths to the files for you. If not, you'll have to outsmart the code to create a valid path (e.g. from https://physionet.org/files/mimic3wdb/1.0/30/3000003/.hea , create https://physionet.org/files/mimic3wdb/1.0/30/3000003/3000003.hea, etc.)
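A small sketch of that path repair, assuming the header file shares its name with the directory that contains it. That holds for the `30/3000003/3000003.hea` example above but is not guaranteed for every database (the matched subset may name its records differently), so treat `repair_header_url` as a hypothetical helper and check the result against the file listing on PhysioNet.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def repair_header_url(url):
    """Turn '<dir>/<name>/.hea' into '<dir>/<name>/<name>.hea'.

    URLs that already carry a record name are returned unchanged.
    """
    parts = urlsplit(url)
    dir_path, file_name = posixpath.split(parts.path)
    if file_name != ".hea":
        return url
    # Assumption: the record is named after its parent directory.
    record_name = posixpath.basename(dir_path)
    fixed_path = posixpath.join(dir_path, record_name + ".hea")
    return urlunsplit(parts._replace(path=fixed_path))


print(repair_header_url("https://physionet.org/files/mimic3wdb/1.0/30/3000003/.hea"))
```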

The issue you are having with wfdb.io.dl_database appears to be a bug. I've marked this issue as such. We can leave this issue open until someone with a Windows machine can debug the problem.

Favourj-bit commented 1 year ago

@briangow Thank you very much for your help. I will try out the functions you recommended and provide feedback. Marking it as a bug is great too; maybe we could rename the issue in case someone else with a Windows system comes across this. Hopefully, we will be able to figure it out.

Favourj-bit commented 1 year ago

@briangow I realised the code works when using Google Colab. It takes a long time to run, so it is still running. It has reached the "Generating list of all files" stage and is still working through the files. I was just wondering why I did not use Colab before now, since I could always just download the results from there. I will let you know the outcome when I'm done. Thanks once again.

bemoody commented 9 months ago

Should be fixed by pull #465.