analysiscenter / cardio

CardIO is a library for data science research of heart signals
https://analysiscenter.github.io/cardio/
Apache License 2.0

TypeError: '<' not supported between instances of 'float' and 'str' #9

quzhouxiachuan closed this issue 6 years ago

quzhouxiachuan commented 6 years ago

Hi, I ran into trouble when trying the II.pipelines script, using the tests/data dataset as my input. Could you please help with that?

import os
from cardio import EcgDataset

PATH_TO_DATA = "../cardio/tests/data"  # set path to data
pds = EcgDataset(path=os.path.join(PATH_TO_DATA, "*.hea"), no_ext=True, sort=True)
pds.cv_split(0.8, shuffle=True)

from cardio.pipelines import dirichlet_train_pipeline
%env CUDA_VISIBLE_DEVICES=0
AF_SIGNALS_REF = os.path.join(PATH_TO_DATA, "REFERENCE.csv")
pipeline = dirichlet_train_pipeline(AF_SIGNALS_REF, batch_size=256, n_epochs=500)

trained = (pds.train >> pipeline).run()


TypeError                                 Traceback (most recent call last)
in ()
----> 1 trained = (pds.train >> pipeline).run()

~/anaconda3/lib/python3.6/site-packages/cardio/dataset/dataset/pipeline.py in run(self, *args, **kwargs)
   1086         if len(args) == 0 and len(kwargs) == 0:
   1087             args, kwargs = self._lazy_run
-> 1088         for _ in self.gen_batch(*args, **kwargs):
   1089             pass
   1090         return self

~/anaconda3/lib/python3.6/site-packages/cardio/dataset/dataset/pipeline.py in gen_batch(self, batch_size, shuffle, n_epochs, drop_last, prefetch, on_iter, *args, **kwargs)
   1033             for batch in batch_generator:
   1034                 try:
-> 1035                     batch_res = self._exec(batch)
   1036                 except SkipBatchException:
   1037                     pass

~/anaconda3/lib/python3.6/site-packages/cardio/dataset/dataset/pipeline.py in _exec(self, batch, new_loop)
    577             asyncio.set_event_loop(asyncio.new_event_loop())
    578         batch.pipeline = self
--> 579         batch_res = self._exec_all_actions(batch)
    580         batch_res.pipeline = self
    581         return batch_res

~/anaconda3/lib/python3.6/site-packages/cardio/dataset/dataset/pipeline.py in _exec_all_actions(self, batch, action_list)
    563                     join_batches = None
    564
--> 565                 batch = self._exec_one_action(batch, _action, _action_args, _action['kwargs'])
    566
    567             batch.pipeline = self

~/anaconda3/lib/python3.6/site-packages/cardio/dataset/dataset/pipeline.py in _exec_one_action(self, batch, action, args, kwargs)
    514             batch.pipeline = self
    515             action_method, _ = self._get_action_method(batch, action['name'])
--> 516             batch = action_method(*args, **kwargs)
    517             batch.pipeline = self
    518         return batch

~/anaconda3/lib/python3.6/site-packages/cardio/dataset/dataset/decorators.py in _action_wrapper(action_self, *args, **kwargs)
     35                 action_self.pipeline.get_variable(_lock_name).acquire()
     36
---> 37             _res = action_method(action_self, *args, **kwargs)
     38
     39             if _use_lock is not None:

~/anaconda3/lib/python3.6/site-packages/cardio/core/ecg_batch.py in load(self, src, fmt, components, ann_ext, *args, **kwargs)
    252         components = np.asarray(components).ravel()
    253         if (fmt == "csv" or fmt is None and isinstance(src, pd.Series)) and np.all(components == "target"):
--> 254             return self._load_labels(src)
    255         elif fmt in ["wfdb", "dicom", "edf", "wav"]:
    256             return self._load_data(src=src, fmt=fmt, components=components, ann_ext=ann_ext, *args, **kwargs)

~/anaconda3/lib/python3.6/site-packages/cardio/core/ecg_batch.py in _load_labels(self, src)
    353             raise RuntimeError("Batch with undefined unique_labels must be created in a pipeline")
    354         ds_indices = self.pipeline.dataset.indices
--> 355         self.unique_labels = np.sort(src[ds_indices].unique())
    356         return self
    357

~/anaconda3/lib/python3.6/site-packages/numpy/core/fromnumeric.py in sort(a, axis, kind, order)
    845     else:
    846         a = asanyarray(a).copy(order="K")
--> 847     a.sort(axis=axis, kind=kind, order=order)
    848     return a
    849

TypeError: '<' not supported between instances of 'float' and 'str'

quzhouxiachuan commented 6 years ago

Hi, I solved the problem by changing

pds = EcgDataset(path=os.path.join(PATH_TO_DATA, "*.hea"), no_ext=True, sort=True)

to

import cardio.dataset as ds
index = ds.FilesIndex(path="../cardio/tests/data/*.hea", no_ext=True, sort=True)
from cardio import EcgBatch
pds = ds.Dataset(index, batch_class=EcgBatch)

What was causing the error?

Thanks!

dpodvyaznikov commented 6 years ago

Hi!

Currently the cardio/tests/data/ directory contains several kinds of ECGs used for running tests. There are two types of signals in wfdb format: Axxxxx files from the PhysioNet Challenge 2017 Database and one signal, sel100, from the QT Database. For AF detection you should use data from PhysioNet Challenge 2017. As mentioned in the tutorial, we recommend that you download the database from here or use the signals from cardio/tests/data/ with the mask A*.hea.
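A minimal sketch of why the mixed directory can fail, using only the standard library. It assumes that REFERENCE.csv holds labels only for the Axxxxx records, so a label lookup for sel100 yields a missing value (NaN, a float). The record names, labels, and the `unique_labels` helper below are hypothetical and are not the contents of tests/data or CardIO code.

```python
import fnmatch
import math

# Hypothetical record names: three PhysioNet-style records plus sel100.
records = ["A00001", "A00002", "A00004", "sel100"]
files = [name + ".hea" for name in records]

# Hypothetical labels standing in for REFERENCE.csv (no entry for sel100).
reference = {"A00001": "N", "A00002": "A", "A00004": "O"}


def unique_labels(mask):
    """Select files by mask, look up their labels, and sort the unique values."""
    selected = [f[:-4] for f in fnmatch.filter(files, mask)]
    # A record without a label maps to NaN, mimicking a lookup with a gap.
    labels = {reference.get(name, math.nan) for name in selected}
    return sorted(labels)


print(unique_labels("A*.hea"))  # only str labels, so sorting works

try:
    unique_labels("*.hea")  # NaN (float) is mixed with str labels
except TypeError as err:
    print("TypeError:", err)
```

With the A*.hea mask every selected record has a string label, so the sort succeeds. With *.hea the unlabeled record adds a float to the set, and comparing it with a string raises the same kind of TypeError as in the traceback.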

Changing

pds = EcgDataset(path=os.path.join(PATH_TO_DATA, "*.hea"), no_ext=True, sort=True)

to

import cardio.dataset as ds
index = ds.FilesIndex(path="../cardio/tests/data/*.hea", no_ext=True, sort=True)
from cardio import EcgBatch
pds = ds.Dataset(index, batch_class=EcgBatch)

should not make any difference. It seems that the modified code worked by accident: you split the dataset with shuffling, pds.cv_split(0.8, shuffle=True), and the index sel100, which caused the problem, happened to land in pds.test, so training on pds.train ran without any errors.
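The "worked by accident" effect can be illustrated with a small simulation. This is a sketch under assumptions: `split_train_test` is a hypothetical stand-in for a shuffled 80/20 split, not CardIO's `cv_split` implementation, and the record names are invented.

```python
import random


def split_train_test(indices, train_share, seed):
    """Shuffle the indices and cut them into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical index: nine Axxxxx-style records plus the unlabeled sel100.
indices = ["A%05d" % i for i in range(1, 10)] + ["sel100"]

# Count how often sel100 falls into the test part across many shuffles.
trials = 1000
in_test = sum(
    "sel100" in split_train_test(indices, 0.8, seed)[1] for seed in range(trials)
)
print("sel100 in test for %d of %d shuffles" % (in_test, trials))
```

With a train share of 0.8 and ten records, sel100 lands in the test part in roughly one shuffle out of five, so training on the train part can succeed or fail depending on the shuffle alone.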