analysiscenter / cardio

CardIO is a library for data science research of heart signals
https://analysiscenter.github.io/cardio/
Apache License 2.0

reading labeled data #35

Closed AnoopRKulkarni closed 4 years ago

AnoopRKulkarni commented 4 years ago

Hello all,

The flow of "dataset => batch => batch with data" is understood. "next_batch" can be called with an arbitrary batch_size.

My question: when loading the "target" component into "batch with data", how can I extract the labels for the same indices from the larger labeled data file?

For example, consider the following:

    eds = EcgDataset(path="*.hea")
    batch = eds.next_batch(batch_size=8)
    batch_with_data = batch.load(fmt="wfdb", components=["signal", "meta"])

Now, suppose I have a total of 1000 .hea files and a CSV file with 1000 outcome labels. How do I fetch exactly the 8 labels for "batch_with_data", i.e. the ones whose indices match the files in the batch, from this larger array of 1000 labels?

thanks

~anoop

roman-kh commented 4 years ago

There are many ways to do that.

  1. Load the labels straight from the CSV: batch_with_data.load(fmt='csv', src='/path/to/labels.csv', components='labels', index_col=0)

  2. Load the labels into a dataframe first with pandas.read_csv, then call batch_with_data.load(src=df_with_labels, components='labels')

  3. Assign the labels directly: batch_with_data.labels = df_with_labels['labels'][batch_with_data.indices]

For this to work, ensure that the CSV is indexed with eds.indices, i.e. the first column of the table contains the same elements as the dataset indices.
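
The index alignment that all three options rely on can be illustrated with plain pandas, without CardIO. This is only a minimal sketch: the record ids, the label values, the column name "labels" and the in-memory CSV are made up for illustration, and batch_indices is a stand-in for what batch_with_data.indices would hold.

```python
import io

import pandas as pd

# Hypothetical labels file: the first column holds the record ids
# (assumed here to match the dataset indices), the second the label.
csv_text = """record,labels
A00001,N
A00002,A
A00003,O
A00004,N
A00005,A
"""

# index_col=0 turns the first column into the DataFrame index,
# so rows can be looked up by record id instead of by position.
df_with_labels = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Stand-in for batch_with_data.indices: a batch is a subset of the
# dataset indices, in arbitrary order.
batch_indices = ["A00004", "A00001", "A00005"]

# Select the labels for exactly these records, in batch order.
# .loc raises a KeyError if an id is missing from the CSV index.
batch_labels = df_with_labels.loc[batch_indices, "labels"].to_numpy()

print(batch_labels)
```

Because the lookup is by id rather than by row position, the order of rows in the CSV does not matter, and a mismatch between the file names and the CSV index shows up as an error instead of silently misaligned labels.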

AnoopRKulkarni commented 4 years ago

Thank you! I got it working with the syntax from option (2).

best regards, ~anoop