analysiscenter / cardio

CardIO is a library for data science research of heart signals
https://analysiscenter.github.io/cardio/
Apache License 2.0

Training pipeline runs for a very long time #12

Closed: truongnmt closed this issue 6 years ago

truongnmt commented 6 years ago

I'm following this tutorial for detecting atrial fibrillation, but when I run the training pipeline it takes a very long time.

I'm using a Tesla K80 and left it running all night, more than 7 hours, but it is still running. This block runs 1000 epochs:

template_train_ppl = (
    ds.Pipeline()
      .init_model("dynamic", DirichletModel, name="dirichlet", config=model_config)
      .init_variable("loss_history", init_on_each_run=list)
      .load(components=["signal", "meta"], fmt="wfdb")
      .load(components="target", fmt="csv", src=LABELS_PATH)
      .drop_labels(["~"])
      .rename_labels({"N": "NO", "O": "NO"})
      .flip_signals()
      .random_resample_signals("normal", loc=300, scale=10)
      .random_split_signals(2048, {"A": 9, "NO": 3})
      .binarize_labels()
      .train_model("dirichlet", make_data=concatenate_ecg_batch,
                   fetches="loss", save_to=V("loss_history"), mode="a")
      .run(batch_size=BATCH_SIZE, shuffle=True, drop_last=True, n_epochs=N_EPOCH, lazy=True)
)

train_ppl = (eds.train >> template_train_ppl).run()

Do you think something is wrong here? Or does the framework have some way to show that it's still running, for example by printing the number of the current epoch? Also, FYI, when I run it I see this in the terminal:

2018-04-30 16:19:57.630351: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.71GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

roman-kh commented 6 years ago

You might add a line

.call(lambda _, v: print(v[-1]), v=V('loss_history'))

just before .run(batch_size=BATCH_SIZE, ...).

It will print the current loss value at each iteration.
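To also see which epoch is running, the lambda above could be swapped for a small callable that counts iterations. This is a sketch under assumptions, not part of CardIO: ProgressPrinter is a hypothetical helper, it assumes .call invokes it exactly as it invokes the lambda above (batch first, then v=V('loss_history') as a keyword), and you must supply iters_per_epoch yourself. With drop_last=True that should be the number of records in eds.train divided by BATCH_SIZE, rounded down.

```python
import time


class ProgressPrinter:
    """Hypothetical progress callback (not a CardIO API).

    Assumed to be called like the lambda in the answer: fn(batch, v=...),
    where v is the current loss_history list.
    """

    def __init__(self, iters_per_epoch, every=1):
        self.iters_per_epoch = max(1, int(iters_per_epoch))  # e.g. n_train // BATCH_SIZE
        self.every = max(1, int(every))                      # report every `every` iterations
        self.iteration = 0
        self.last_message = None
        self.start = time.time()

    def __call__(self, batch, v):
        self.iteration += 1
        if self.iteration % self.every != 0:
            return None  # not a reporting iteration: print nothing
        epoch = (self.iteration - 1) // self.iters_per_epoch + 1
        elapsed = time.time() - self.start
        # float() copes with Python floats and NumPy scalars alike
        loss = float(v[-1]) if len(v) > 0 else float("nan")
        self.last_message = "iter {} | epoch {} | loss {:.4f} | {:.0f}s elapsed".format(
            self.iteration, epoch, loss, elapsed)
        print(self.last_message)
        return None  # returns nothing, like print(...) inside the original lambda


if __name__ == "__main__":
    # Stand-alone demo with a fake loss history; no CardIO needed.
    demo = ProgressPrinter(iters_per_epoch=3)
    history = []
    for value in [0.9, 0.7, 0.6, 0.5]:
        history.append(value)
        demo(None, v=history)
```

In the pipeline it would take the lambda's place, e.g. .call(ProgressPrinter(iters_per_epoch=..., every=100), v=V('loss_history')), with the iteration count filled in for your data. Printing only every few iterations keeps the log readable over 1000 epochs.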

truongnmt commented 6 years ago

Thanks a lot, it worked!!! And btw, it has just finished 1000 epochs 🔥 🔥 🔥

emadahmed97 commented 6 years ago

@truongnmt How long did it take you?

truongnmt commented 6 years ago

@emadahmed97 It took me about 8 or 9 hours, dude.
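As a rough model of run times like these: with drop_last=True each epoch performs n_train // BATCH_SIZE iterations, so a run lasts roughly N_EPOCH * (n_train // BATCH_SIZE) * seconds_per_iteration. The sketch assumes n_train is the number of records in eds.train and that BATCH_SIZE counts records; the thread gives neither value nor the time per iteration, so the numbers below are arbitrary placeholders, not measurements. The helper names are hypothetical.

```python
def iterations_per_epoch(n_train, batch_size):
    """Iterations in one epoch when drop_last=True (the partial batch is dropped)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return n_train // batch_size


def estimate_training_hours(n_train, batch_size, n_epochs, sec_per_iter):
    """Approximate wall-clock hours: epochs * iterations per epoch * seconds per iteration."""
    total_iters = n_epochs * iterations_per_epoch(n_train, batch_size)
    return total_iters * sec_per_iter / 3600.0


def seconds_per_iteration(total_hours, n_train, batch_size, n_epochs):
    """Inverse estimate: average seconds per iteration from an observed run time."""
    total_iters = n_epochs * iterations_per_epoch(n_train, batch_size)
    if total_iters == 0:
        raise ValueError("no full batches: n_train is smaller than batch_size")
    return total_hours * 3600.0 / total_iters


# Arbitrary placeholder values, not taken from the thread or the tutorial.
hours = estimate_training_hours(n_train=3600, batch_size=100, n_epochs=1000, sec_per_iter=0.5)
print("estimated hours: {:.1f}".format(hours))  # prints: estimated hours: 5.0
```

Feeding a measured total run time and your own n_train and BATCH_SIZE into seconds_per_iteration gives the average cost of one iteration on your hardware, which you can compare against the elapsed times printed during training.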