calico / basenji

Sequential regulatory activity predictions with deep convolutional neural networks.
Apache License 2.0

Memory requirements for tfr_hdf5.py on basenji_human tfrecords #142

Closed: rnsherpa closed this issue 2 years ago

rnsherpa commented 2 years ago

I'm trying to convert all the human TFRecords to HDF5 format. All the necessary files were downloaded directly from basenji_barnyard. I tried running tfr_hdf5.py with 300GB of memory allocated, but I got a memory error. What are the recommended system requirements for running this script?

rnsherpa commented 2 years ago

Ended up allocating 1000GB of memory and it worked.

davek44 commented 2 years ago

Ah yeah, I generally use that script to debug and convert something like a single TFRecord. I’m sure we could rewrite it to be more memory efficient if needed.
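
For anyone hitting the same memory error, one possible direction is to append batches to a resizable HDF5 dataset instead of collecting every record in memory before writing. The sketch below is not how tfr_hdf5.py works and uses no Basenji code. `stream_to_hdf5`, `fake_batches`, the dataset name `targets`, and the array shapes are all hypothetical stand-ins, and in practice the generator would yield arrays parsed from the TFRecords.

```python
import h5py
import numpy as np


def stream_to_hdf5(batches, h5_path, name="targets", dtype="float32"):
    """Append arrays from `batches` to one resizable HDF5 dataset.

    Only one batch is held in memory at a time.
    Returns the number of rows written.
    """
    written = 0
    with h5py.File(h5_path, "w") as h5:
        dset = None
        for batch in batches:
            batch = np.asarray(batch)
            if dset is None:
                # Create the dataset lazily, once the per-row shape is known.
                # maxshape=None on axis 0 lets it grow without a known total.
                dset = h5.create_dataset(
                    name,
                    shape=(0,) + batch.shape[1:],
                    maxshape=(None,) + batch.shape[1:],
                    dtype=dtype,
                    chunks=True,
                )
            end = written + batch.shape[0]
            dset.resize(end, axis=0)      # grow along the first axis
            dset[written:end] = batch     # write this batch to disk
            written = end
    return written


def fake_batches(num_batches=3, batch_size=4, length=8, num_targets=5):
    """Hypothetical stand-in for batches parsed from TFRecords."""
    for i in range(num_batches):
        yield np.full((batch_size, length, num_targets), i, dtype="float32")
```

Calling `stream_to_hdf5(fake_batches(), "out.h5")` writes 12 rows while holding a single batch in memory, so peak usage depends on the batch size rather than on the size of the full dataset.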