dputhier / pygtftk

A python package and a set of shell commands to handle GTF files
GNU General Public License v3.0

Replace minibatch-nb and minibatch-size with max-ram #82

Closed guillaumecharbonnier closed 5 years ago

guillaumecharbonnier commented 5 years ago

I feel we could replace these two technical arguments with a single, more user-friendly "--max-ram" argument, using some benchmarking to find an appropriate extrapolation function depending on the input files. Even better (?), the algorithm could:

  1. memory-ologram <- check the memory used by ologram after all files are loaded.
  2. send one batch.
  3. memory-batch <- check the memory consumption of that batch.
  4. send n more batches such that memory-ologram + n * memory-batch ~< max-ram.
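
Step 4 could look roughly like the sketch below. It is only an illustration, not existing ologram code: the function name `batches_that_fit` and its parameters are hypothetical, and the two memory figures are assumed to have been measured beforehand (steps 1 and 3), in the same unit as max-ram.

```python
def batches_that_fit(max_ram, memory_ologram, memory_batch):
    """Return the largest n with memory_ologram + n * memory_batch <= max_ram.

    All three values must use the same unit (e.g. MB). The names are
    hypothetical and only mirror the steps listed above.
    """
    if memory_batch <= 0:
        raise ValueError("memory_batch must be positive")
    # RAM left once ologram itself and its input files are loaded.
    headroom = max_ram - memory_ologram
    if headroom <= 0:
        # The baseline alone already reaches the limit: no batch fits.
        return 0
    return int(headroom // memory_batch)


if __name__ == "__main__":
    # Made-up figures: 8000 MB allowed, 2000 MB baseline, 500 MB per batch.
    print(batches_that_fit(8000, 2000, 500))
```

With these made-up figures, 6000 MB of headroom remain, so 12 batches would fit.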

qferre commented 5 years ago

The problem is that the RAM footprint depends on the features. Even with batches of the same size, the memory cost will be different when processing 'exons' than when processing 'start_codon'.

Making the batch size dynamic would be possible, since the function that produces each batch runs independently, but it would require some refactoring.
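
To give an idea of what a dynamic batch size might look like, here is a minimal sketch, not the current implementation. Everything in it is an assumption: `adaptive_batch_sizes` and `measure_batch` are hypothetical names, and `measure_batch(size)` stands for "run one batch of that size and return the memory it used". Since the cost depends on the features, such a loop would have to be run separately for each feature type.

```python
def adaptive_batch_sizes(total_items, first_size, budget, measure_batch):
    """Process total_items in batches, resizing each batch from the last measurement.

    measure_batch(size) is a hypothetical callable that runs one batch of
    `size` items and returns the memory it used, in the same unit as budget.
    Returns the list of batch sizes that were actually used.
    """
    sizes = []
    done = 0
    size = max(1, first_size)
    while done < total_items:
        # Never process more items than remain.
        size = min(size, total_items - done)
        used = measure_batch(size)
        sizes.append(size)
        done += size
        per_item = used / size
        if per_item > 0:
            # Largest batch whose estimated cost stays within the budget.
            size = max(1, int(budget // per_item))
    return sizes


if __name__ == "__main__":
    # Toy cost model: every item costs 10 units of memory.
    print(adaptive_batch_sizes(200, 10, 500, lambda size: size * 10))
```

The first batch is deliberately small so that a bad initial guess cannot exceed the budget; later batches grow or shrink according to the observed cost per item.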