aliles / filemagic

File type identification using libmagic
http://pypi.python.org/pypi/filemagic

Repeatedly Instantiating magic.Magic Leaks Memory #1

Closed mitchwalker1979 closed 11 years ago

mitchwalker1979 commented 12 years ago

... even if the local variable is deleted. Easy to reproduce:

    while True:
        m = magic.Magic(flags=magic.MAGIC_MIME_TYPE)

This will quickly die with a MemoryError. This is problematic when it's used in a self-contained function like:

    def mimetype(data):
        m = magic.Magic(flags=magic.MAGIC_MIME_TYPE)
        return m.id_buffer(data)

Calling such a function repeatedly in a large program begins to consume prohibitive amounts of memory. An obvious workaround is to instantiate Magic once and keep a reference around for future calls to id_buffer(), but I'm bringing this up on the chance that it's symptomatic of a bigger problem.
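
The workaround described above, building the identifier once and reusing it for later calls, could be sketched roughly as follows. The helper name `cached_identifier` and the `StubMagic` class are hypothetical and exist only so the sketch runs without libmagic; with filemagic installed, the factory would be something like `lambda: magic.Magic(flags=magic.MAGIC_MIME_TYPE)`.

```python
import functools


def cached_identifier(factory):
    """Wrap a zero-argument factory so the object it builds is created only once."""
    @functools.lru_cache(maxsize=None)
    def get():
        return factory()
    return get


class StubMagic:
    """Hypothetical stand-in for magic.Magic so the sketch runs without libmagic."""
    instances = 0

    def __init__(self):
        StubMagic.instances += 1

    def id_buffer(self, data):
        # Toy rule for illustration only; this is not how libmagic detects types.
        if data.startswith(b"%PDF"):
            return "application/pdf"
        return "application/octet-stream"


# Assumption: with filemagic installed, pass a factory such as
#   lambda: magic.Magic(flags=magic.MAGIC_MIME_TYPE)
get_magic = cached_identifier(StubMagic)


def mimetype(data):
    # Reuses the single cached instance instead of creating one per call.
    return get_magic().id_buffer(data)
```

The trade-off is that the cached instance lives for the life of the process, so its resources are never released explicitly; that is usually acceptable for a single long-lived identifier.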

aliles commented 12 years ago

Are you calling close() on the magic object or using a with statement? The magic.Magic object does not currently clean up its resources automatically; it must be closed explicitly. Your self-contained function from above should look like:

    def mimetype(data):
        with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
            text = m.id_buffer(data)
        return text

Or, alternatively, without using a context manager:

    def mimetype(data):
        m = magic.Magic(flags=magic.MAGIC_MIME_TYPE)
        text = m.id_buffer(data)
        m.close()
        return text

This isn't highlighted very well in the usage documentation. It's there, just not emphasised.

I'll look into releasing a new version shortly that automatically cleans up resources when the object is garbage collected, although relying on this could still cause weird memory behaviour on PyPy due to its non-deterministic garbage collection.

mitchwalker1979 commented 12 years ago

Classic case of RTFM. I don't remember which docs I read, but it wasn't these because that explanation is perfectly clear. Thanks for your prompt response, and sorry to impugn the quality of your work.

aliles commented 11 years ago

I've just pushed a new release, 1.4, of filemagic to PyPI. The changes in this version:

  1. A section on memory management added to the documentation on Read The Docs.
  2. Automatic cleanup of libmagic resources when a magic.Magic instance is garbage collected.

As noted in the new documentation, because not all interpreters have deterministic garbage collection, filemagic will issue a warning if it automatically cleans up resources.
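
The general pattern of a cleanup-on-collection fallback that warns when it fires might look something like the sketch below. This is not filemagic's actual implementation: the `Resource` class is a hypothetical stand-in for an object wrapping a native handle, and the use of `ResourceWarning` is an assumption about a sensible warning category.

```python
import warnings


class Resource:
    """Hypothetical stand-in for an object that wraps a native handle."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release the underlying handle; safe to call more than once.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def __del__(self):
        # Fallback only: the caller never closed the object explicitly,
        # so warn about it and release the handle here.
        if not self.closed:
            warnings.warn("Resource was not closed explicitly", ResourceWarning)
            self.close()
```

Closing explicitly, or using the object in a with statement, keeps the release point deterministic on every interpreter; the finalizer only guards against forgotten calls to close().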

Thank you for opening this issue; it has highlighted problems with both documentation and robustness. I apologise for the delay in getting a new release out.