BingAds / BingAds-Python-SDK

Other
117 stars 162 forks source link

tmp/suds getting full of *.px files #117

Closed datacubed closed 3 years ago

datacubed commented 5 years ago

I'm not sure if this an issue or not, so I apology now if I'm in the wrong place. Our server hit us with a "ran out of disk space", and when looking around found the tmp/suds folder being quite a few GBs inside. Once the *.px files were deleted, everything returned to normal.

I know usually /tmp/ is cleared on reboot - but we try not to reboot our servers too often. Is it possible for me to avoid this via a function in the API, or do I just need to write a command that clears the tmp/suds director after we've used the Bing API.

Cheers

khvn26 commented 5 years ago

Hi! Check out our fork where this issue is solved: https://github.com/BingAds/BingAds-Python-SDK/pull/108

qitia commented 5 years ago

I hope this can help you.

datacubed commented 5 years ago

Hi! Check out our fork where this issue is solved: #108

Hey this post says "the branch has conflicts" - should I assume to wait for this to be resolved before using?

I hope this can help you.

@qitia from this post do you recommend adding the fix as mentioned: https://github.com/BingAds/BingAds-Python-SDK/issues/80#issuecomment-390359504

In the post the last comment mentions moving away from suds ? Is that something happening with this library? Or will there be an update that handles this cache "bloating" (as ive seen it called) ?

qitia commented 5 years ago

hey @datacubed , what I recommend is try below method, with remove_suds_cache set to true. This will help you clear the cache folder on app's exist.

from os import path
from tempfile import gettempdir
import atexit

def remove_suds_cache(remove_suds_cache=False):
    if remove_suds_cache:
        shutil.rmtree(path.join(gettempdir(), 'suds'), ignore_errors=True)
    pass

# Set remove_suds_cache=True to remove the temp suds cache
atexit.register(remove_suds_cache, remove_suds_cache=False)

I hope this can help.

khvn26 commented 5 years ago

There's an internal suds.Client instance in the library code which is used to construct SOAP object factories. It's instantiated with suds' default cache which pollutes the /tmp/suds directory because a bug in suds' default cache implementation.

Our fork makes this instance use a dummy in-memory cache (basically a dict), which will load Bing Ads' SOAP XSDs to RAM but won't touch the disk at all. It overwrites the instantiating code, hence the conflicts. You should be fine using it.

d601 commented 5 years ago

I recently ran into this issue so I took a look at the solutions people have posted here. I found them problematic because of various issues, including 1.) deleting the /tmp/suds directory at exit, which I imagine would cause problems if you have lots of programs running simultaneously, and 2.) waiting for #108 to get merged, which might not happen soon.

The core issue is that suds-jurko 0.6 defines a mangle() function in its reader class that is broken in newer versions of python. The old one does this: h = abs(hash(name)) I believe hash() was changed at some point to return different values for the same name in different processes. The development version of suds-jurko has this instead: h = md5(name.encode()).hexdigest() which behaves deterministically for the same name.

I wrote some code to monkeypatch suds 0.6 into behaving correctly:

import suds.reader

try:
    from hashlib import md5
except ImportError:
    # 'hashlib' package added in Python 2.5 so use the now deprecated/removed
    # 'md5' package in older Python versions.
    from md5 import md5

def mangle(self, name, x):
    """
    Mangle the name by hashing the I{name} and appending I{x}.

    @return: The mangled name.
    @rtype: str

    """
    h = md5(name.encode()).hexdigest()
    return '%s-%s' % (h, x)

print("Monkeypatching suds Reader classes")
suds.reader.Reader.mangle = mangle
suds.reader.DocumentReader.mangle = mangle
suds.reader.DefinitionsReader.mangle = mangle

print("Monkeypatching complete, importing bingads")
from bingads import *

(It's entirely possible the above code could be simplified - I just know that what I've written works.) If you have some library code you can drop this in as bingads.py and then do from \.bingads import \. This will make sure suds gets patched before bingads starts doing anything. I tested this in python 3.6 and it seems to work.

IMO a better solution than #108 or trying to fix the broken reader code would be to just use a version of suds that's actually maintained, like suds-community. Or switch to zeep. But others have already suggested both these options in other issues.

qitia commented 3 years ago

moved to suds-community since 13.0.11.1

addisonjones-gc commented 2 years ago

Hi all, just chiming in to say I'm still getting this issue just in different form. I'm using bingads in an Airflow DAG and if imported anywhere in a file with the @dag or @task decorator, it creates a new directory and .px file in /tmp/. This is causing our Pods to be evicted with the ever growing files in the /tmp folder which is a real drag.

Example:

/tmp
  /tmpatzruj5fsuds-default-cache
    suds-9dbf36d0033995011d074a4d7e7f1669-document.px
    version

No idea what's causing this, but it's clearly the suds.cache.py __init__() method which is seeing None for the location and setting the default location.

Using bingads 13.0.13, suds-community 1.1.1 .

Any amount of help would be excellent.

addisonjones-gc commented 1 year ago

As a follow up to the above, updating to the 13.0.14 version of the SDK fixed the default-cache folder creep issue. Our scheduler is still full of tmp* directories, but at least they're empty now.