alpha-xone / xbbg

An intuitive Bloomberg API
https://xbbg.readthedocs.io/
Apache License 2.0

blp.bdh exit before fetching all the data #15

Closed spyamine closed 4 years ago

spyamine commented 4 years ago

Hi,

Thank you for your package, Excellent work!!

I use your blp.bdh to fetch data from Bloomberg. I call it in the middle of a loop because I want to get data for more tickers than the Bloomberg per-request limit, so I split the tickers into chunks and run bdh in a loop.

The loop exits before the program finishes fetching all the data.

Can you please help?

alpha-xone commented 4 years ago

Thanks for using the package.

Not sure if this is within the scope of this package, but can you share your code please? The Bloomberg response data is hard to manage at the API level. One way to overcome this kind of issue is 1) to avoid requesting too much data at a time, and 2) to save data locally as temporary files and combine them before doing analysis.
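The chunk-then-cache pattern described in (2) can be sketched roughly as follows. This is a minimal sketch, not xbbg code: `fetch_chunk` is a hypothetical stand-in for the real `blp.bdh` call, and the temporary cache directory is illustrative.

```python
import os
import tempfile

import pandas as pd


def chunks(seq, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


def fetch_and_cache(tickers, fetch_chunk, chunk_size=50, cache_dir=None):
    """Request data chunk by chunk, persisting each chunk locally
    before combining. `fetch_chunk` stands in for the real data call
    (e.g. blp.bdh); it just needs to return a DataFrame for a list
    of tickers.
    """
    cache_dir = cache_dir or tempfile.mkdtemp()
    paths = []
    for i, batch in enumerate(chunks(tickers, chunk_size)):
        df = fetch_chunk(batch)
        path = os.path.join(cache_dir, "chunk_{}.csv".format(i))
        df.to_csv(path, index=False)  # save to disk before moving on
        paths.append(path)
    # combine the locally saved pieces only after all requests finished
    return pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
```

Because each chunk is on disk before the next request starts, a failure partway through loses at most one chunk, and the analysis step can run entirely off the local files.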

spyamine commented 4 years ago

Hi, thank you for your response.

I've managed to solve the problem in the meantime by using joblib instead of a plain loop. Here is my initial code:

def get_bloomberg_EOD(symbols=None, adjusted=False, assets=None,
                      start=datetime.datetime.strptime("1970-01-01", "%Y-%m-%d").date(),
                      end=datetime.date.today() + datetime.timedelta(-1)):

    startString = start.strftime("%Y-%m-%d")
    endString = end.strftime("%Y-%m-%d")

    print("getting the EOD data from {} to {}".format(startString, endString))

    # pair each symbol with its asset class (last token of the ticker string)
    classes_symbols = [(s.split(" ")[-1].capitalize(), s) for s in symbols]

    # group symbols by asset class
    dict_1 = {}
    for cls_name, sym in classes_symbols:
        dict_1.setdefault(cls_name, []).append(sym)

    classes = list(dict_1.keys())

    EOD_list = []
    for clasz in classes:
        print("==== " * 4)
        print("working on '{}' assets, {} symbols".format(clasz, len(dict_1[clasz])))
        # EOD_FIELDS, chunks, BLOOMBERG_TICKERS_LIMITATION and
        # dataFrameSpliter are helpers defined elsewhere in my code
        fields = EOD_FIELDS[clasz]
        symbols_class = dict_1[clasz]
        print(symbols_class)

        for tickers in chunks(symbols_class, BLOOMBERG_TICKERS_LIMITATION - 1):
            if not adjusted:
                data = blp.bdh(tickers, flds=fields, start_date=startString,
                               end_date=endString, log='debug')
            else:
                data = blp.bdh(tickers, flds=fields, start_date=startString,
                               end_date=endString, adjust='-')
            EOD_list.append(data)

    EOD = pd.concat(EOD_list)
    print(EOD.empty)
    EODs = dataFrameSpliter(EOD)
    print("len(EODs): {}".format(len(EODs)))
    return EODs

The new code is the following:

def get_bloomberg_EOD_paralleled(symbols=None, adjusted=False, assets=None,
                                 start=datetime.datetime.strptime("1970-01-01", "%Y-%m-%d").date(),
                                 end=datetime.date.today() + datetime.timedelta(-1)):

    startString = start.strftime("%Y-%m-%d")
    endString = end.strftime("%Y-%m-%d")

    print("getting the EOD data from {} to {}".format(startString, endString))

    classes_symbols = [(s.split(" ")[-1].capitalize(), s) for s in symbols]

    dict_1 = {}
    for cls_name, sym in classes_symbols:
        dict_1.setdefault(cls_name, []).append(sym)

    classes = list(dict_1.keys())

    EOD_list = []
    for clasz in classes:
        print("==== " * 4)
        print("working on '{}' assets, {} symbols".format(clasz, len(dict_1[clasz])))
        fields = EOD_FIELDS[clasz]
        symbols_class = dict_1[clasz]
        print(symbols_class)

        # run the per-chunk requests in parallel with joblib
        results = Parallel(n_jobs=2)(
            delayed(get_historical)(tickers=tickers, startString=startString,
                                    endString=endString, fields=fields,
                                    adjusted=False)
            for tickers in chunks(symbols_class, BLOOMBERG_TICKERS_LIMITATION - 1))

        EOD_list = EOD_list + results

    print(EOD_list)
    print(len(EOD_list))


def get_historical(tickers, startString, endString, fields, adjusted):
    if not adjusted:
        return blp.bdh(tickers, flds=fields, start_date=startString,
                       end_date=endString, log='debug')
    return blp.bdh(tickers, flds=fields, start_date=startString,
                   end_date=endString, adjust='-')
alpha-xone commented 4 years ago

Right, but sending requests to Bloomberg simultaneously is less preferred because we're getting response data through one portal. I'm not sure the DataFrame you get back will be stable.

spyamine commented 4 years ago

So what do you suggest as a solution? I have around 1,500 tickers to query for EOD data, among other things.

spyamine commented 4 years ago

Actually, joblib did not solve the problem.

And actually we don't need it!! In xbbg\core\process.py, the function rec_events has a timeout condition set at 10 * 500 ms, and depending on the size of the data requested, it exits before fetching all the data.

We have two solutions: either remove the timeout entirely or set it to a larger time span.

I don't know if I was clear. I'm not a programmer, I'm a fund manager, so I'm doing my best to share the ideas in a proper way.
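An alternative to patching the library's timeout is to detect and recover from a truncated response. The sketch below is a hypothetical wrapper, not xbbg code: `fetch` stands in for `blp.bdh`, and it assumes the returned DataFrame uses bdh's layout where the top-level columns are the tickers, so any ticker missing from the columns can simply be re-requested.

```python
import pandas as pd


def fetch_with_retry(tickers, fetch, max_retries=3):
    """Re-request tickers missing from a (possibly truncated) response.

    `fetch` is a hypothetical stand-in for blp.bdh: it takes a list of
    tickers and returns a DataFrame whose top-level columns are tickers.
    If a timeout cut the response short, the missing tickers will not
    appear in the columns, so we ask for them again.
    """
    frames = []
    missing = list(tickers)
    for _ in range(max_retries):
        if not missing:
            break
        data = fetch(missing)
        # tickers actually delivered in this round
        got = set(data.columns.get_level_values(0)) if not data.empty else set()
        frames.append(data)
        missing = [t for t in missing if t not in got]
    return pd.concat(frames, axis=1) if frames else pd.DataFrame()
```

This keeps each individual request small (only the stragglers are retried) instead of re-running the whole batch when a timeout fires.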

spyamine commented 4 years ago

https://github.com/pyxll/blpapi-python/blob/master/examples/SimpleHistoryExample.py

This is the link to the vanilla history request example; there is no timeout there.

spyamine commented 4 years ago

I was dreaming about the problem!! I did not want to use my old program. yours is far more intuitive!!

Cheers for the good work !!

spyamine commented 4 years ago

I just realized that I had the timeout problem because I'm sitting in a room far from the router!! lol

alpha-xone commented 4 years ago

If you send multiple requests simultaneously, it may cause problems when Bloomberg returns data, even after you increase the timeout. Please avoid using threads or parallel looping of any sort when possible.

alpha-xone commented 4 years ago

For 1,500 names, you can query 100 names at a time and save them locally, then operate on the local files thereafter. Bloomberg has done a lot of optimization on their end; if you make the 15 queries one by one, it will probably be faster than using joblib.
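The arithmetic behind this suggestion: 1,500 names in batches of 100 is 15 sequential requests. A rough sketch of the batching (the batch size is the one suggested above; the ticker names are made up):

```python
import math


def sequential_batches(tickers, batch_size=100):
    """Yield batches of tickers to be requested one by one (no threads)."""
    for i in range(0, len(tickers), batch_size):
        yield tickers[i:i + batch_size]


names = ["TICKER{}".format(i) for i in range(1500)]
n_batches = math.ceil(len(names) / 100)
print(n_batches)  # 15 requests for 1,500 names at 100 per request
```

Each batch would then be passed to the data call and saved to disk before the next one starts, as in the caching approach discussed earlier in the thread.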

spyamine commented 4 years ago

Thank you for your advice!! This is what I did!!