Rblp / Rblpapi

R package interfacing the Bloomberg API from https://www.bloomberglabs.com/api/

Q: Why is there duplicate validation in bdp? #305

Closed. Ljupch0 closed this issue 4 years ago.

Ljupch0 commented 4 years ago

Hi Dirk and John, thanks for the great package.

Can you shed some light as to why bdp doesn't accept duplicates?

My use case: take an entire column of Bloomberg identifiers out of a dataset, plug it into bdp, and merge the resulting data frame back by Bloomberg ID. There will definitely be duplicates in the initial column.

Many thanks.

johnlaing commented 4 years ago

It makes sense to prohibit duplicates for (at least) two reasons:

1. It would be unnecessary to ask Bloomberg for the same information multiple times, so we pass each identifier into the API only once.
2. The returned object uses the identifiers as its rownames, and each rowname can appear only once.

You can easily call unique on your input to make it work. This is preferable to having the package do the same behind the scenes, which would break the alignment between input and output.
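A minimal base-R sketch of that workaround. So that it runs without a Bloomberg connection, it uses a hypothetical stand-in, `mock_bdp()`, in place of `Rblpapi::bdp()`; the stand-in only imitates the two behaviors described above (duplicates are rejected, identifiers come back as rownames), and its values are made up. The input is deduplicated with `unique()`, queried once, and then expanded back to the original order with `match()`:

```r
# Hypothetical stand-in for Rblpapi::bdp() so the sketch runs offline.
# It imitates two behaviors described in this thread: duplicated
# securities are rejected, and securities come back as rownames.
# The values are made up (1, 2, ... filled column by column).
mock_bdp <- function(securities, fields) {
  if (anyDuplicated(securities) > 0) stop("duplicated securities")
  as.data.frame(matrix(seq_len(length(securities) * length(fields)),
                       nrow = length(securities),
                       dimnames = list(securities, fields)))
}

# Input column with a repeated identifier.
ids <- c("IBM US Equity", "AAPL US Equity", "IBM US Equity")

# Query each identifier only once ...
res <- mock_bdp(unique(ids), "PX_LAST")

# ... then expand back to one row per original identifier, in input order.
# match() finds each id's row in the deduplicated result; the id is kept
# in an ordinary column because rownames cannot be duplicated.
aligned <- data.frame(security = ids,
                      res[match(ids, rownames(res)), , drop = FALSE],
                      row.names = NULL, stringsAsFactors = FALSE)
aligned
```

Keeping the identifier in an ordinary column, rather than in the rownames, is what lets the same security appear more than once in the aligned result.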

Ljupch0 commented 4 years ago

Hi John, thank you for the quick response. I completely agree.

Ljupch0 commented 3 years ago

I am back with this issue. In a perfect world, the bank's CSV output would not list the same security multiple times, but in practice it does, and the duplicate-securities error throws a wrench in my workflow. Would it make sense to turn the "Duplicated Securities: Securities {A}, {B}, {C} are duplicated." error into a warning, while still querying each security only once in the background and duplicating the outputs? I have done this in my own functions, so I'm happy to work on a PR.

johnlaing commented 3 years ago

I'm not sure what has changed from the scenario you initially reported. As I see it, we can't duplicate output because we use rownames to link output to input, and rownames can't be duplicated. Do you have a way around this?
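A short base-R illustration of the rownames constraint. The two-row data frame below is made up and only assumes the shape described in this thread (one row per security, identifiers as rownames). Assigning duplicated rownames is an error, and selecting the same row twice succeeds but silently renames the copy, so the output no longer lines up with the input identifiers:

```r
# Made-up result keyed by identifier rownames, standing in for a bdp() result.
res <- data.frame(PX_LAST = c(1, 2),
                  row.names = c("IBM US Equity", "AAPL US Equity"))

# Base R refuses duplicated rownames: this assignment fails
# (it also warns about the non-unique value, suppressed here).
attempt <- suppressWarnings(try(
  rownames(res) <- c("IBM US Equity", "IBM US Equity"),
  silent = TRUE
))
inherits(attempt, "try-error")
#> [1] TRUE

# Selecting the same row twice works, but the copy gets a new rowname,
# so the rownames no longer match the identifiers that were requested.
twice <- res[c("IBM US Equity", "IBM US Equity"), , drop = FALSE]
rownames(twice)
#> [1] "IBM US Equity"   "IBM US Equity.1"
```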

Ljupch0 commented 3 years ago

Aha, I see the issue. I use Rblpapi in a dplyr-centric workflow and convert results to tibbles, which have no rownames. I think the best solution is a separate package that wraps Rblpapi so it plays well with the tidyverse. What do you think?
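A minimal sketch of what such a wrapper might do, folding in the warn-and-deduplicate idea from earlier in the thread. All names here are hypothetical: `bdp_tidy()` is not part of Rblpapi, and its `fetch` argument defaults to an offline `mock_bdp()` with made-up values so the sketch runs without a Bloomberg connection. The assumption is that `Rblpapi::bdp`, which takes securities and fields as its first two arguments, could be passed as `fetch` once a connection is open. The identifier is returned as a regular column, so the result converts cleanly to a tibble:

```r
# Hypothetical stand-in for Rblpapi::bdp() (offline, made-up values):
# rejects duplicated securities and returns them as rownames.
mock_bdp <- function(securities, fields) {
  if (anyDuplicated(securities) > 0) stop("duplicated securities")
  as.data.frame(matrix(seq_len(length(securities) * length(fields)),
                       nrow = length(securities),
                       dimnames = list(securities, fields)))
}

# Hypothetical tidy-friendly wrapper (not part of Rblpapi):
# warns about duplicates, queries each security once via `fetch`,
# and returns one row per input security with the id as a column.
bdp_tidy <- function(securities, fields, fetch = mock_bdp) {
  dups <- unique(securities[duplicated(securities)])
  if (length(dups) > 0) {
    warning("Duplicated securities: ", paste(dups, collapse = ", "),
            call. = FALSE)
  }
  res <- fetch(unique(securities), fields)
  # Re-expand to input order; row.names = NULL drops the mangled rownames.
  rows <- res[match(securities, rownames(res)), , drop = FALSE]
  data.frame(security = securities, rows,
             row.names = NULL, stringsAsFactors = FALSE)
}

ids <- c("IBM US Equity", "AAPL US Equity", "IBM US Equity")
out <- suppressWarnings(bdp_tidy(ids, c("PX_LAST", "PX_OPEN")))
out
```

Warning rather than failing keeps a dplyr pipeline running while still flagging the duplicates, and because no information lives in the rownames, nothing is lost when the result is converted to a tibble.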