Loki-Afro / metalarchives

Unofficial Encyclopedia Metallum API
Other
44 stars 12 forks source link

Fix: Fetching discs for bands containing an '&' character threw Exception #11

Closed Aarkon closed 2 years ago

Aarkon commented 2 years ago

A StringIndexOutOfBoundsException got thrown if parseSplitBands() was called while the band that had & in its name: bandLink contains the escaped character because it is the URL, while the bandName does not. Therefore, the bandLink.indexOf("\">" + bandName) will return -1, which in turn will not work for creating a substring one line later.

I wrote a test confirming the assumption. Comment out my fix, run the test, and you'll see the Exception.

Loki-Afro commented 2 years ago

Hey there and thanks for your contribution. Weird, does that only happen with &? wouldn't that not also affect characters like < > and such things?

Aarkon commented 2 years ago

I guess a bunch of special characters would break that logic, true, but none of them should be as common as the ampersand in my opinion. It denotes collaborations, and when I ran an exploratory project using your library, that was the only place I had to debug and manually replace stuff. My code is over here: https://gitlab.com/jakob.ledig/steel It checked 258 bands, only one proving problematic.

But you‘re of course right that a more complete escaping approach would be superior - I wasn’t able and willing to think of one though from the top of my head though. 😉 Maybe there is something off the shelf? I seriously don’t know. For now, the proposed fix at least makes it possible to retrieve albums for collaboration projects.

Loki-Afro commented 2 years ago

Yes, gonna merge the pr as soon as this april fool nonsense is over this year, as you can see the tests are failing because all bands have the genre: Metal :P

Aarkon commented 2 years ago

Lol, I was somewhat concerned to be honest, but didn't dare to look too deeply into the failing tests because I didn't want to go down the next rabbit hole. Last year it was the cats avatars, right? 😻

Loki-Afro commented 2 years ago

think that was two years ago, but these cats one is still available under some subdomain as far as i remember ..

i looked now much deeper into this issue and can't find a clear pattern. it is definitely the disc thing that fails under this condition

then it triggers the issue. however i checked for similar characters and this seems to be not always the case. This one might be an easier example, less noise at least https://www.metal-archives.com/albums/Death_%26_Hexes/Spiritual_Hex/710157

aaaaanyway, all this confision aside i think its a combination of that & character and how i hacked together that extracting of the id combined with these jsoup toString() nonsense i made here

have a more reliable solution now. - will be fixed tomorrow :)

btw. you can also call band.getDiscsPartial - if the information you get there is enough, you will issue much less requests ;)

Aarkon commented 2 years ago

In that case, just close this issue, no hard feelings from my side. 😉

I'll also have a look at the partial method, because the data I need is indeed only a subset of the full disc info, and each run in my project takes several minutes.

Thank you for the suggestion and for doing all the hard work that I can use to hack together some fun "data science"! 😅

Loki-Afro commented 2 years ago

@Aarkon new release is out there 1.0.1 .. may take up to 24 hours to be on maven central.

see this commit https://github.com/Loki-Afro/metalarchives/commit/1c8ba5373046dc05c423deace84674064fd40227 - where this is fixed

thanks for your contribution again.

If you like you can write me a mail regarding the data science you are doing, always interested ;)