Open foderking opened 3 years ago
That sounds like a cool feature! Could you give me a few Wikipedia page examples, please?
https://en.wikipedia.org/wiki/Scorpion_(Drake_album) https://en.wikipedia.org/wiki/The_Best_in_the_World_Pack https://en.wikipedia.org/wiki/Positions_(album)
Generally any page for an album. Parsing the wikitext source ignores the "tracklist" section, which is why I have to use regex first to get only that section and then parse that.
So, this is an interesting and difficult problem. First of all, the track listings are never in an infobox. This parser has stretched itself to parse other things outside of infoboxes (albeit not very well), but I do not think it was wise to do that.
That being said, I may try and refactor out my data-types to common components which can be used to parse infoboxes, page sections, or even entire page sources.
It's a complex problem, like many that come up in wiki-text parsing.
By the way, how was the parsed version of the album when you did it manually? If it was nice, I may just hack that together for now.
I did a regex match for the tracklist section: /{{track.*list.*?^}}/gmsi
This also captures the case where there are 2 tracklist sections.
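For anyone following along, here is a minimal sketch of that extraction step in JavaScript. It is not the code from the repository: `extractTrackLists` is a hypothetical helper, and the pattern is a variant of the regex above that uses lazy quantifiers so that each tracklist template comes back as its own match. It assumes every template is closed by a `}}` at the start of a line.

```javascript
// Hypothetical helper: returns every {{Track listing ...}} / {{tracklist ...}}
// block found in the raw wikitext, one string per template.
function extractTrackLists(wikitext) {
  // g: find all matches, i: ignore case, m: ^ matches at line starts,
  // s: "." also matches newlines. The lazy ".*?" stops at the first line
  // that begins with "}}", so two templates give two separate matches.
  const pattern = /\{\{track.*?list.*?^\}\}/gims;
  return wikitext.match(pattern) || [];
}

// Made-up sample input in the shape of a double album page.
const sample = [
  "{{Infobox album",
  "| name = Example",
  "}}",
  "==Track listing==",
  "{{Track listing",
  "| headline = Side A",
  "| title1 = First",
  "}}",
  "{{Track listing",
  "| headline = Side B",
  "| title1 = Second",
  "}}",
].join("\n");

const found = extractTrackLists(sample);
console.log(found.length); // 2
```

The infobox is skipped because the pattern only starts at `{{track`, and the `==Track listing==` heading is skipped because it has no leading braces.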
I then parse the sections independently with the infobox parser. It works pretty well, although producer info is kept in the "extra credits" in the parsed object.
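As a rough illustration of where the producer info comes from, here is a sketch that reads the numbered fields straight out of one tracklist block without going through the parser. `parseTrackList` is a hypothetical helper, and the sketch assumes the block puts one `| titleN = ...` or `| extraN = ...` parameter per line; values are kept as raw wikitext, so links like `[[...]]` are not resolved.

```javascript
// Hypothetical helper: turns one {{Track listing}} block into an array of
// { number, title, extra } objects, ordered by track number.
function parseTrackList(block) {
  const tracks = new Map();
  // Matches lines such as "| title1 = First" or "| extra2 = Someone".
  // [ \t]* is used instead of \s* so a match never spills onto the next line.
  const field = /^[ \t]*\|[ \t]*(title|extra)(\d+)[ \t]*=[ \t]*(.*)$/gm;
  for (const [, key, index, value] of block.matchAll(field)) {
    const n = Number(index);
    if (!tracks.has(n)) tracks.set(n, { number: n });
    tracks.get(n)[key] = value.trim();
  }
  return [...tracks.values()].sort((a, b) => a.number - b.number);
}

// Made-up sample block; "extra" is the column that holds producer credits
// when the template sets extra_column = Producer(s).
const exampleBlock = [
  "{{Track listing",
  "| extra_column = Producer(s)",
  "| title1 = First",
  "| extra1 = Producer One",
  "| title2 = Second",
  "}}",
].join("\n");

console.log(parseTrackList(exampleBlock));
```

Tracks without an `extraN` line simply come back without an `extra` key, so callers should treat that field as optional.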
Here's the link to the repository: https://github.com/foderking/WhoProduced/blob/main/src/App.js
I'm writing an app to get album information. Right now I'm using a hack: first using regex to get the "tracklist" section, then parsing that.
It would be cool to be able to parse the tracklist easily - especially for double albums where you have 2 or more "{{tracklist...}}" sections.