AndInTheClouds / chordreader2

Search for, display, transpose and save chords on your phone, that you get from the interwebs. :notes:
GNU General Public License v3.0

Have dedicated HTML parsers for specific pages #27

Open: wtimme opened this issue 1 year ago

wtimme commented 1 year ago

The current solution, a generic HTML parser, works great for sites such as "Ultimate G...". For some other sites (for example, https://www.chords-and-tabs.net), it works less well, and I have to clean up the parsed text file quite a bit.

Therefore, I propose the following: the app should contain multiple HTML parsers, each dedicated to a specific site, with the current "generic" parser as a fallback. The site-specific parsers could be unit-tested to ensure that they work as expected. This also allows people to request or contribute parsers for the sites they use without affecting the "generic" parser we have right now.

The interface for the parsers could look as follows:

```kotlin
interface WebPageChordParser {
    // Reports whether this parser supports the given URL.
    fun supportsURL(url: String): Boolean

    // Attempts to convert the given HTML text to a plain-text document
    // that contains just the chords.
    fun convertHtmlToText(htmlText: String): String?

    // Attempts to determine the BPM of the song from the given HTML.
    fun extractBPMFromHtml(htmlText: String): Int?
}
```
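As an illustration, a site-specific implementation of this interface might look roughly like the following sketch. It is hypothetical: the class name `ChordsAndTabsParser`, the assumption that the chords sit inside a `<pre>` element, and the "NNN BPM" text pattern are guesses that would need to be checked against the real page markup. The interface is repeated so the snippet compiles on its own.

```kotlin
import java.net.URI

interface WebPageChordParser {
    fun supportsURL(url: String): Boolean
    fun convertHtmlToText(htmlText: String): String?
    fun extractBPMFromHtml(htmlText: String): Int?
}

// Hypothetical parser for https://www.chords-and-tabs.net. The <pre> and
// "NNN BPM" assumptions are illustrative, not taken from the real site.
class ChordsAndTabsParser : WebPageChordParser {

    override fun supportsURL(url: String): Boolean {
        // Compare the parsed host instead of doing a substring search, so a URL
        // that merely mentions the site in its query string is not matched.
        val host = runCatching { URI(url).host }.getOrNull() ?: return false
        return host.lowercase().removePrefix("www.") == "chords-and-tabs.net"
    }

    override fun convertHtmlToText(htmlText: String): String? {
        val body = PRE_BLOCK.find(htmlText)?.groupValues?.get(1) ?: return null
        return body
            .replace(TAG, "")        // drop inline markup such as <b> or <span> around chords
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&amp;", "&")   // decoded last so "&amp;lt;" becomes "&lt;", not "<"
            .trim('\r', '\n')        // strip blank lines at the edges, keep indentation
    }

    override fun extractBPMFromHtml(htmlText: String): Int? =
        BPM.find(htmlText)?.groupValues?.get(1)?.toIntOrNull()

    private companion object {
        // (?is): case-insensitive, and '.' also matches newlines; the lazy
        // quantifier stops at the first closing </pre>.
        val PRE_BLOCK = Regex("(?is)<pre[^>]*>(.*?)</pre>")
        val TAG = Regex("<[^>]+>")
        // A two- or three-digit number followed by "BPM", e.g. "Tempo: 120 BPM".
        val BPM = Regex("(?i)\\b(\\d{2,3})\\s*bpm\\b")
    }
}
```

Regular expressions are used here only to keep the sketch dependency-free; a real parser would likely be more robust if built on an HTML parsing library.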

Each parser (including the generic one) would implement this interface. The WebSearchViewModel could then be given a list of WebPageChordParser objects and iterate over them, asking each one whether it supports the given URL. If no parser supports the URL, the generic parser would take over as a fallback.
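The URL-based dispatch with a generic fallback could be sketched as follows. `ChordParserSelector`, `GenericChordParser` and `ExampleSiteParser` are hypothetical names, and the generic parser here is only a crude tag-stripping stand-in for the app's real one; the ViewModel would hold such a selector (or just the list) and call `parserFor(url)` before parsing a page.

```kotlin
import java.net.URI

interface WebPageChordParser {
    fun supportsURL(url: String): Boolean
    fun convertHtmlToText(htmlText: String): String?
    fun extractBPMFromHtml(htmlText: String): Int?
}

// Hypothetical stand-in for the existing generic parser: it accepts every URL
// and simply strips HTML tags.
class GenericChordParser : WebPageChordParser {
    override fun supportsURL(url: String): Boolean = true
    override fun convertHtmlToText(htmlText: String): String? =
        htmlText.replace(Regex("<[^>]+>"), "").trim()
    override fun extractBPMFromHtml(htmlText: String): Int? = null
}

// Hypothetical site-specific parser that only matches one host.
class ExampleSiteParser(private val host: String) : WebPageChordParser {
    override fun supportsURL(url: String): Boolean =
        runCatching { URI(url).host }.getOrNull()
            ?.lowercase()?.removePrefix("www.") == host
    override fun convertHtmlToText(htmlText: String): String? = null  // real logic omitted
    override fun extractBPMFromHtml(htmlText: String): Int? = null
}

// Picks the first site-specific parser that claims the URL, else the fallback.
class ChordParserSelector(
    private val siteParsers: List<WebPageChordParser>,
    private val fallback: WebPageChordParser,
) {
    fun parserFor(url: String): WebPageChordParser =
        siteParsers.firstOrNull { it.supportsURL(url) } ?: fallback
}
```

Keeping the generic parser out of the list, rather than adding it as an always-`true` last entry, means its position cannot be broken by reordering the site-specific parsers. A further option, not part of the proposal, would be to fall back to the generic parser when a dedicated parser returns `null` from `convertHtmlToText`.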