MohamedRejeb / Ksoup

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.
Apache License 2.0

Jsoup like Element, Document classes #26

Open Vaibhav2002 opened 11 months ago

Vaibhav2002 commented 11 months ago

Does Ksoup have classes like Document, Element, etc., as Jsoup does?

Queatz commented 4 months ago

Also wondering. I was thinking of using this library to extract og:* metadata.

vanniktech commented 3 months ago

All of this is supported by https://github.com/fleeksoft/ksoup. @MohamedRejeb, are there any plans to work together on Ksoup support for Kotlin Multiplatform? To be honest, I don't care which library I use, but the one from @fleeksoft seems superior, as it really has everything from Jsoup, including the things that I've asked for:

https://github.com/MohamedRejeb/Ksoup/issues/17 https://github.com/MohamedRejeb/Ksoup/issues/18

In addition, it has the benefit that the API names are the same, so you can just google anything for Jsoup and adapt the syntax to Kotlin. The only drawback is that your library seems to be far more active.


I've switched to the other library for now since I also need Element, Document, etc. for full parsing of HTML.

I believe that other ksoup library also has support for:

https://github.com/MohamedRejeb/Ksoup/issues/13 https://github.com/MohamedRejeb/Ksoup/issues/5 https://github.com/MohamedRejeb/Ksoup/issues/4

It would be a shame to do the work twice. It's also rather confusing that there are two libraries with exactly the same name that, from the outside, seem to do the same thing.

MohamedRejeb commented 3 months ago

Hi, you are right, @vanniktech. The problem is that the other library is backed by a company. I will try to reach out to them and see if we can collaborate.

westnordost commented 3 months ago

I found this ticket because I was confused about why there are two Ksoups out there with no explanation of the difference between the two.

Looking at @fleeksoft's build.gradle.kts, it pulls in a number of dependencies I wouldn't expect from a simple HTML parser, such as network access, date-time parsing, file access and support for Unicode code points. Given that Jsoup actually supports parsing HTML directly from a web page, that is maybe not surprising for a faithful port. Its JAR size for JVM is additionally over 600kB.

@MohamedRejeb's Ksoup has no external dependencies and its JAR size for JVM is just over 60kB. Great! That's what I need for my project: a simple HTML DOM parser. But that is maybe not what people looking for a port of Jsoup to KMP are looking for. They might be looking for a port that offers the same features.

If my assessment of your library as a simple HTML DOM parser and nothing else is correct, @MohamedRejeb, how about renaming your library accordingly? Firstly, that would resolve the confusion about which one is the faithful port of Jsoup (probably fleeksoft's; I haven't looked closely at their lib yet). Secondly, it would manage expectations: if people don't assume this library is anything more than an HTML parser, i.e. that it has all the features Jsoup has, you won't get flooded with feature requests to add this or that because Jsoup has it. Finally, since the name would be different, there would be no expectation that the API is similar. For example, the "handler" stuff is quite Java-typical. In Kotlin, one would probably rather emit a Sequence of entities.
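To illustrate the Sequence idea, here is a minimal, self-contained sketch. It does not use Ksoup's actual API: `HtmlEvent` and `htmlEvents` are hypothetical names made up for this example, and the tokenizer is a toy that ignores attributes, comments, and malformed markup. It only shows what consuming parse events lazily could look like compared to registering handler callbacks.

```kotlin
// Hypothetical event types for illustration; not part of Ksoup.
sealed interface HtmlEvent {
    data class OpenTag(val name: String) : HtmlEvent
    data class CloseTag(val name: String) : HtmlEvent
    data class Text(val text: String) : HtmlEvent
}

// Toy tokenizer that lazily yields events instead of invoking callbacks.
// It only understands simple <tag>, </tag> and plain text.
fun htmlEvents(html: String): Sequence<HtmlEvent> = sequence {
    var i = 0
    while (i < html.length) {
        if (html[i] == '<') {
            val end = html.indexOf('>', i)
            if (end == -1) {
                // Unterminated tag: emit the remainder as text and stop.
                yield(HtmlEvent.Text(html.substring(i)))
                break
            }
            val body = html.substring(i + 1, end).trim()
            if (body.startsWith("/")) {
                yield(HtmlEvent.CloseTag(body.drop(1).trim()))
            } else {
                // Keep only the tag name; attributes are dropped in this toy.
                yield(HtmlEvent.OpenTag(body.substringBefore(' ')))
            }
            i = end + 1
        } else {
            val next = html.indexOf('<', i).let { if (it == -1) html.length else it }
            yield(HtmlEvent.Text(html.substring(i, next)))
            i = next
        }
    }
}

fun main() {
    // Standard Sequence operators replace per-event handler callbacks.
    val texts = htmlEvents("<p>Hello <b>world</b></p>")
        .filterIsInstance<HtmlEvent.Text>()
        .map { it.text }
        .toList()
    println(texts)
}
```

Because the sequence is lazy, a caller can stop early (for example with `first { ... }`) without the rest of the input being tokenized, which is awkward to express with callbacks.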

westnordost commented 3 months ago

(Or in an ideal world, there'd be one library that just does the basic HTML parsing, yours, and then fleeksoft's Ksoup port would use it as a dependency and add everything they need to get feature parity with Jsoup. But such cooperation usually doesn't work unless it happens within the same organization.)

vanniktech commented 3 months ago

> Looking at @fleeksoft's build.gradle.kts, it pulls in a number of dependencies I wouldn't expect from a simple HTML parser, such as network access, date-time parsing, file access and support for Unicode code points. Given that Jsoup actually supports parsing HTML directly from a web page, that is maybe not surprising for a faithful port. Its JAR size for JVM is additionally over 600kB.

I was also surprised by this, but it does make sense: https://github.com/fleeksoft/ksoup/issues/30. Java has all the APIs built in; Kotlin Multiplatform does not. I think it would make sense to provide extension modules, e.g. for file support, for those who want them. In my case I use all those transitive libraries anyway, so it does not matter for me.

> @MohamedRejeb's Ksoup has no external dependencies and its JAR size for JVM is just over 60kB. Great! That's what I need for my project: a simple HTML DOM parser. But that is maybe not what people looking for a port of Jsoup to KMP are looking for. They might be looking for a port that offers the same features.

In the beginning I also only needed a simple HTML DOM parser but then I had to use the full features of Jsoup. Also if you're on Android, R8 will remove everything that's not needed.

> If my assessment of your library as a simple HTML DOM parser and nothing else is correct, @MohamedRejeb, how about renaming your library accordingly? Firstly, that would resolve the confusion about which one is the faithful port of Jsoup (probably fleeksoft's; I haven't looked closely at their lib yet). Secondly, it would manage expectations: if people don't assume this library is anything more than an HTML parser, i.e. that it has all the features Jsoup has, you won't get flooded with feature requests to add this or that because Jsoup has it.

I think this library was here first and the other Ksoup library was only created later. But renaming sounds good to avoid confusion if no collaboration is wanted.