aantron / lambdasoup

Functional HTML scraping and rewriting with CSS in OCaml
https://aantron.github.io/lambdasoup
MIT License

simple examples #49

Open UnixJunkie opened 2 years ago

UnixJunkie commented 2 years ago

Hello,

I don't know CSS, but it looks like your library can do what I need.

I would like some simple data extraction example.

If I have such an HTML text:

<td style="background-color:#CCFFCC;" class="_siheader">
TOTO</td>
<td  class="_sibody">
12.34</td>

I would like to extract the TOTO float value (12.34). How should I go about it?

Thanks a lot, F.

UnixJunkie commented 2 years ago

I have to admit that the "open Soup" directive in the README file and documentation completely leaves me blind about which functions are from the Soup module. The use of operators like '$', '$$', and '|>' also makes the code very unreadable to me. I would like to know, in plain ocaml, how to use this library... I almost feel like all the examples and documentation are about some Haskell code.

aantron commented 2 years ago

> How should I go about it?

To fully avoid all the operators besides |>, one way is:

Soup.parse html_text
|> Soup.select_one "td._sibody"
|> (fun cell -> Option.bind cell Soup.leaf_text)
|> Option.map (fun text -> Float.of_string (String.trim text))
|> Option.get
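For readers who prefer explicit pattern matching, here is a minimal sketch of the same extraction that returns an option instead of raising when the cell is missing or its text is not a number. The helper name extract_value is hypothetical (not part of Lambda Soup), and the sample markup from the question is wrapped in a table here as an assumption, so that the cells are parsed as ordinary table cells.

```ocaml
(* Requires the lambdasoup library to be installed and linked. *)

(* Sample markup from the question, wrapped in <table><tr> (an assumption
   made for this sketch). *)
let html_text = {|
<table><tr>
<td style="background-color:#CCFFCC;" class="_siheader">
TOTO</td>
<td class="_sibody">
12.34</td>
</tr></table>
|}

(* [extract_value] is a hypothetical helper, not a Lambda Soup function.
   Every call is written with its module prefix, so no [open] is needed. *)
let extract_value (html : string) : float option =
  let soup = Soup.parse html in
  match Soup.select_one "td._sibody" soup with
  | None -> None                      (* no matching cell *)
  | Some cell ->
    match Soup.leaf_text cell with
    | None -> None                    (* cell has no single text leaf *)
    | Some text ->
      (* Trim surrounding whitespace before parsing the number. *)
      Float.of_string_opt (String.trim text)

let () =
  match extract_value html_text with
  | Some v -> Printf.printf "TOTO = %g\n" v
  | None -> print_endline "value not found"
```

Compared with ending the pipeline in Option.get, this version leaves it to the caller to decide what to do when the value is absent.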

> I have to admit that the "open Soup" directive in the README file and documentation completely leaves me blind about which functions are from the Soup module.

All the ones that aren't from the standard library are from Soup. The library documentation assumes the reader recognizes identifiers from the OCaml standard library, and looks in the library documentation for the rest. In my own programming I indeed don't do open Soup, but I do often do open Soup.Infix to get just $, $$, $?, or else write out their replacements (Soup.select family).

> The use of operators like '$', '$$', and '|>' also makes the code very unreadable to me.

|> in particular is a standard operator that is very heavily used in OCaml programming. At least this operator is plain OCaml.
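As an illustration using only the standard library, the sketch below shows that x |> f means f x, so a pipeline is the same computation as nested calls, read left to right. The value names (nested, piped, parsed) are made up for this example.

```ocaml
(* [x |> f] behaves as if defined by: let ( |> ) x f = f x *)

(* Nested calls are read from the inside out. *)
let nested =
  String.length (String.uppercase_ascii (String.trim "  toto  "))

(* The same computation as a pipeline, read from left to right. *)
let piped =
  "  toto  "
  |> String.trim
  |> String.uppercase_ascii
  |> String.length

(* Option combinators fit into the same style. [Option.bind] takes the
   option first and the function second, so it is wrapped in a small
   function here. *)
let parsed : float option =
  Some " 12.34 "
  |> Option.map String.trim
  |> (fun o -> Option.bind o Float.of_string_opt)

let () = assert (nested = piped)
```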

The $ and $$ operators are very simply defined in the library documentation.
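To make the correspondence concrete, here is a small sketch that pairs each infix operator with its written-out form, following the definitions given in the Lambda Soup documentation. The sample markup and the value names are assumptions made for the example.

```ocaml
(* Requires the lambdasoup library to be installed and linked. *)

(* Brings only $, $? and $$ into scope; everything else stays prefixed. *)
open Soup.Infix

let soup =
  Soup.parse
    "<table><tr><td class=\"_siheader\">TOTO</td>\
     <td class=\"_sibody\">12.34</td></tr></table>"

(* $ : first match, raising if there is none. *)
let cell  = soup $ "td._sibody"
let cell' = soup |> Soup.select_one "td._sibody" |> Soup.require

(* $? : first match as an option. *)
let missing  = soup $? "td.no-such-class"
let missing' = soup |> Soup.select_one "td.no-such-class"

(* $$ : all matches. *)
let cells  = soup $$ "td" |> Soup.to_list
let cells' = soup |> Soup.select "td" |> Soup.to_list

let () =
  assert (Soup.leaf_text cell = Soup.leaf_text cell');
  assert (missing = None && missing' = None);
  assert (List.length cells = List.length cells')
```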

> I would like to know, in plain ocaml, how to use this library...

See the standard library and Lambda Soup documentation (all written in plain OCaml).

> I almost feel like all the examples and documentation are about some Haskell code.

Could you adjust this to feel like it's jQuery instead? This is just ordinary "pipelining" or "chaining" as used in JS and throughout the programming world.

UnixJunkie commented 2 years ago

Thanks for the example. I'll try to adapt some code I have to use it.