gonssal opened this issue 2 years ago
If it complains that CLICK is not supported, you are using the in-memory HTTP driver and need to switch to the CDP driver inside your query:
LET doc = DOCUMENT('my-page', { driver: "cdp" })
CLICK(doc, "#my-button")
RETURN TRUE
Yeah, that did the trick. I'm really sorry for wasting your time with these seemingly stupid issues; I'm finding it hard to work productively with Ferret.
Things that are simple in virtually any programming language become exceedingly difficult in FQL.
Hey, I'm sorry to hear that you are having difficulties. What could be done to make it better?
I guess most of the issues are due to the declarative (functional?) design you chose for FQL and how it works.
For example, a real estate site I'm crawling has some data on each property with this HTML:
<ul class="props">
<li>
<div><span class="icon-wa50-sup"></span> m<sup>2</sup></div>
<div>215</div>
</li>
<li>
<div><span class="icon-wa50-bed"></span> Rooms</div>
<div>4</div>
</li>
<li>
<div><span class="icon-wa50-bath"></span> Bathrooms</div>
<div>3</div>
</li>
<li>
<div><span class="icon-wa50-parking"></span> Parking</div>
<div><i class="icon-wa50-check"></i></div>
</li>
</ul>
Not every property has all of these elements, so to know what I'm getting I came up with this (irrelevant code omitted):
LET property = {
    URL: propertyUrl,
    Title: TRIM(INNER_TEXT(propDoc, '.cardSlider > .body h1.titulo')),
    Reference: SUBSTITUTE(TRIM(INNER_TEXT(propDoc, '.cardSlider > .body .ref')), 'Ref.', ''),
    Description: TRIM(INNER_TEXT(propDoc, '.cardSlider > .body #descripcion_larga')),
    Price: SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(INNER_TEXT(propDoc, '.cardSlider > .body .precio')), '€', ''), '.', ''), ',', '.'),
    Currency: 'EUR',
    AreaUnit: 'm2',
    Images: images,
    Type: 'buy'
}
LET propertyDataElements = ELEMENTS(propDoc, '.cardSlider > .body > .props:not(.props2) li')
LET propertyDataIndexes = (
    FOR propData IN propertyDataElements
        LET data = (
            ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-sup') ? 'Area' : (
                ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bed') ? 'Bedrooms' : (
                    ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bath') ? 'Baths' : (
                        ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-parking') ? 'Parking' : none
                    )
                )
            )
        )
        RETURN data
)
LET propertyDataValues = (
    FOR propData IN propertyDataElements
        LET data = (
            ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-sup') ? 'Area' : (
                ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bed') ? 'Bedrooms' : (
                    ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bath') ? 'Baths' : (
                        ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-parking') ? 'Parking' : none
                    )
                )
            )
        )
        RETURN (data == 'Parking' ? (ELEMENT(propData, 'div:nth-child(2) i.icon-wa50-check') ? '1' : none) : TRIM(INNER_TEXT(propData, 'div:nth-child(2)')))
)
RETURN MERGE(property, ZIP(propertyDataIndexes, propertyDataValues))
As you can see, the lack of if/else forces me to nest a lot of ternary operators. Also, to add fields to the property object I have to build two separate arrays, one for the keys and one for the values, and then use ZIP(). I was expecting to be able to do something like this instead:
property[data] = (data == 'Parking' ? (ELEMENT(propData, 'div:nth-child(2) i.icon-wa50-check') ? '1' : none) : TRIM(INNER_TEXT(propData, 'div:nth-child(2)')))
This is just one recent example.
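For comparison, here is a sketch of the same extraction done in a single FOR: each <li> gets its key computed once, items with an unrecognised icon are dropped with FILTER, and the key/value pairs are split into two arrays only at the end for ZIP. It assumes property and propertyDataElements are defined as in the query above; pairs, fieldName, fieldValue, fieldNames and fieldValues are illustrative names, the parking check uses ELEMENT_EXISTS instead of ELEMENT, and it has not been checked against a particular Ferret version.

```fql
// Assumes property and propertyDataElements from the query above.
// One pass over the <li> items: compute each key once, then its value.
LET pairs = (
    FOR propData IN propertyDataElements
        LET fieldName = (
            ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-sup') ? 'Area' : (
                ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bed') ? 'Bedrooms' : (
                    ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-bath') ? 'Baths' : (
                        ELEMENT_EXISTS(propData, 'div:first-child span.icon-wa50-parking') ? 'Parking' : NONE
                    )
                )
            )
        )
        // Skip items whose icon is not recognised instead of emitting a NONE key
        FILTER fieldName != NONE
        LET fieldValue = (fieldName == 'Parking'
            ? (ELEMENT_EXISTS(propData, 'div:nth-child(2) i.icon-wa50-check') ? '1' : NONE)
            : TRIM(INNER_TEXT(propData, 'div:nth-child(2)')))
        RETURN { name: fieldName, value: fieldValue }
)
// ZIP still needs separate key and value arrays, but both come from the same pass,
// so they are guaranteed to have the same length and order.
LET fieldNames = (FOR p IN pairs RETURN p.name)
LET fieldValues = (FOR p IN pairs RETURN p.value)
RETURN MERGE(property, ZIP(fieldNames, fieldValues))
```

The DOM lookups happen only once per item; the two small FORs at the end just project the already-collected pairs.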
Well, some limitations are intentional, while others come from the language that inspired FQL.
Regarding the lack of if/else: I do not see how it would simplify your logic, since you would still have nested conditions.
And again, most of the time it comes down to how you approach a particular problem; you just need to switch your thought process from an imperative to a declarative flow.
You can always switch to xpath and do something like this.
An else if helps avoid nesting, and a switch could be used instead. In my example there are only 4 conditions; imagine the nesting if there were 15 or more.
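One declarative way to keep the number of conditions from turning into nesting depth is a lookup table: map each icon class to its field name and let a nested FOR with FILTER pick the match. This is a sketch that replaces propertyDataIndexes from the query above and assumes propertyDataElements is defined there; fieldByIcon and matches are illustrative names, and it relies on KEYS, CONCAT, FIRST and bracket member access being available in the Ferret version in use, so treat it as untested.

```fql
// Hypothetical lookup table: icon class -> field name.
// Supporting another condition means adding one entry, not another nesting level.
LET fieldByIcon = {
    "icon-wa50-sup": "Area",
    "icon-wa50-bed": "Bedrooms",
    "icon-wa50-bath": "Baths",
    "icon-wa50-parking": "Parking"
}
LET propertyDataIndexes = (
    FOR propData IN propertyDataElements
        // Collect the field name of every icon class present in the first <div>
        LET matches = (
            FOR icon IN KEYS(fieldByIcon)
                FILTER ELEMENT_EXISTS(propData, CONCAT('div:first-child span.', icon))
                RETURN fieldByIcon[icon]
        )
        // FIRST yields NONE when nothing matched, like the original "none" branch
        RETURN FIRST(matches)
)
```

The trade-off is that every item still runs one ELEMENT_EXISTS per table entry, exactly as the ternary chain does in the worst case, but the query grows by one line per new condition.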
The problem was not getting the data, but knowing what type of data it is and appending it to the already existing property object. Instead of doing property[data] = value in a single FOR, I had to create two arrays of the same size, one with the keys and one with the values, and then ZIP and MERGE them. In the GitHub example you linked, imagine having an existing object like this:
LET stargazers = {
"ziflex": "clock",
"MontFerret": "organization",
"Kremlin": "location"
}
And then, instead of just showing the icon type, you want to iteratively append to the stargazers object with the username as the property name, something like APPEND(stargazers, {"Gusyatnikova": "organization"}) inside a FOR.
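Because FQL does not mutate values in place, the closest equivalent of APPEND-ing to stargazers inside a FOR is to let the FOR produce the new entries and MERGE them into the existing object once. A self-contained sketch; newStargazers, names and icons are hypothetical, standing in for whatever the FOR would scrape from the page.

```fql
LET stargazers = {
    "ziflex": "clock",
    "MontFerret": "organization",
    "Kremlin": "location"
}
// Hypothetical new entries, e.g. collected inside a FOR over the page
LET newStargazers = [
    { name: "Gusyatnikova", icon: "organization" }
]
// Build the key and value arrays from the same source, then merge once;
// MERGE returns a new object rather than modifying stargazers.
LET names = (FOR s IN newStargazers RETURN s.name)
LET icons = (FOR s IN newStargazers RETURN s.icon)
RETURN MERGE(stargazers, ZIP(names, icons))
```

The returned object should contain the three original usernames plus "Gusyatnikova" as keys.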
I realize it's a design issue (it's the first thing I said), but sometimes I feel that if I could write the scripts in, for example, JS, I would save a lot of the time I spend overthinking how to make things work. And please don't get me wrong, I love Ferret and I think it's a great piece of software.
No worries, you are sharing the problems you are facing with using Ferret and that's fine!
Yes, I admit that it can be frustrating at times not being able to mutate objects in queries. I will think about how we can mitigate it in future releases.
With the old CLI, you could run
ferret --cdp http://127.0.0.1:9222 script.fql
and it would work without problems. What is the equivalent command with the latest CLI version? I tried the following:
- ferret exec --browser-headless --browser-address http://127.0.0.1:9222 script.fql fails with "not supported: CLICK(...)"
- ferret exec --browser-address http://127.0.0.1:9222 script.fql fails with "not supported: CLICK(...)"
- ferret exec --runtime http://127.0.0.1:9222 script.fql and ferret exec --runtime http://127.0.0.1:9222 --browser-headless script.fql both return HTML with the title "Headless remote debugging"