dinubs / jam-api

Parse web pages using CSS query selectors
http://www.jamapi.xyz

More complex example please #20

Open NickStees opened 7 years ago

NickStees commented 7 years ago

I'm struggling to build a more complex request with this. For example, given the following HTML (repeated several times on the page):

<div class="card">
  <div class="card-title">Title</div>
  <div class="card-desc">Description text here</div>
  <a href="#link" class="card-link">More info</a>
</div>

How would you format the JSON request to get those cards? This is what I assume it would be, but the lack of docs leaves me guessing:

{
    "title": "title",
    "news": [{
        "elem": ".card",
        "cardTitle": ".card .card-title",
        "cardDesc": ".card .card-desc"
    }]
}

Also, the example on jamapi.xyz should use something like the markup above instead of relying on a live website that can change, as it currently does.

dinubs commented 7 years ago

Unfortunately, jam-api doesn't support nesting like you have there. Instead, you'd have to do something like this:

{
    "title": "title",
    "news_titles": [".card-title"],
    "news_descs": [".card .card-desc"]
}
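
With this workaround the titles and descriptions come back as two separate arrays, so the cards have to be stitched back together on the client. A minimal Python sketch, assuming the response has already been parsed into a dict whose keys mirror the request and whose arrays hold plain text strings in document order (that response shape is an assumption, not something confirmed in this thread). `merge_cards` is a hypothetical helper.

```python
from typing import Dict, List


def merge_cards(response: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Pair the i-th title with the i-th description.

    Assumes both arrays are in document order and that every .card has
    exactly one .card-title and one .card-desc; otherwise pairs drift.
    """
    titles = response["news_titles"]
    descs = response["news_descs"]
    if len(titles) != len(descs):
        # A card missing one element would silently misalign zip(), so fail loudly.
        raise ValueError(f"{len(titles)} titles but {len(descs)} descriptions")
    return [{"cardTitle": t, "cardDesc": d} for t, d in zip(titles, descs)]


# Hypothetical parsed response for a page with two cards.
sample = {
    "title": "Example page",
    "news_titles": ["First card", "Second card"],
    "news_descs": ["First description", "Second description"],
}
print(merge_cards(sample))
```

Index-based pairing is the weak point of this approach: it only works when every card contains every field.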

NickStees commented 7 years ago

Aw... OK, got it.

dinubs commented 7 years ago

Sorry about that. There's an open issue for a nesting API, but I haven't started on it at all.

NickStees commented 7 years ago

No problem, I was trying everything I could think of for a while today. Maybe I'll fork the repo, try to expand on it, and send you a pull request. Thanks for replying!

gerchicov-bp commented 7 years ago

Hello, I read this thread but still can't figure out how to work with this library. For example, take a standard WordPress blog: http://gargo.of.by/

How do I get an array of ["title", "text", "link"] for each article on this page? I know I could get this info via RSS, but it's just an example. I ask because there are a lot of examples on your website, but I only understand how to get an array of tags by their name.

rike422 commented 7 years ago

@NickStees

I made an npm package that does this: https://github.com/rike422/kirinuki-core

Here's your example using kirinuki-core:

{
    "title": "title",
    "news": {
        "_unfold": true,
        "cardTitle": ".card .card-title",
        "cardDesc": ".card .card-desc"
    }
}
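
What the nested request at the top of this thread is after is scoping: find each `.card`, then run the inner lookups relative to that card, so fields from the same card stay together. The sketch below illustrates that idea with Python's standard-library `html.parser` instead of real CSS selectors. It is not how kirinuki-core or jam-api works internally; `CardParser` is a hypothetical class that hard-codes the class names from the card markup in the first post and assumes fields contain plain text without nested tags.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class CardParser(HTMLParser):
    """Collect one dict per <div class="card">, scoping field lookups to that card."""

    def __init__(self) -> None:
        super().__init__()
        self.cards: List[Dict[str, str]] = []
        self._card: Optional[Dict[str, str]] = None  # card currently being filled
        self._depth = 0                              # open <div> depth inside the card
        self._field: Optional[str] = None            # field whose text is being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if self._card is None:
            # Outside any card: only a new .card div starts a scope.
            if tag == "div" and "card" in classes:
                self._card = {}
                self._depth = 1
            return
        if tag == "div":
            self._depth += 1
        if "card-title" in classes:
            self._field = "cardTitle"
        elif "card-desc" in classes:
            self._field = "cardDesc"
        elif tag == "a" and "card-link" in classes:
            self._card["cardLink"] = attr.get("href") or ""

    def handle_data(self, data):
        if self._card is not None and self._field:
            self._card[self._field] = self._card.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if self._card is None:
            return
        self._field = None  # simplification: any closing tag ends the current field
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # The .card div itself closed: emit the finished card.
                self.cards.append(self._card)
                self._card = None


html = """
<div class="card">
  <div class="card-title">Title</div>
  <div class="card-desc">Description text here</div>
  <a href="#link" class="card-link">More info</a>
</div>
<div class="card">
  <div class="card-title">Second</div>
  <div class="card-desc">Another description</div>
  <a href="#other" class="card-link">Read more</a>
</div>
"""
parser = CardParser()
parser.feed(html)
parser.close()
print(parser.cards)
```

A real selector engine would replace the hard-coded class checks; the point is only that the inner lookups happen per card, so a missing field leaves a gap in that card instead of shifting every later card the way index-based pairing does.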

Would you like to try this library?

dinubs commented 7 years ago

Hey @gerchicov-bp, somehow your message slipped through my notification feed; I'm incredibly sorry about that.

It does look like @rike422's package might be better for this specific case, since it's currently very difficult to grab multiple elements per item the way you want. If you're still interested in using jamapi, you can use the following JSON data to get it working:

{
  "article_links_and_titles": [{"elem": ".entry-title a", "link": "href", "title": "text"}],
  "article_texts": [".entry-content"]
}
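
To get the ["title", "text", "link"] array per article that was asked for, the two result arrays still need to be combined on the client. A short Python sketch, assuming `article_links_and_titles` comes back as a list of objects keyed by the names in the request (`link`, `title`) and `article_texts` as a list of strings in the same order; both shapes are assumptions. `to_triples` is a hypothetical helper and the sample data is made up.

```python
from typing import Any, Dict, List


def to_triples(response: Dict[str, Any]) -> List[List[str]]:
    """Build one [title, text, link] list per article."""
    heads = response["article_links_and_titles"]
    texts = response["article_texts"]
    if len(heads) != len(texts):
        # Pairing is by position, so unequal lengths mean the data can't be trusted.
        raise ValueError("title/link and text arrays differ in length")
    return [[h["title"].strip(), t.strip(), h["link"]] for h, t in zip(heads, texts)]


# Made-up response for two posts.
wp_sample = {
    "article_links_and_titles": [
        {"link": "https://example.com/post-one", "title": " Post one "},
        {"link": "https://example.com/post-two", "title": "Post two"},
    ],
    "article_texts": ["Body of post one\n", "Body of post two"],
}
print(to_triples(wp_sample))
```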