USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
411 stars, 143 forks

banana dashboard does not load the injected jobs #141

Closed YehualashetGit closed 6 years ago

YehualashetGit commented 6 years ago

Issue Description

Hello. I am trying to build Sparkler using the Docker image. After I inject the jobs with an id and try to see the index and other behaviors in the Banana dashboard, it always shows zero. I don't know why this happens.

I use Ubuntu 16.04 LTS, Docker version 17.09.0-ce (build afdb6d4), JDK 1.8, and the latest Spark version. Thank you!

thammegowda commented 6 years ago

There could be two cases:

  1. inject and crawl are failing, so there are no updates to the solr index and hence the dashboard is not updating
  2. inject and crawl are okay, but the dashboard has an issue in connecting to solr

Could you please query solr with this: http://localhost:8983/solr/crawldb/query?q=*:*&rows=0&facet=true&facet.field=status and paste the output here? Thanks
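For anyone who prefers to script this check, here is a minimal sketch that builds the same query URL. It assumes Solr is reachable at localhost:8983 with a core named crawldb, exactly as in the URL above; the function name `status_facet_url` is made up for illustration and is not part of Sparkler.

```python
from urllib.parse import urlencode

# Assumption: host, port and core name are taken from the URL in this comment.
SOLR_QUERY_ENDPOINT = "http://localhost:8983/solr/crawldb/query"


def status_facet_url(endpoint=SOLR_QUERY_ENDPOINT):
    """Build a query that returns no rows, only document counts per status."""
    params = {
        "q": "*:*",               # match every document in the core
        "rows": 0,                # no documents needed, only the counts
        "facet": "true",          # turn faceting on
        "facet.field": "status",  # count documents per value of 'status'
    }
    # urlencode percent-encodes '*' and ':' so the URL is safe to paste anywhere.
    return endpoint + "?" + urlencode(params)


print(status_facet_url())
```

The printed URL can be opened in a browser or fetched with any HTTP client; the encoded form is equivalent to the hand-written one.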

YehualashetGit commented 6 years ago

{ "responseHeader":{ "status":0, "QTime":1231, "params":{ "q":"*:*", "facet.field":"status", "rows":"0", "facet":"true"}}, "response":{"numFound":0,"start":0,"docs":[] }, "facet_counts":{ "facet_queries":{}, "facet_fields":{ "status":[]}, "facet_ranges":{}, "facet_intervals":{}, "facet_heatmaps":{}}}

YehualashetGit commented 6 years ago

Here is my concern. The Dockerfile, at lines 20-22, has this command:

RUN groupadd --gid 1000 sparkler && \
    useradd -M --uid 1000 --gid 1000 --home /home/sparkler sparkler

I think this command tries to create the sparkler user, right? What do you think? Thank you!

thammegowda commented 6 years ago

Thanks for sharing the info. "response":{"numFound":0,"start":0,"docs":[] <-- shows there is nothing in solr index, so dashboard is not updating the view.

this command I think will try to create the sparkler user, right? What do you think?

I think it creates the sparkler user, and it is intended to do so. I am curious: what trouble is that step causing?


It appears to me that you have taken a really long route to set up sparkler. IMHO, it's simple to get started with the prebuilt docker image:

docker run  -p 8983:8983 -p 4040:4040 --user sparkler -it uscdatascience/sparkler
# inside docker 
/data/solr/bin/solr start
/data/sparkler/bin/sparkler.sh  inject -id j1 -su http://<yoursite>.com
/data/sparkler/bin/sparkler.sh  crawl -id j1

YehualashetGit commented 6 years ago

OK. What if I want to play around with the code? My plan is to fork the repository on GitHub, build my own Docker image on Docker Hub, and link it to GitHub through an automated build, so that when I push code from my local PC to GitHub, Docker Hub picks up the change and updates the image. Then I can see the new change by pulling the image again. That is my whole intention, so what am I supposed to do? Thank you for your help, BTW!

supermonk commented 6 years ago

@YehualashetGit: I had the same problem. In the banana dashboard, change the date to the current month; that might help.

@thammegowda: If I download the image as per the documentation, the banana dashboard does not load properly. It shows some default.json setting error.

YehualashetGit commented 6 years ago

@supermonk The issue is solved by changing the date. Thanks a lot!

thammegowda commented 6 years ago

The banana dashboard should be fixed in version 0.1. Please use the uscdatascience/sparkler:0.1 image.

Thanks

YehualashetGit commented 6 years ago

Hello again. I just injected some website URLs and seeded the jobs, and they are displayed on the dashboard. The issue is that when I search in the search box using a related idea, it returns unrelated responses, not even close to the question I was asking. What am I supposed to do to improve the search performance and content understanding? Thank you!

thammegowda commented 6 years ago

Hi, could you share a few example search queries?

FYI, the search box accepts Lucene queries (see http://www.lucenetutorial.com/lucene-query-syntax.html). It's not an end-user search box like Google search.
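To illustrate the difference, here is a rough sketch of turning a few keywords into a Lucene-style query string instead of typing a full question. The helper names are made up for this example, and the field name `extracted_text` in the usage line is only a placeholder assumption; check the crawldb schema for the fields that actually hold page text.

```python
import re

# Characters and operators with special meaning in the classic Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(term):
    """Backslash-escape Lucene special characters so the term is matched literally."""
    return _SPECIAL.sub(r"\\\1", term)


def keyword_query(keywords, field=None, operator="AND"):
    """Join keywords with a boolean operator, optionally scoped to one field."""
    terms = [escape_term(k.strip()) for k in keywords if k.strip()]
    joined = f" {operator} ".join(terms)
    return f"{field}:({joined})" if field else joined


# Hypothetical field name, used only to show the field-scoped form.
print(keyword_query(["cause", "cancer"], field="extracted_text"))
```

A question such as "what is the cause of cancer" is treated as a bag of query terms, so picking the two or three keywords that matter and combining them explicitly tends to be closer to what the search box expects.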

YehualashetGit commented 6 years ago

What I did was inject http://kidshealth.org/. Then I went to the banana dashboard and, in the search bar, tried to ask health-related questions such as "what is the cause of cancer". The dashboard then returned something from the page.

Aariv commented 5 years ago

There could be two cases:

  1. inject and crawl are failing so there is no updates on solr index, hence the dashboard is not updating
  2. inject and crawl are okay, but the dashboard has an issue in connecting to solr

Could you please query solr with this: http://localhost:8983/solr/crawldb/query?q=*:*&rows=0&facet=true&facet.field=status and paste the output here? Thanks

{ responseHeader: { status: 0, QTime: 1, params: { q: "*:*", facet.field: "status", rows: "0", facet: "true", }, }, response: { numFound: 2084, start: 0, docs: [ ], }, facet_counts: { facet_queries: { }, facet_fields: { status: [ "UNFETCHED", 2079, "FETCHED", 5, ] }, facet_ranges: { }, facet_intervals: { }, facet_heatmaps: { }, }, }
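One way to read a response like this in a script is sketched below. Solr returns field facets as a flat list of alternating values and counts, so the sketch pairs them up and then applies the two cases from the quoted comment. The function names and the wording of the messages are illustrative, not part of Sparkler or Solr.

```python
def status_counts(response):
    """Turn Solr's flat [value, count, value, count, ...] facet list into a dict."""
    flat = response["facet_counts"]["facet_fields"]["status"]
    return dict(zip(flat[0::2], flat[1::2]))


def diagnose(response):
    """Map a status-facet response onto the two cases discussed in this thread."""
    total = response["response"]["numFound"]
    if total == 0:
        # Case 1: nothing reached the index, so inject or crawl is the suspect.
        return "index is empty: check inject and crawl"
    # Case 2: documents exist, so look at the dashboard side
    # (Solr connection, or the date range shown in the dashboard).
    return f"{total} documents indexed {status_counts(response)}: check the dashboard"


# The numbers from the response above, written as a plain Python dict.
sample = {
    "response": {"numFound": 2084, "start": 0, "docs": []},
    "facet_counts": {"facet_fields": {"status": ["UNFETCHED", 2079, "FETCHED", 5]}},
}
print(diagnose(sample))
```

With these numbers the index is clearly populated (2079 UNFETCHED, 5 FETCHED), which points away from inject and crawl and toward the dashboard settings.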