ISG-ICS / cloudberry

Big Data Visualization
http://cloudberry.ics.uci.edu

Twittermap Example: Syntax Errors when Ingesting Twitter Data to Docker AsterixDB Instance #489

Closed: JeffreyLimbacher closed this issue 6 years ago

JeffreyLimbacher commented 6 years ago

I used ./script/dockerRunAsterixDB.sh to start an AsterixDB instance. When ingesting the Twitter data, I get syntax errors from AsterixDB:

Ingesting sample tweets...
{
    "requestID": "d0d946a1-9e4c-4534-bdfc-fe2b3ae5a065",
    "signature": "*",
    "errors": [{ 
    "code": "1",
    "msg": "Syntax error: In line 51 >>with filter on create_at with {\"merge-policy\":{\"name\":\"prefix\",\"parameters\":{\"max-mergable-component-size\":134217728, \"max-tolerance-component-count\":5}}}; << Encountered \"{\" at column 31. "
    }],
    "status": "fatal",
    "metrics": {
        "elapsedTime": "50.345927ms",
        "executionTime": "0ns",
        "resultCount": 0,
        "resultSize": 0
    }
}

This also happens when creating feeds:

Ingesting population data...
{
    "requestID": "e9c86492-d20b-4125-a9d3-369af65ad8f8",
    "signature": "*",
    "errors": [{ 
    "code": "1",
    "msg": "Syntax error: In line 29 >>create feed StatePopulationFeed with { << Encountered \"with\" at column 33. "
    }],
    "status": "fatal",
    "metrics": {
        "elapsedTime": "1.125563ms",
        "executionTime": "0ns",
        "resultCount": 0,
        "resultSize": 0
    }
}

It seems that the shell scripts are using old syntax. I managed to fix the syntax errors in the population ingestion by changing the syntax. For example, I changed

create feed StatePopulationFeed with { 
    "adapter-name" : "socket_adapter", 
    "sockets" : "asterix_nc1:10003", 
    "address-type" : "nc", 
    "type-name" : "typeStatePopulation", 
    "format" : "adm", 
    "upsert-feed" : "false" 
}; 

to

create feed StatePopulationFeed using socket_adapter (
    ("sockets" = "nc1:10003"), 
    ("address-type" = "nc"), 
    ("type-name" = "typeStatePopulation"), 
    ("format" = "adm"), 
    ("upsert-feed" = "false" )
);

in the appropriate places. However, the Noah feed driver then throws a NullPointerException:

Ingested county population dataset.
Read from stdin
Connection refused (Connection refused)
[error] (run-main-0) java.lang.NullPointerException
java.lang.NullPointerException
    at edu.uci.ics.cloudberry.noah.feed.FeedSocketAdapterClient.finalize(FeedSocketAdapterClient.java:35)
    at edu.uci.ics.cloudberry.noah.feed.FileFeedDriver.doMain(FileFeedDriver.java:111)
    at edu.uci.ics.cloudberry.noah.feed.FileFeedDriver.main(FileFeedDriver.java:44)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
[trace] Stack trace suppressed: run last noah/compile:runMain for the full output.
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last noah/compile:runMain for the full output.
[error] (noah/compile:runMain) Nonzero exit code: 1
[error] Total time: 1 s, completed Mar 1, 2018 8:29:16 AM

ingesttion_output.txt (attachment): contains the output of the original script files, including the syntax errors.
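The manual rewrite above, from the JSON-style `with { ... }` form to `using <adapter> ( ("key" = "value"), ... )`, can be sketched as a small Python helper. This is a hypothetical illustration, not part of the Cloudberry scripts: `convert_feed_statement` and its `host_rename` argument are invented names, it assumes a flat parameter map with string values like the `StatePopulationFeed` definition, and it does not handle the dataset statement with the `merge-policy` clause from the first error.

```python
import json
import re

# Matches an old-style statement: create feed <Name> with { ...JSON... };
_OLD_FEED = re.compile(r"create\s+feed\s+(\w+)\s+with\s*(\{.*\})\s*;", re.DOTALL)


def convert_feed_statement(old_stmt, host_rename=None):
    """Rewrite an old JSON-style `create feed` statement into the
    `using <adapter> (("key" = "value"), ...)` form.

    host_rename (hypothetical) optionally maps node names used in the
    "sockets" value, e.g. {"asterix_nc1": "nc1"}.
    """
    match = _OLD_FEED.search(old_stmt)
    if match is None:
        raise ValueError("not an old-style create feed statement")
    name, body = match.group(1), match.group(2)
    params = json.loads(body)  # keeps the original key order
    # The adapter name moves out of the parameter list into the USING clause.
    adapter = params.pop("adapter-name")
    if host_rename and "sockets" in params:
        for old, new in host_rename.items():
            params["sockets"] = params["sockets"].replace(old, new)
    pairs = ",\n".join(
        f'    ("{key}" = "{value}")' for key, value in params.items()
    )
    return f"create feed {name} using {adapter} (\n{pairs}\n);"
```

With `host_rename={"asterix_nc1": "nc1"}` it reproduces the rewritten `StatePopulationFeed` statement above; whether `nc1` is the right node name depends on the cluster you run against.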

waans11 commented 6 years ago

Hi Jeffrey,

Sorry for the trouble. Our documentation needs to be updated to reflect recent changes in AsterixDB.

For now, you can try the following:

  1. Create a directory named "asterixdb" in your home directory and move to that directory.
     $ mkdir asterixdb
     $ cd asterixdb

  2. Download asterix-server-0.9.3-SNAPSHOT-binary-assembly.zip from this link: http://cloudberry.ics.uci.edu/img/asterix-server-0.9.3-SNAPSHOT-binary-assembly.zip

  3. Uncompress the file.
     $ unzip asterix-server-0.9.3-SNAPSHOT-binary-assembly.zip

  4. Move to the "opt/local/bin" directory.
     $ cd opt/local/bin

  5. Execute "start-sample-cluster.sh" (Windows: start-sample-cluster.bat) to start the sample instance. You should see the message "INFO: Cluster started and is ACTIVE."
     $ ./start-sample-cluster.sh

     CLUSTERDIR=/home/x/asterixdb/opt/local
     INSTALLDIR=/home/x/asterixdb
     LOGSDIR=/home/x/asterixdb/opt/local/logs

     INFO: Starting sample cluster...
     INFO: Waiting up to 30 seconds for cluster 127.0.0.1:19002 to be available.
     INFO: Cluster started and is ACTIVE.

  6. Execute "jps" to check that one instance of "CCDriver" and two instances each of "NCService" and "NCDriver" are running.
     $ jps
     59264 NCService
     59280 NCDriver
     59265 CCDriver
     59446 Jps
     59263 NCService
     59279 NCDriver

  7. Move to the "cloudberry/examples/twittermap" directory. Here, we assume that you have cloned cloudberry into your home directory.
     $ cd ~/cloudberry/examples/twittermap

  8. Execute ./script/ingestAllTwitterToLocalCluster.sh to ingest the sample tweet data. (Please make sure you have already installed the "sbt" and "scala" packages; otherwise this step will download and install them automatically, which takes a long time.)
     $ ./script/ingestAllTwitterToLocalCluster.sh
     (This process may take a few minutes, depending on your environment.)

  9. Open the AsterixDB web interface (http://localhost:19001) and issue the following queries to verify that the ingestion finished without issues.

     use twitter;
     select count(*) from ds_tweet;
     select count(*) from dsStatePopulation;
     select count(*) from dsCountyPopulation;
     select count(*) from dsCityPopulation;

     Results:
     { "$1": 47000 }    (ds_tweet)
     { "$1": 52 }       (dsStatePopulation)
     { "$1": 3221 }     (dsCountyPopulation)
     { "$1": 29833 }    (dsCityPopulation)

 10. AsterixDB setup is done. Execute the following to start the Twittermap demo.
     In cloudberry/cloudberry:
     $ sbt "project neo" "run"
     In cloudberry/examples/twittermap:
     $ sbt "project web" "run 9001"

 11. Access the Twittermap demo at http://localhost:9001

 12. When you are done, execute the following command in asterixdb/opt/local/bin to stop AsterixDB:
     $ ./stop-sample-cluster.sh
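The checks in the steps above (the `jps` process count and the cluster answering on 127.0.0.1:19002) can be combined into a small preflight script, which may also help narrow down the "Connection refused" error reported earlier. A minimal Python sketch, assuming `jps` from a JDK is on the PATH and prints one "<pid> <name>" pair per line as shown above; the function names are hypothetical.

```python
import socket
import subprocess
from collections import Counter

# Expected JVM processes for the sample cluster, per the jps step above.
EXPECTED = {"CCDriver": 1, "NCService": 2, "NCDriver": 2}


def count_processes(jps_output):
    """Count main-class names in `jps` output ("<pid> <name>" per line)."""
    counts = Counter()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


def missing_processes(jps_output, expected=EXPECTED):
    """Return {name: (expected, found)} for each process with too few instances."""
    counts = count_processes(jps_output)
    return {
        name: (want, counts[name])
        for name, want in expected.items()
        if counts[name] < want
    }


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    A refused or timed-out connection returns False instead of raising.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(host="127.0.0.1", port=19002):
    """Run `jps` and probe the cluster port; return a list of problems found."""
    jps = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    problems = [
        f"{name}: expected {want}, found {found}"
        for name, (want, found) in missing_processes(jps.stdout).items()
    ]
    if not port_is_open(host, port):
        problems.append(f"nothing is listening on {host}:{port}")
    return problems
```

An empty list from `preflight()` means the expected processes are running and port 19002 accepts connections. The same `port_is_open` call could be pointed at a feed's socket port (for example 10003, from the `"sockets"` setting in the feed definitions earlier in this thread) before starting the Noah driver, since its log showed "Connection refused" just before the NullPointerException.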

JeffreyLimbacher commented 6 years ago

Thanks for the reply, Taewoo. The count queries are giving me an error:

function twitter.count@0 is not defined [CompilationException]

It doesn't seem to impact the Twitter map example. Thanks.

waans11 commented 6 years ago

I am not sure why, but the * sign is missing. The query should be:

select count(*) from ds_tweet;
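To run the corrected query against all four datasets without the web interface, it can be sent to the AsterixDB HTTP query service. A minimal Python sketch, assuming the service is reachable at http://localhost:19002/query/service (the port from the start-sample-cluster.sh output) and accepts a form-encoded `statement` parameter; the helper names are hypothetical, and the expected counts are the ones reported earlier in this thread.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the local sample cluster's query service.
QUERY_URL = "http://localhost:19002/query/service"

# Expected row counts reported earlier in this thread.
EXPECTED_COUNTS = {
    "ds_tweet": 47000,
    "dsStatePopulation": 52,
    "dsCountyPopulation": 3221,
    "dsCityPopulation": 29833,
}


def count_statement(dataset):
    """Build the corrected count query; note the * inside count(*)."""
    return f"use twitter; select count(*) from {dataset};"


def extract_count(response):
    """Pull the count out of a parsed query-service JSON response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('errors')}")
    return response["results"][0]["$1"]


def run_count(dataset, url=QUERY_URL):
    """POST the count query for one dataset and return its record count."""
    data = urllib.parse.urlencode({"statement": count_statement(dataset)}).encode()
    with urllib.request.urlopen(url, data=data, timeout=30) as resp:
        return extract_count(json.load(resp))


def compare_counts(url=QUERY_URL):
    """Return {dataset: (expected, actual)} for every dataset."""
    return {
        name: (expected, run_count(name, url))
        for name, expected in EXPECTED_COUNTS.items()
    }
```

For example, `run_count("ds_tweet")` would be expected to return 47000 once the sample data has been fully ingested, per the results reported above.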

JeffreyLimbacher commented 6 years ago

Thanks. It works now.

chenlica commented 6 years ago

Jeff,

Thanks for using Cloudberry! We are in the process of updating the online documentation. Until then, please contact our team directly to resolve such problems.

Chen

On Thu, Mar 1, 2018 at 11:07 AM, Jeffrey Limbacher <notifications@github.com> wrote:

Thanks. It works now.
