ISG-ICS / cloudberry

Big Data Visualization
http://cloudberry.ics.uci.edu

Clean up the data ingestion and registering process #318

Closed: JavierJia closed this issue 7 years ago.

JavierJia commented 7 years ago

We introduced many knobs to make the system general. However, the current process of bringing up the TwitterMap demo is too complicated. Until the Admin web page is ready, we need to provide at least ONE script that handles all the preparation logic.

  1. For data ingestion into AsterixDB, we should use a feed to avoid moving the data across machines. We can create the feed and drop it after ingestion.
  2. For the population data, is it possible to make ONE nested dataset instead of three? Would our lookup query still work? If so, the DDL part in Cloudberry would be much simpler.
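
A minimal sketch of the feed lifecycle in item 1, assuming the SQL++ feed DDL of recent AsterixDB releases with the socket adapter (releases from this period used AQL, so the exact syntax should be checked against the deployed version). The feed name, dataset, type, and socket address are hypothetical. The sketch only builds the ordered statements; a setup script would submit them one by one and push the records to the socket between START and STOP.

```python
from typing import List


def feed_lifecycle(feed: str, dataset: str, type_name: str,
                   host: str, port: int) -> List[str]:
    """Return the statements that create a socket feed, use it, then drop it."""
    create = (
        f"CREATE FEED {feed} WITH {{ "
        '"adapter-name": "socket_adapter", '
        f'"sockets": "{host}:{port}", '
        '"address-type": "IP", '
        f'"type-name": "{type_name}", '
        '"format": "adm" };'
    )
    return [
        create,
        f"CONNECT FEED {feed} TO DATASET {dataset};",
        f"START FEED {feed};",
        # Between START and STOP, the ingestion script pushes ADM records
        # to host:port; the statements below tear the feed down again.
        f"STOP FEED {feed};",
        f"DISCONNECT FEED {feed} FROM DATASET {dataset};",
        f"DROP FEED {feed};",
    ]


if __name__ == "__main__":
    # Hypothetical names: adjust to the actual TwitterMap dataverse and types.
    for statement in feed_lifecycle("TweetFeed", "ds_tweet", "TweetType",
                                    "127.0.0.1", 10001):
        print(statement)
```

Dropping the feed at the end keeps the demo's metadata clean, which matches the "create, then drop after ingestion" idea in item 1.
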

JavierJia commented 7 years ago

Could @luochen01 and @haochen07 take a look at this issue and simplify the ingestion and registration process? Thank you!

chenlica commented 7 years ago

It's a very important design decision, since it affects users' experience with our system. Can we have a F2F meeting to discuss it?

JavierJia commented 7 years ago

Sure.

luochen01 commented 7 years ago

I added @HotLemonJuice to this issue. If we want to make those three Population datasets into one, then we need some data processing work to merge them, and also the front-end queries need to be changed accordingly.
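
The merge @luochen01 describes could look roughly like this sketch, which nests city records under their county and county records under their state. The field names (stateID, countyID, cityID, name, population) are assumptions, not the actual Cloudberry population schema, and the numbers in the example are toy values. The lookup helper shows that a city-level lookup still works against the nested form, at the cost of walking the nesting.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def merge_population(states: List[Record], counties: List[Record],
                     cities: List[Record]) -> List[Record]:
    """Merge three flat population datasets into one nested dataset.

    Hypothetical schema: each county carries a stateID, each city a countyID.
    """
    # Group cities by the county they belong to.
    cities_by_county: Dict[Any, List[Record]] = defaultdict(list)
    for city in cities:
        cities_by_county[city["countyID"]].append(dict(city))

    # Group counties by state, attaching each county's cities.
    counties_by_state: Dict[Any, List[Record]] = defaultdict(list)
    for county in counties:
        nested = dict(county)
        nested["cities"] = cities_by_county.get(county["countyID"], [])
        counties_by_state[county["stateID"]].append(nested)

    # One top-level record per state, with its counties nested inside.
    return [dict(state, counties=counties_by_state.get(state["stateID"], []))
            for state in states]


def city_population(merged: List[Record], city_id: Any) -> Optional[Any]:
    """Look up one city's population by walking the nested records."""
    for state in merged:
        for county in state["counties"]:
            for city in county["cities"]:
                if city["cityID"] == city_id:
                    return city["population"]
    return None


if __name__ == "__main__":
    # Toy records, not real census data.
    states = [{"stateID": 1, "name": "StateA", "population": 300}]
    counties = [{"countyID": 10, "stateID": 1, "name": "CountyA", "population": 300}]
    cities = [{"cityID": 100, "countyID": 10, "name": "CityA", "population": 120}]
    merged = merge_population(states, counties, cities)
    print(city_population(merged, 100))
```
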

haochen07 commented 7 years ago

I propose the following process:

  1. Feed all data into the database
  2. Start Cloudberry
  3. Register the data schemas

So I suggest that we:

  1. use one script to feed all data into the database, and
  2. use another script to register the data schemas.
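
A minimal sketch of the second script, assuming registration is an HTTP POST of a JSON schema to a Cloudberry admin endpoint. The URL `http://localhost:9000/admin/register` and the schema fields are assumptions for illustration and should be checked against the Cloudberry version in use. Building the request separately from sending it makes the script easy to dry-run.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint; adjust to the actual Cloudberry host, port, and path.
REGISTER_URL = "http://localhost:9000/admin/register"


def build_register_request(schema: Dict[str, Any],
                           url: str = REGISTER_URL) -> urllib.request.Request:
    """Wrap one dataset schema in a JSON POST request (not sent yet)."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(schema: Dict[str, Any], url: str = REGISTER_URL,
             timeout: float = 10.0) -> int:
    """Send the registration request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_register_request(schema, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical schema: the real registration payload is defined by Cloudberry.
    schema = {"dataset": "twitter.ds_tweet", "timeField": "create_at"}
    request = build_register_request(schema)
    print(request.get_method(), request.full_url)
```
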

Any ideas on that? @luochen01 @HotLemonJuice

ShengjieXu commented 7 years ago

Yeah, @luochen01 is right. And I agree with @haochen07's proposal: by separating data ingestion from registration, we make it possible to use other data as well.

luochen01 commented 7 years ago

Why don't we just use one script that combines data ingestion and registration? For the registration step, we can run the script after Cloudberry is running.

haochen07 commented 7 years ago

You mean one script that you run after Cloudberry is running?

JavierJia commented 7 years ago

One script would be great for the quick start. However, we may also want to use two scripts in a development or production system: sometimes the data is already in AsterixDB, and we only need to register the dataset. I think we should start with two scripts; compared with all the current manual work, that will already be a big improvement :-)
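
The compromise above (one entry point for the quick start, with a way to skip ingestion when the data is already in AsterixDB) could be sketched as a small driver. The `--register-only` flag and the three step functions are hypothetical placeholders; in a real setup they would call the ingestion and registration scripts, and the wait step would poll Cloudberry until it responds, since registration only works once Cloudberry is running.

```python
import argparse
from typing import Callable, Dict, List, Optional


def ingest() -> None:
    # Placeholder: would run the data ingestion script (e.g., via a feed).
    print("ingest: (placeholder) load data into AsterixDB")


def wait_for_cloudberry() -> None:
    # Placeholder: would poll Cloudberry until it accepts requests.
    print("wait: (placeholder) poll Cloudberry until it responds")


def register() -> None:
    # Placeholder: would run the registration script.
    print("register: (placeholder) register dataset schemas")


STEPS: Dict[str, Callable[[], None]] = {
    "ingest": ingest,
    "wait": wait_for_cloudberry,
    "register": register,
}


def plan(register_only: bool) -> List[str]:
    """Return the step names to run, in order."""
    steps = ["wait", "register"]
    return steps if register_only else ["ingest"] + steps


def main(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(
        description="Prepare the TwitterMap demo (sketch).")
    parser.add_argument(
        "--register-only", action="store_true",
        help="skip ingestion because the data is already in AsterixDB")
    args = parser.parse_args(argv)
    steps = plan(args.register_only)
    for name in steps:
        STEPS[name]()
    return steps


if __name__ == "__main__":
    main()
```

With a hypothetical file name such as `prepare_twittermap.py`, running it without flags performs the full quick start, while `--register-only` covers the case where the data is already loaded. Keeping ingestion and registration as separate functions means they can still be split into two scripts later.
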

chenlica commented 7 years ago

@JavierJia Could we schedule a F2F meeting to discuss this important topic?

JavierJia commented 7 years ago

Sure. Will send the invitation later.

haochen07 commented 7 years ago

I saw @luochen01's PR tonight. Great job refactoring the registration and data feed scripts. I think the issue is fixed for now. Is there anything left for us to do?

JavierJia commented 7 years ago

It solves about 50% of the issue. I will take care of the remaining part: the logic for starting the neo and TwitterMap servers.