lewismc opened 10 years ago
I notice that Gora supports persistence to MongoDB as an option. This could serve as a bridge between analytics that use Hadoop/Pig/Hive and analytics or visualizations that are not based on the Hadoop et al. architecture.
Hi @curtislisle, certainly. This is exactly the type of thing we can focus on. I would imagine we could design and implement an ETL process that maps one dataset to many datastores. We could then augment the analytic tools, which would have access to the data regardless of its location. Thanks for the feedback.
One goal (and I'm really not sure if this is the sort of data/analytics you're thinking about; I know Landsat, not these acronyms) could be to have a map and regional word cloud of available polar data and/or publications...
Yes, we could certainly work towards this. Out of curiosity, what value do you think map/regional word clouds would bring? Personally, I think they would be very cool to present at the end of the workshop as a point of reflection. Any ideas?
Lewis (replying by email to Allen Pope's comment above, posted Fri, Oct 17, 2014 at 2:55 PM; https://github.com/NSF-Polar-Cyberinfrastructure/datavis-hackathon/issues/14#issuecomment-59581685)
A map or word clouds could be a good way to investigate what sorts of science have been done in a region: for example, finding out that a glacier I'm interested in is in a region where only limnologists are working. It could also show what other datasets and published research are available in the region. So for data and research discovery, I think this sort of tool could be useful. It would also be great if you're getting into a new study region and want to get the lay of the land.
Word cloud sounds like a great idea, @allenpope. It's also something that Apache Tika can really excel at. See: http://baron.pagemewhen.com/item/84/
@chrismattmann - yes! But is there something to build word clouds interactively (e.g. trace a polygon on the map) rather than dumping into Wordle, etc.?
Hey @allenpope, not directly in Tika, but we could develop something at the workshop that combines Tika and datavis interactively. That would be awesome! I smell another session (with @allenpope as the proposer)? +1
"A map / word clouds could be a good way to investigate what sorts of science have been done in a region" This is such a great and useful idea. It helps not only when getting into a new study region, or when checking whether science has been done or data are available in your current study region that you aren't aware of; if done in an accessible way, it would also be great for casual/citizen scientists and students who are interested in the information but either don't know how or where to conduct literature searches, or find them daunting.
OK, it sounds like we are converging on some concrete goals for this session (or sub-goals at least, if @lewismc agrees as the lead proposer):
Sound good? Thanks @smskiles, @allenpope and @curtislisle.
@chrismattmann - I think that sounds great, as long as some of it can be done on the fly by the user (e.g. select a region/map area and a word cloud appears), as opposed to being a static product. What do you think would be most useful, @smskiles?
Yep, agree @allenpope. We'll get there, though it may start out as static. I am going to do some pre-hacking this week.
Makes sense - and awesome!
@chrismattmann @smskiles One of the developers here at NSIDC made the good point that Wordles/word clouds often aren't the most useful visualization, because they make the smaller terms hard to read and they distort the relative importance of terms. It might be good to think about using something more quantitative, instead of or in addition to a word cloud, to display the relative importance of keywords, etc.
Thanks @allenpope, good points. We can start simple with Wordles/word clouds and then move to something more quantitative. I'll do some research on this.
@allenpope I agree, a word cloud would be interesting to see, but might not be the most useful. It's not as exciting, but might a ranked list be more useful (e.g. for a literature search, ranking by most recent or most cited)? I think scale might be an interesting issue here, i.e. what is included or excluded based on the size of your region of interest.
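The "trace a polygon on the map" idea combined with a ranked keyword list could be prototyped roughly as below. This is a minimal sketch, not anything Tika or Gora provides: it assumes each record (a dataset or publication) is a dict with hypothetical `lon`, `lat` and `keywords` fields, selects records inside the traced region with a standard ray-casting point-in-polygon test, and ranks keywords by raw frequency using `collections.Counter`. The function names `point_in_polygon` and `ranked_keywords` and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical type: a polygon is a list of (lon, lat) vertices, closed implicitly.
Polygon = Sequence[Tuple[float, float]]


def point_in_polygon(lon: float, lat: float, polygon: Polygon) -> bool:
    """Ray-casting test: cast a ray from (lon, lat) towards increasing
    longitude and count edge crossings; an odd count means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge meets the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def ranked_keywords(records: Iterable[dict], polygon: Polygon,
                    top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank keywords by frequency among records that fall inside polygon.

    Each record is assumed to be a dict with 'lon', 'lat' and 'keywords'
    entries (an illustrative schema, not an existing one)."""
    counts: Counter = Counter()
    for rec in records:
        if point_in_polygon(rec["lon"], rec["lat"], polygon):
            # Lower-case so "Glacier" and "glacier" are counted together.
            counts.update(k.lower() for k in rec["keywords"])
    return counts.most_common(top_n)


# Usage with made-up records and a rectangular region traced by the user.
records = [
    {"lon": -50.0, "lat": 69.2, "keywords": ["glacier", "calving"]},
    {"lon": -49.5, "lat": 69.0, "keywords": ["Glacier", "limnology"]},
    {"lon": 10.0, "lat": 78.0, "keywords": ["sea ice"]},
]
region = [(-52.0, 68.0), (-48.0, 68.0), (-48.0, 70.0), (-52.0, 70.0)]
print(ranked_keywords(records, region))
# [('glacier', 2), ('calving', 1), ('limnology', 1)]
```

Raw counts are only one ranking choice; ranking by recency or citation count, as suggested above, would just change what gets summed per keyword. Note also that this test treats lon/lat as flat x/y coordinates, which breaks down for regions that enclose a pole or cross the antimeridian, so a real polar tool would likely want a polar projection or a proper geometry library.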
Whilst in deep conversation with a fellow ETL guru (@MBoustani), I had a brainwave, and I am now fleshing it out here.
I would like this session to focus on:
Personally, I would really like to explore whether a Pig or Hive module for Gora would be valuable here. To determine this, however, I need to learn more about the data analysis tasks we gather in 1 above.