Keystone-Technologies / keystone-technologies.github.io

Data #43

Open s1037989 opened 8 years ago

s1037989 commented 8 years ago

Problem

I'm working on a problem for Becky's Wellness Data... There's a lot of data coming in from different sources, and the more we have, the better our data analysis can be.

I bring this up here because it's been a hot topic around the office lately. I've always used Strava and Leo as my example. Strava is my go-to data visualizer for all activity-based data. It's a great site for that purpose, and it has a fair amount of data. But what happens when I want to add new data into the mix, like lactic usage from Leo?

  1. Strava doesn't take the data. Ok, maybe we can get the data from Leo into one of the standard XML-based file formats like GPX or TCX. So technically Strava has the data. But does it really? When Strava processes the GPX file, it probably doesn't just leave it as an XML file. The XML file is surely just a transport format. So when Strava processes the XML and sees the Leo data in it, it'll just skip over it because it doesn't know what to do with it. But let's say that it does add the Leo data to its database -- let's just say that raw XML files are the database, so whatever is in the XML, Strava knows about... ok, so:
  2. Strava doesn't use the data. What good is data if it doesn't get used? Well, most important is having the data. Load up all the data and ignore 99% of it; at least you have it for the day you decide to make use of it. Stop waiting to collect data until you know what to do with it! By the time you figure it out, you've already lost a ton of data, data that could have been really useful in the research for figuring out what to do with the data in the first place! But I digress... So Strava doesn't use the Leo data. What's the point of having a Leo? For me to hack my own DIY solution and coerce two separate products into giving me the visualization I'm seeking? What if there weren't just two data sources, but three, four, more, even 100? DIY doesn't sound so fun or reasonable anymore, does it? Do we reinvent all of Strava just so we can also include Leo data? What a waste of human resources! Do we yell at Strava to use Leo data? We still have the 100+ scalability issue. 100 years from now... Will the world still be faced with this problem? In just 66 years we went from no powered human flight of any kind, ever, to walking on the ever-lovin' MOON. In 100 years will we still be faced with this major problem of having data but no single method for accessing all of it? How could this problem be solved???
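
To make the "skip over it" worry in point 1 concrete, here is a minimal Python sketch of a GPX reader that keeps whatever it finds under a track point's `<extensions>` element instead of dropping it. This is an illustration only and says nothing about how Strava or Leo actually work: the `leo` namespace URI, the `lactic` element, the sample values, and the `read_trackpoints` helper are all made up for the example. Only the GPX 1.1 namespace and the `<extensions>` element come from the GPX format itself.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"  # real GPX 1.1 namespace

# Hypothetical sample: the "leo" namespace and <leo:lactic> are invented here.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:leo="http://example.com/leo/1">
  <trk><trkseg>
    <trkpt lat="38.6270" lon="-90.1994">
      <ele>142.0</ele>
      <time>2017-01-01T12:00:00Z</time>
      <extensions>
        <leo:lactic>2.4</leo:lactic>
      </extensions>
    </trkpt>
  </trkseg></trk>
</gpx>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_trackpoints(gpx_text):
    """Return one dict per <trkpt>, keeping unknown extension data in 'extra'."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.iter(f"{{{GPX_NS}}}trkpt"):
        point = {
            "lat": float(trkpt.get("lat")),
            "lon": float(trkpt.get("lon")),
            "extra": {},
        }
        for child in trkpt:
            name = local_name(child.tag)
            if name == "extensions":
                # Keep every leaf value, keyed by its fully qualified tag,
                # even though this reader has no idea what it means.
                for ext in child.iter():
                    if ext is not child and ext.text and ext.text.strip():
                        point["extra"][ext.tag] = ext.text.strip()
            else:
                # Standard GPX children such as <ele> and <time>.
                point[name] = child.text
        points.append(point)
    return points


if __name__ == "__main__":
    for point in read_trackpoints(SAMPLE_GPX):
        print(point)
```

The only design choice worth noting is that unknown fields are stored under their full namespaced tag, so data from two different vendors cannot collide. That covers "having the data"; it does nothing yet for "using the data", which is the harder half of the problem.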

But I digress... As for Becky's Wellness data, she has lots of data coming in from different sources, at different frequencies, and inconsistently. How can analysis of all this data -- the more the better -- be accomplished for Becky's Wellness program, for Strava and Leo, and for everything else out there that involves data?

As with #41, is there a single business-logic function that can be pulled from this to address the problem globally for all data? Or must a separate business-logic function be developed for every type of data that exists (the status quo)?

Proposal

We choose to solve this problem for humanity, not because it is easy, but because it is hahd.