Closed: edkearns closed this issue 6 years ago.
Hello,
I'm interested in working on this. I have some questions and doubts.
1. Ideally, what would be the expected outcome for the project? In my mind, I'm envisioning an open API backed by immutable records/indexes of all files (in a particular dataset, for instance) stored on an open distributed ledger. Am I close? (A rough sketch of what I mean by such a record is below.)
2. If the proposal includes using a blockchain, is there any preference for private/permissioned ledger systems (e.g., Hyperledger) as opposed to open ledger systems (e.g., Ethereum)?
3. A verification system using immutable distributed-ledger catalogs might require some computational resources (e.g., hosting an IPFS node) and/or financial resources (gas costs in the case of Ethereum or Filecoin) on the part of the org. What constraints, parameters, and non-functional requirements around these should we keep in mind before proposing a solution?
Lastly, I would appreciate any guidance on how to proceed with this project.
Thanks!
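For concreteness, here is a minimal sketch of what one such immutable catalog record could look like; the field names, the use of SHA-256, and the JSON layout are illustrative assumptions, not a NOAA schema:

```python
import hashlib
import json
import os
import time

def make_catalog_record(path, dataset_id, source_url):
    """Build one catalog entry for a data file.

    Field names (dataset_id, sha256, registered_at) are illustrative
    assumptions, not an agreed NOAA schema.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    record = {
        "dataset_id": dataset_id,
        "object_key": os.path.basename(path),
        "source_url": source_url,
        "sha256": h.hexdigest(),
        "size_bytes": os.path.getsize(path),
        "registered_at": int(time.time()),
    }
    # The serialized record (or just its hash) is what would be anchored on
    # whatever ledger or API-accessible catalog is ultimately chosen.
    return json.dumps(record, sort_keys=True)
```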
Isn't this problem complicated in that at least some of the data housed on different cloud providers may be transformed into completely different representations? How Amazon decides to house and make the data accessible on their platform is different from how Google makes the data available to Google Earth Engine. Could an approach include some form of model or workflow that generates a calculated result, such as a space/time metric at some scale, that exercises enough of the data to statistically verify adequate data integrity?
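One way to picture that: draw the same reproducible sample of grid cells from each provider's representation of a granule and compare a summary statistic within some tolerance. A rough sketch, where the sample size, seed, and tolerance are arbitrary assumptions:

```python
import numpy as np

def sample_statistic(field, n_samples=1000, seed=42):
    """Mean over a reproducible random sample of grid cells.

    `field` is any 2-D array-like view of the data (e.g. one variable at
    one time step); the sample size and seed are arbitrary choices.
    """
    rng = np.random.default_rng(seed)
    arr = np.asarray(field, dtype=float)
    idx = rng.choice(arr.size, size=min(n_samples, arr.size), replace=False)
    return float(np.nanmean(arr.ravel()[idx]))

def representations_agree(field_a, field_b, rel_tol=1e-6):
    """True if the sampled statistic matches across two representations."""
    a = sample_statistic(field_a)
    b = sample_statistic(field_b)
    return abs(a - b) <= rel_tol * max(abs(a), abs(b), 1.0)
```

In practice `field_a` and `field_b` would be the same variable and time step read from, say, the NOAA archive copy and a cloud provider's copy, through whatever access tools each platform offers.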
Thanks for your interest.
Cheers, Ed
Right, and the original data that are the feedstock for the new presentation could and should be verified, and perhaps the process that creates the new data forms should also be verified in order for those new forms to be considered authenticated. And yes, statistical subsampling may be the most economical approach given the large number of data files and data points inherent in the problem.
Ed
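One lightweight way to record that linkage is a provenance entry tying the hashes of the verified inputs to an identifier for the transforming process and to the hash of the derived product. A sketch, with all field names assumed purely for illustration:

```python
import hashlib
import json

def provenance_record(input_digests, process_id, output_digest):
    """Link verified inputs, the transforming process, and the derived output.

    input_digests: list of SHA-256 hex digests of the verified source files.
    process_id: an identifier/version for the code or workflow that produced
        the new representation (e.g. a git commit hash) -- illustrative only.
    output_digest: SHA-256 hex digest of the derived product.
    """
    body = {
        "inputs": sorted(input_digests),
        "process": process_id,
        "output": output_digest,
    }
    serialized = json.dumps(body, sort_keys=True)
    # Hashing the whole record gives a single value that can be published
    # alongside the derived data (or anchored on a ledger) so consumers can
    # check both the feedstock and the process that created the new form.
    body["record_sha256"] = hashlib.sha256(serialized.encode()).hexdigest()
    return body
```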
@edkearns any chance you or another NOAA mentor would like to update this issue and reopen it for GSoC 2019?
Idea
NOAA has been publishing its open data on commercial cloud partners' platforms as part of its Big Data Project in order to enable easier access and use of those data. However, when users consume NOAA data from a non-NOAA partner platform, they need a way to verify the authenticity of those data. Ideas are solicited for techniques to verify that those cloud-based data are indeed the same as the original copy of NOAA data. These techniques should allow dynamic, on-demand verification of data at the file/object level and/or at the tool level (e.g., database or visualization).
Techniques could include, but are not limited to, open distributed ledgers (blockchain), API-accessible catalogs, and similar approaches. Open-source tools such as Hyperledger are encouraged but not required.
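As one concrete illustration of on-demand verification at the file/object level: hash the cloud-hosted object and compare it against a digest published in an API-accessible catalog. The catalog endpoint and response format below are hypothetical placeholders, not an existing NOAA service:

```python
import hashlib
import urllib.request

CATALOG_URL = "https://example.org/noaa-catalog"  # hypothetical endpoint

def sha256_of(url):
    """Stream an object from a cloud provider and return its SHA-256."""
    h = hashlib.sha256()
    with urllib.request.urlopen(url) as resp:
        for chunk in iter(lambda: resp.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_object(object_url, object_key):
    """Compare the cloud copy's digest to the one on record.

    Assumes the (hypothetical) catalog returns the expected hex digest as
    plain text at CATALOG_URL/<object_key>/sha256.
    """
    with urllib.request.urlopen(f"{CATALOG_URL}/{object_key}/sha256") as resp:
        expected = resp.read().decode().strip()
    return sha256_of(object_url) == expected
```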
NOAABigData
Skills Needed (No prescribed technologies or languages.)
Mentors: Ed Kearns, NOAA Chief Data Officer