MallCloud / contracts-api

API for communicating with Ethereum Contracts

Clarification of access to data, or to a data sample, without access to the file #18

Closed: markpyzhov closed this issue 6 years ago

markpyzhov commented 6 years ago

The original text of the requirements is in wiki/BlockChain-communication.

  1. Data use can occur through:
    • ...
    • Access to data via an API through which the user cannot access the file (e.g., GraphQL)
    • Access to a sample of data via an API through which the user cannot access the file (e.g., GraphQL)
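The two requirements above can be illustrated with a minimal sketch, assuming a hypothetical `QueryOnlyDataset` wrapper (not part of any MallCloud code). Callers may query exposed fields or draw a random sample, but no method returns the underlying file or the full raw records. The class name, method names, and field names are all illustrative assumptions.

```python
import random
from typing import Any, Dict, List, Optional, Set


class QueryOnlyDataset:
    """Hypothetical wrapper: callers can query exposed fields or draw a
    sample, but never receive the underlying file or raw record list."""

    def __init__(self, records: List[Dict[str, Any]], allowed_fields: Set[str]) -> None:
        self._records = records          # kept private; never returned as-is
        self._allowed = allowed_fields   # fields the data owner chose to expose

    def query(self, fields: List[str]) -> List[Dict[str, Any]]:
        # Reject any field the data owner has not exposed.
        denied = set(fields) - self._allowed
        if denied:
            raise PermissionError(f"fields not accessible: {sorted(denied)}")
        return [{f: r[f] for f in fields} for r in self._records]

    def sample(self, fields: List[str], n: int, seed: Optional[int] = None) -> List[Dict[str, Any]]:
        # Same field checks as query(), but only n randomly chosen records.
        rows = self.query(fields)
        return random.Random(seed).sample(rows, min(n, len(rows)))


ds = QueryOnlyDataset(
    [{"age": 30, "spend": 120.0, "email": "a@example.com"},
     {"age": 45, "spend": 80.5, "email": "b@example.com"}],
    allowed_fields={"age", "spend"},
)
print(ds.query(["age"]))  # [{'age': 30}, {'age': 45}]
```

Requesting `email` raises `PermissionError`, so the data is usable through the API while the file itself never leaves the owner's side.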

@ehillerbrand @daemonslayer can you explain these two points? I'm not sure I understand the process.

I created #17 to explain the difference.

kgupta15 commented 6 years ago

I assume these points extend the idea of using GraphQL for ML models. Here, we can allow users to access data fields, with some restrictions, without letting them see the whole dataset.

Regarding usage, there is one problem: GraphQL cannot hide the structure of a dataset. Every field that is meant to be accessible via GraphQL is part of the schema, so it is visible to everyone and cannot be hidden, although reading it can be restricted with token-based access. If there is a dataset in which we don't want certain people even to know that particular fields exist, a REST API would be better than GraphQL, because REST allows that kind of hiding.
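The difference described above can be shown with a toy model in plain Python. It is not a real GraphQL server or REST framework, and the schema, token, and field names are hypothetical. Under the comment's premise, a GraphQL-style schema is shared by all callers, so every declared field is discoverable and only the *values* are gated. A REST-style handler builds each response per caller, so forbidden fields simply never appear.

```python
from typing import Any, Dict, List

# Hypothetical dataset schema and per-token permissions (illustrative only).
SCHEMA_FIELDS = ["customer_id", "region", "income"]
PERMISSIONS = {"analyst-token": {"customer_id", "region"}}
RECORD = {"customer_id": 7, "region": "EU", "income": 52000}


def graphql_style_introspect(token: str) -> List[str]:
    # Models schema introspection: the token is ignored and every caller
    # sees every declared field, including ones they may not read.
    return list(SCHEMA_FIELDS)


def graphql_style_resolve(token: str, field: str) -> Any:
    # Reading is restricted per field, but the field's existence was
    # already revealed by the shared schema.
    if field not in PERMISSIONS.get(token, set()):
        raise PermissionError(f"not authorized to read {field!r}")
    return RECORD[field]


def rest_style_get(token: str) -> Dict[str, Any]:
    # The response is built per caller, so unreadable fields are absent
    # and the caller never learns that they exist.
    allowed = PERMISSIONS.get(token, set())
    return {k: v for k, v in RECORD.items() if k in allowed}


print(graphql_style_introspect("analyst-token"))  # ['customer_id', 'region', 'income']
print(rest_style_get("analyst-token"))            # {'customer_id': 7, 'region': 'EU'}
```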

markpyzhov commented 6 years ago

If possible, could you describe the use case step by step?

ehillerbrand commented 6 years ago

This is what Kapil said, and what he and I discussed. We have to handle this either through GraphQL or by requiring the same dataset to be registered in different ways.

We would need a use case where the structure is hidden or the number of records is limited, so that a Data Scientist can work on an algorithm with only 1000 records, for example.

Access would be specified by the Customer. A Customer would say:

  1. Load my Customer Data for this Solution
  2. Create a sample of the Data
  3. Allow DS to use the sample for building an algorithm
  4. We would ensemble the resulting algorithms
  5. We would execute the final ensemble against the entire dataset.
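The five steps above can be walked through with a minimal numeric sketch. Everything in it is a hypothetical stand-in: synthetic data replaces the Customer Data, two trivial regressors replace the Data Scientist's algorithms, and averaging is assumed as the ensembling rule, since the thread does not specify one. The sample size of 1000 follows the example in the comment.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple

Record = Tuple[float, float]          # hypothetical (feature, target) pair
Predictor = Callable[[float], float]


def load_customer_data(n: int, seed: int = 0) -> List[Record]:
    # Step 1 stand-in: synthetic records with target roughly 2 * feature.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(1.0, 10.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))
    return data


def create_sample(data: List[Record], size: int, seed: int = 1) -> List[Record]:
    # Step 2: the Data Scientist only ever receives this subset.
    return random.Random(seed).sample(data, min(size, len(data)))


def fit_ratio(sample: List[Record]) -> Predictor:
    # Step 3, algorithm A: predict k * x, where k is the mean of y / x.
    k = mean(y / x for x, y in sample)
    return lambda x: k * x


def fit_least_squares(sample: List[Record]) -> Predictor:
    # Step 3, algorithm B: ordinary least-squares line y = a + b * x.
    mx = mean(x for x, _ in sample)
    my = mean(y for _, y in sample)
    b = sum((x - mx) * (y - my) for x, y in sample) / sum((x - mx) ** 2 for x, _ in sample)
    a = my - b * mx
    return lambda x: a + b * x


def ensemble(models: List[Predictor]) -> Predictor:
    # Step 4: average the individual predictions (an assumed ensembling rule).
    return lambda x: sum(m(x) for m in models) / len(models)


full_data = load_customer_data(50_000)
sample = create_sample(full_data, 1_000)
final_model = ensemble([fit_ratio(sample), fit_least_squares(sample)])

# Step 5: execute the final ensemble against the entire dataset.
mae = sum(abs(final_model(x) - y) for x, y in full_data) / len(full_data)
print(f"mean absolute error on full dataset: {mae:.3f}")
```

The point of the structure is that only `create_sample` ever touches `full_data` before step 5. The models are fitted on the sample alone and see the full dataset only when the data owner runs the final ensemble.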

The sample structure could be very standard: always 10,000 records.
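A fixed-size sample like this could be drawn without loading the whole dataset into memory by using reservoir sampling (Algorithm R). This is our own choice of technique, not something the thread specifies. It is a minimal sketch: `reservoir_sample` and `SAMPLE_SIZE` are hypothetical names, and the input can be any iterable of records, such as rows streamed from the Customer Data.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

SAMPLE_SIZE = 10_000  # fixed sample size suggested in the thread


def reservoir_sample(records: Iterable[T], k: int = SAMPLE_SIZE,
                     seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of up to k items from a stream (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


# Usage: stream 200,000 hypothetical record ids; at most 10,000 are held in memory.
sample = reservoir_sample(range(200_000), seed=42)
print(len(sample))  # 10000
```

If a dataset has fewer than 10,000 records, the function returns all of them, so the "always 10,000" rule becomes "at most 10,000".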

markpyzhov commented 6 years ago

> We would need a use case where the structure is hidden or the number of records are limited so that a Data Scientist can work on an Algorithm with only 1000 records, for example.

I'm a little confused by "where the structure is hidden". Is it the structure that is hidden, or the data itself? If it is the data itself, I would like to propose an additional approach: MallCloud/Algorithm-Market-Platform#74

markpyzhov commented 6 years ago

Confirmed proposal: https://github.com/MallCloud/Algorithm-Market-Platform/issues/74#issuecomment-338514603