MIT IEEE URTC 2023. GSET 2023. Repository for "SeBRUS: Mitigating Data Poisoning in Crowdsourced Datasets with Blockchain". Using Ethereum smart contracts to stop AI security attacks on crowdsourced datasets.
Use Case
For our application, we need an easy way to track metadata about contributed datasets so that we always know where each dataset is stored. This is because our initial blockchain implementation will have the Dataset smart contract for each dataset point to a unique address. We also need to store some additional information about each dataset, such as a title and description, so that datasets are easily searchable. We may extend this feature later on to include more information about datasets.
Proposed Solution
Below is a set of resources and a list of action items that will help with implementing this feature. Some modifications need to be made to the examples in the resources below in order to complete this feature. There are three files you will need to create or modify to complete this ticket: app.py, routes/dataset.py, and models/dataset.py. For similar instructions or references, check out the implementation of /api/user in #8.
Action Items
[x] Implement the defined schema in the Flask-SQLAlchemy ORM. Create the Dataset class in models/dataset.py with an id column (this can be a db.Integer that is the primary_key of the table, is unique, and is not nullable). Add additional columns for the dataset name, description, and address (all strings of an arbitrary size; 80 is an acceptable value). Additionally, add a json() function that returns a Python dictionary of the object, where the keys and values are all of the fields in the table. For creating database ids, you can keep a count static variable that tracks the number of datasets created; increment it every time a new dataset is created and use its value as the id of the newly created dataset.
[x] Update app.py to include an endpoint for /api/dataset/<id>, and make sure requests to that endpoint are routed to the dataset(id) handler implemented in routes/dataset.py.
[x] Implement the GET handler in routes/dataset.py to return all of the information about a dataset based on the given dataset ID. get(id) should query the Dataset table by id and take the first() dataset object of that query. If the dataset object is None (no dataset with that id), return an empty JSON and 404. If the dataset object is found, return the value of dataset.json() and 200.
[x] Implement the POST handler in routes/dataset.py to create a new dataset and return all of the information about it. Check the incoming request's JSON body using data = request.json. If name, description, or address doesn't exist, return an empty JSON and 400. If a dataset already exists in the database with the same name, return an empty dictionary and 409. If this is a unique new dataset, create a new Dataset object with the corresponding name, description, and address and use the corresponding dataset.save_to_db() function. Return the value of dataset.json() and 200.
[x] Implement the PUT handler in routes/dataset.py to edit information about an existing dataset. Check the incoming request's JSON body using data = request.json. If name, description, or address doesn't exist, return an empty JSON and 400. Check if the dataset is in the database. If the dataset is not in the database, return an empty dictionary and 404. If the dataset is found, update the values of the dataset object and save them to the database using the corresponding dataset.save_to_db() function. Return the value of dataset.json() and 200.
[x] Implement the DELETE handler in routes/dataset.py to delete an existing dataset. Check if the dataset is in the database. If the dataset is not in the database, return an empty dictionary and 404. If the dataset is found, use the corresponding dataset.remove_from_db() function and return an empty JSON and 200.
Here are some tests that can be done to verify the implementation of POST, GET, PUT, DELETE.
Resources
- https://flask.palletsprojects.com/en/2.3.x/#api-reference
- https://flask-sqlalchemy.palletsprojects.com/en/3.0.x/quickstart/
- https://github.com/bliutech/wikisafe/blob/main/server/app.py
This is a :rocket: Feature Request