markpyzhov opened this issue 6 years ago
It is a GraphQL request sent by a client. The client provides two inputs: token and details. The token grants access to the Python API (to fetch user details), and the details describe the request itself. For example, if you want to create a notebook, you send the details accordingly (language, ensemble, etc.). The fields that can be sent with the query depend on what you are creating (current choices: Notebook, Dataset, APIEndpoint) and are listed here:
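As a rough illustration, such a request could be assembled like this (the mutation name, field names, and input type below are hypothetical stand-ins, not the platform's actual schema):

```python
# Sketch of assembling the client-side GraphQL payload with the two
# inputs described above: `token` (Python API access) and `details`
# (what to create). Mutation and field names are hypothetical.
def build_create_request(token: str, details: dict) -> dict:
    query = """
    mutation Create($token: String!, $details: CreateDetailsInput!) {
      create(token: $token, details: $details) { id status }
    }
    """
    return {"query": query, "variables": {"token": token, "details": details}}

# Example: creating a Notebook with language-specific details.
payload = build_create_request(
    "user-token-123",
    {"type": "Notebook", "language": "python", "ensemble": False},
)
```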
Thanks Kapil, sorry for the confusing question. I understand how to make the request; I just want to figure out what "create API" means.
The API is another feature that will be added. With it, a buyer can try the final model of an algorithm by sending requests to an API, without having to perform all the training and testing before using it. A buyer can also combine multiple algorithms into a pipeline using this feature. The "create API" call is the way to create such a pipeline; the edit and other calls perform their corresponding functions.
Each DS (or a group of DS) creates a model. How does a customer interested in that model try it before buying (they aren't interested in running code, only in running the trained model on some datapoints)?
A customer needs a full pipeline for their data. They find multiple algorithms covering all stages of the output they need. For example, they find 2 algos each for cleaning up data, reducing the number of parameters, and performing classification. Maybe they need more. How do they create a pipeline and test a few datapoints on it? They wouldn't want to keep contacting the original DS for each algo (this is a very small pipeline, and it already has 8 DS involved).
The customer doesn't care about the language used, but our platform does. How do we let a customer try any algo without worrying about the language in which it was written, i.e., how do we make everything language-agnostic?
How do we help customers deploy the final pipeline they chose, so that they have to do the least work and we provide everything out of the box?
Create a REST API for each model and a proxy system (like GraphQL) to support them.
API support for various languages:
REST APIs may be difficult to create by hand: I've found several tools that automate the process of API development.
* TensorFlow Serving (tf-serving) can be connected to GraphQL.
* The Spark framework provides API support.
* The R language has out-of-the-box API development support.
* C++ can be connected using gRPC or a REST framework.
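A minimal sketch of the proxy idea: every model, whatever language it was written in, sits behind a predict endpoint, and a single proxy routes requests by model name. Here the endpoints are stand-in Python callables rather than real REST/gRPC services, and the model names are invented:

```python
# Language-agnostic proxy sketch: the registry maps model names to
# predict endpoints; a real deployment would map names to REST/gRPC
# URLs instead of local callables.
MODEL_ENDPOINTS = {}  # model name -> callable taking a datapoint

def register_model(name, predict_fn):
    MODEL_ENDPOINTS[name] = predict_fn

def proxy_predict(name, datapoint):
    """Route a request to the endpoint registered for `name`."""
    if name not in MODEL_ENDPOINTS:
        raise KeyError(f"no endpoint registered for model {name!r}")
    return MODEL_ENDPOINTS[name](datapoint)

# Hypothetical models written in different languages, all reachable
# the same way through the proxy.
register_model("r-classifier", lambda x: "spam" if x["score"] > 0.5 else "ham")
register_model("tf-regressor", lambda x: 2.0 * x["score"])

print(proxy_predict("r-classifier", {"score": 0.9}))  # prints "spam"
```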
Will this lead to everything slowing down, and is it scalable: It shouldn't slow things down; with GraphQL, each API is contacted separately, without any dependencies. As far as I have thought it through, it is completely scalable.
Time taken for this solution: A basic solution can be created within 2 weeks. I developed an MVP over the weekend, and it works.
Basic architecture info: It follows a serverless architecture.
This is not the final design. I am iterating over various options with Xing. I will update when I finalize it & add it into the docs.
The API is another feature that will be added. With it, a buyer can try the final model of an algorithm by sending requests to an API, without having to perform all the training and testing before using it. A buyer can also combine multiple algorithms into a pipeline using this feature. The "create API" call is the way to create such a pipeline; the edit and other calls perform their corresponding functions.
Does API mean chain of training/testing (or something before/after)?
Solution:
I don't understand why we need this if we have already configured the Jupyter Notebook Server, which allows us to trigger any algorithm in any language (we just need to add the installation of a new language or library to the Dockerfile).
UPD: The "template" approach will be eliminated soon. After that we will receive the algorithms inside *.ipynb files, and the semantic assignment will be prepended to them as a code cell, written in the language used inside the ipynb.
No, the API is not for chaining the training/testing of algorithms. The API is used to directly invoke the models that are created once training/testing is complete.
I don't understand why we need this if we have already configured the Jupyter Notebook Server, which allows us to trigger any algorithm in any language (we just need to add the installation of a new language or library to the Dockerfile).
Consider a DS who has written a neural-network-based algorithm. Such algorithms have long training/testing times. A buyer is not interested in watching an algorithm train; they just want to send a datapoint and see whether the algorithm meets their requirements.
Similarly, for all algorithms that use large amounts of data, training/testing time can be huge. Triggering Jupyter Notebooks means running that code, and we don't want to run the code every time someone wants to try something.
We will also be able to chain multiple models, so a buyer can create and test a pipeline of algorithms covering multiple stages and pay per query, instead of having to buy each algorithm or ask the author for access rights and wait for it to become available.
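A minimal sketch of chaining trained models into a pipeline with per-query metering, as described above. The stage functions and the price are illustrative assumptions, not the platform's real API:

```python
# Pipeline sketch: pass each datapoint through an ordered chain of
# trained-model predict functions, counting queries for billing.
class Pipeline:
    def __init__(self, stages, price_per_query=0.01):
        self.stages = stages          # ordered list of predict callables
        self.price_per_query = price_per_query
        self.queries = 0

    def predict(self, datapoint):
        """Run the datapoint through every stage and meter the query."""
        self.queries += 1
        for stage in self.stages:
            datapoint = stage(datapoint)
        return datapoint

    def bill(self):
        return self.queries * self.price_per_query

# Hypothetical stages: clean the data, reduce parameters, classify.
clean = lambda x: [v for v in x if v is not None]
reduce_params = lambda x: x[:2]
classify = lambda x: "positive" if sum(x) > 0 else "negative"

pipe = Pipeline([clean, reduce_params, classify])
print(pipe.predict([1, None, -3, 5]))  # prints "negative"
```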
Well, you are talking about a trained and saved model (for example, Rdata for R), right?
Yes. Example API : http://api.cortical.io
Then we already have the process.
Please explain the solution.
Here are two issues:
Not all steps are completed, but I don't see a problem in doing them. Since we have the "template" approach we need slightly different logic, but it mostly follows the flow.
I don't want to trigger training. And there are other beneficial things there too. I will check with you regarding security when working with APIs, though, as I don't know much about that.
I don't want to trigger training. And there are other beneficial things there too.
Yes, but before we can get a trained model we need to run training. My example flow probably confuses you because it describes the training flow. For testing, or for using the model to get future results, the process is quite similar, except for these points:
Before the point:
2.iii.a. Django sends a command to the Jupyter Notebook Server and creates everything it needs in GraphDB.
There will be a step in which an administrator or customer uploads the trained model into GCS via the Django API (we already have the logic implemented).
Instead of:
2.iii.b. Jupyter Notebook Server downloads ipynb and dataset
It will be "Jupyter Notebook Server downloads the ipynb, dataset, and trained model."
Before the:
2.iii.d. Jupyter Notebook Server appends code with saving behavior.
It will be "Jupyter Notebook Server prepends code that loads the trained model."
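The prepend step above could be sketched as follows, operating directly on the notebook's JSON structure (the nbformat schema). The model path and the loading snippet are illustrative assumptions:

```python
# Sketch of prepending a "load trained model" code cell to an ipynb,
# as in the modified flow above. Cells follow the nbformat JSON shape;
# the pickle path is a hypothetical example.
LOAD_CELL_SOURCE = [
    "import pickle\n",
    "with open('trained_model.pkl', 'rb') as f:\n",
    "    model = pickle.load(f)\n",
]

def prepend_load_cell(notebook: dict) -> dict:
    """Insert a code cell that loads the trained model before all others."""
    cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": LOAD_CELL_SOURCE,
    }
    notebook["cells"].insert(0, cell)
    return notebook

# Minimal notebook with a single cell that already uses `model`.
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print(model.predict(x))\n"]},
    ],
}
nb = prepend_load_cell(nb)
```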
Here are the points that the proposal addresses, and I am not sure the current solution reflects them.
How does it do pay per query?
If I need a pipeline of multiple algos (one for cleaning data, one for reducing the parameter list, one for training, and more intermediate algos for other tasks), how do I build it? And how do I do this without having to ask the DS to run the code?
Can a DS upload different versions of the same algorithm (with different data/more or less training/different kinds of datasets)? It would be kinda confusing to have to run multiple trainings before being able to use the models.
If I don't want to train/test an algorithm but just want to use the model, is going through that channel worth it? Shouldn't there be a simpler and faster channel that lets people simply use the final models created by the DS (like a final app instead of a beta version in testing)?
If a user wants to use the api in production as they have achieved the required result after talking with DS to modify their algo and then create a final model (or they just liked the model to begin with), can the current solution be scaled for their needs? Essentially, if say, I have an android app that needs an ML algorithm. Would I want to use the current solution for accessing the ML algorithm or an api similar to cortical?
The API is for creating a final endpoint, once all training/testing iterations are complete and the model is ready for use. If the answers to some or all of these questions point toward the API, I think it's useful to have. Also, I think the current solution is great, but the flow for a DS would be just a bit different when using both the current solution and the API:
We do not want to duplicate what has already been implemented with notebooks but a simple trained model test through an API makes sense. I do not want an elaborate development effort for the API. Mark/Kapil, let's make sure you both are in good communication about what needs to be developed and what has already been developed. @Teskuroi @daemonslayer
- How does it do pay per query?
I think that is what the C-API is meant for.
- If I need a pipeline of multiple algos (if there is one for cleaning data, one for reducing parameter list, one for training and more intermediate algos for other tasks), how do I make it? How do I do this without having to ask to run the code by DS.
Presently we have chains in the D-API. It is possible to let an Admin choose the components of a chain and execute it (the components just need to be compatible with each other).
- Can a DS upload different versions of the same algorithm (with different data/more or less training/different kinds of datasets)? It would be kinda confusing to have to run multiple trainings before being able to use the models.
For the Jupyter Notebook Server it doesn't matter whether these are different versions or entirely different algorithms. All it needs is the trained model (if one exists), the algorithm, and everything required as input.
- If I don't want to train/test algorithm but just want to use the model, is going through that channel worth it? Shouldn't there be a simpler and faster channel that allows just usage of final models created by the DS for people to use (like a final app instead of a beta version in testing).
There are only three phases:
A saved model is just a data file (in some languages, such as R); it cannot do anything on its own without the algorithm. When someone has a trained model, they can run the original algorithm with that model to predict on new inputs (of the same structure). In other words, the "test" and "future" phases are very similar.
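A minimal Python illustration of this point, using pickle as a stand-in for language-specific formats like R's Rdata: the saved file is inert until the algorithm's own code loads it and calls predict. The toy model below is an invented example:

```python
# A saved model is just a data file; the algorithm's code must load it
# before it can predict anything.
import os
import pickle
import tempfile

class ThresholdModel:
    """Toy 'algorithm': the trained state is just a threshold value."""
    def __init__(self, threshold):
        self.threshold = threshold
    def predict(self, x):
        return "positive" if x > self.threshold else "negative"

# Training phase: produce the model and save it to a file.
model = ThresholdModel(threshold=0.5)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Test / future phase: load the file and predict on a new input. The
# file alone is useless without the ThresholdModel class definition.
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded.predict(0.9))  # prints "positive"
```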
- If a user wants to use the api in production as they have achieved the required result after talking with DS to modify their algo and then create a final model (or they just liked the model to begin with), can the current solution be scaled for their needs? Essentially, if say, I have an android app that needs an ML algorithm. Would I want to use the current solution for accessing the ML algorithm or an api similar to cortical?
Regarding the current architecture, there are two options. When a customer has bought the saved model, they need to:
The docs explain how to use CRUD for the API; what does "API" mean here?
https://github.com/MallCloud/contracts-api/wiki/api-access#create-api