SubhasisDutta / Wireframe-Identification-Engine

This app is a concept demonstrator that automatically converts a wireframe design image into user interface code. It identifies the different user interface controls using various machine learning techniques and converts them into usable metadata, which can then be consumed by a platform like SAP's BUILD to generate a freestyle prototype. Using this app, a user can build training models by collecting data and continually improve the classifier through reinforcement learning.
http://35.160.238.107:6060/
Apache License 2.0

Create a post for build team #7

Closed SubhasisDutta closed 8 years ago

SubhasisDutta commented 8 years ago

Below is a summary of the discussion that Arturo, Gabby, Wibin, Chats, and I had during a few meetings on this topic.

Similar solution available

A recently published patent on a Wireframe Recognition and Analytics Engine (WRAE) (https://www.google.com/patents/US20140068553) attempts to address this problem.

Demo : https://www.youtube.com/watch?v=kXGTPrFUbUg

In brief, the suggested solution has three steps:

  1. Input acceptance and feature identification. The provided image is run through various computer vision processes, such as the Canny edge detector and optical character recognition, using open source packages like OpenCV (http://www.willowgarage.com/pages/software/opencv) and OCRopus (https://github.com/tmbdev/ocropy). Information about the position, size, and enclosed text of each element is extracted.
  2. Identification of wireframe components. Predefined conditional decision rules are traversed, and the rule that a particular component satisfies determines its identification. This approach is rigid and will not scale: because components are identified through predefined conditions per element type, the application only works for a small, specific set of wireframes that comply with the decision rules defined in the recognition engine, and therefore supports only a few controls. Decision-based identification also requires precise drawing.
  3. Generation of source code from the identified tags.
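The rigidity criticised in step 2 is easiest to see in code. Below is a minimal sketch of what such a rule-based recognizer amounts to; the thresholds and component names are hypothetical, chosen only to illustrate how quickly a fixed rule set falls through to "unknown":

```python
# Hypothetical decision rules in the style of the patent's recognition
# engine. Each extracted region is described by its bounding box and any
# enclosed text; the first rule that matches wins.

def classify_region(width, height, text):
    """Classify a wireframe region with fixed, hand-written rules.

    These thresholds are illustrative only: any drawing that deviates
    slightly (e.g. a square button) falls through to 'unknown', which is
    exactly the rigidity criticised above.
    """
    aspect = width / height
    if text and aspect > 5:          # long thin box with text
        return "input_field"
    if text and 2 <= aspect <= 5:    # short wide box with text
        return "button"
    if not text and aspect > 8:      # long thin line, no text
        return "separator"
    return "unknown"                 # rule set exhausted

print(classify_region(200, 40, "Submit"))   # aspect 5.0  -> button
print(classify_region(300, 30, "Name"))     # aspect 10.0 -> input_field
print(classify_region(60, 60, "OK"))        # aspect 1.0  -> unknown
```

A learned classifier (as proposed below) replaces these hand-written branches with a model trained from examples, so unusual but valid drawings are not simply rejected.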

Possible Approach discussed

figure1

Workflow:

  1. The user uploads a set of wireframes into a prototype project in BUILD.
  2. For each wireframe uploaded, an image page is created and the asset is identified by (Project Id, Page Id, Asset Id).
  3. After the image is saved in the database, a background process is initiated, passing the asset identifiers to the wireframe analysis engine.
  4. The entire image is analyzed (using the same ML technique we will use for identifying controls in step 8) to identify the suitable device layout (Desktop, Tablet, Phone). A corresponding gray-scale image with noise reduction at a fixed resolution (1280 x 1024) is then generated. This gray-scale image is provided as input to the next three steps, which are processed in parallel.
  5. The image is analyzed with an optical character recognition system to identify the text and its position in the image. Two open source solutions were identified that can be modified to produce the desired output: OCRopus (https://github.com/tmbdev/ocropy) and Tesseract (https://github.com/tesseract-ocr/tesseract).
  6. The image is analyzed to break the wireframe into small segments, each containing a user component. Two image processing methods were identified: Canny edge detection with contour identification (http://docs.opencv.org/trunk/da/d22/tutorial_py_canny.html, http://docs.opencv.org/trunk/d4/d73/tutorial_py_contours_begin.html) and the watershed method (http://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_watershed/py_watershed.html). Another possibility, as Divesh suggested, is to ask the user to manually mark the different objects in the wireframe through a hot-spot-like feature.
  7. As the image is broken down, the different properties associated with each component, such as its position in the page, width, and height, are deduced and passed to the metadata generator.
  8. After the components are separated, they are passed through an identifier to determine the type of each component. Two open source projects used for image recognition with machine learning techniques such as neural networks were identified: a. TensorFlow - https://github.com/tensorflow/tensorflow (https://www.tensorflow.org/) b. Caffe - https://github.com/BVLC/caffe (http://caffe.berkeleyvision.org/)
  9. The metadata generator receives all the data and combines it to create a BSON object in the pageMetadata format, which is persisted in MongoDB.
  10. After the metadata is available in the database, the user gets the option to add the generated page from the meatball menu in the BUILD prototype editor; using the create-page flow, we can then add a new object page into the project.
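Step 4's preprocessing (gray-scale conversion at a fixed resolution) can be sketched without OpenCV. The luminance weights below are the standard ITU-R BT.601 coefficients, and nearest-neighbour resampling stands in for whatever resizing and noise reduction the real engine would use:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W) gray-scale array
    using the ITU-R BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image to a fixed resolution
    (the workflow above targets 1280 x 1024)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]

# A random stand-in for an uploaded wireframe image
wireframe = np.random.randint(0, 256, size=(48, 64, 3)).astype(float)
gray = to_grayscale(wireframe)
fixed = resize_nearest(gray, 1024, 1280)
print(gray.shape, fixed.shape)   # (48, 64) (1024, 1280)
```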
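Steps 6 and 7 amount to segmenting the binarised image into connected regions and reading off each region's position and size. A minimal flood-fill version, as a pure-Python stand-in for the Canny/contour or watershed pipelines named above, looks like this:

```python
from collections import deque

def segment_components(grid):
    """Label 4-connected regions of 1-pixels in a binary grid and return
    each region's bounding box as (top, left, height, width) -- the
    position/size properties step 7 passes to the metadata generator."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] and not seen[r][c]:
                # BFS flood fill over this component
                top, left, bottom, right = r, c, r, c
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom - top + 1, right - left + 1))
    return boxes

# Three separate rectangles in a toy 5 x 8 binary wireframe
grid = [
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]
print(segment_components(grid))
# [(0, 0, 2, 2), (0, 5, 2, 3), (3, 1, 2, 3)]
```

Each bounding box then crops a sub-image that is fed to the step 8 classifier.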
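Step 9's metadata generator just merges the parallel outputs into one document. The `pageMetadata` field names below are hypothetical placeholders, since the real BUILD schema is not described here, and the object would be stored as BSON via a MongoDB driver rather than printed as JSON:

```python
import json

def build_page_metadata(project_id, page_id, asset_id, layout, components, texts):
    """Combine the layout detector, segmenter/classifier, and OCR outputs
    into a single pageMetadata-style document (field names are illustrative)."""
    return {
        "projectId": project_id,
        "pageId": page_id,
        "assetId": asset_id,
        "deviceLayout": layout,      # Desktop / Tablet / Phone, from step 4
        "controls": [                # classified components, from steps 6-8
            {"type": ctype, "x": x, "y": y, "width": w, "height": h}
            for (ctype, x, y, w, h) in components
        ],
        "texts": texts,              # OCR results with positions, from step 5
    }

doc = build_page_metadata(
    "P1", "pg1", "a1", "Desktop",
    components=[("button", 10, 20, 200, 40)],
    texts=[{"value": "Submit", "x": 15, "y": 28}],
)
print(json.dumps(doc, indent=2))
```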

In my opinion, it would be better to start processing every image page as soon as it is uploaded, and only enable the option to generate a hi-fidelity page once we have been able to generate metadata for that image page. The entire process is going to be time-consuming (probably more than a minute), and it would be a bad experience to make the user wait that long only to come up with nothing.
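This upload-time processing with a gated UI option can be sketched with a background worker and a per-asset status flag; the function and field names here are hypothetical, and a real deployment would use a proper job queue and MongoDB rather than an in-memory dict:

```python
import threading
import time

# Per-asset status store; in the real system this would live in MongoDB
# alongside the generated pageMetadata.
metadata_store = {}

def analyze_wireframe(asset_id):
    """Stand-in for the full (slow, possibly >1 minute) analysis pipeline."""
    time.sleep(0.1)  # simulate the expensive OCR/segmentation/classification work
    metadata_store[asset_id] = {"status": "ready", "controls": []}

def on_upload(asset_id):
    """Kick off analysis immediately at upload time, without blocking the user."""
    metadata_store[asset_id] = {"status": "processing"}
    worker = threading.Thread(target=analyze_wireframe, args=(asset_id,))
    worker.start()
    return worker

def can_generate_hifi_page(asset_id):
    """The editor only enables 'generate hi-fidelity page' once metadata exists."""
    return metadata_store.get(asset_id, {}).get("status") == "ready"

worker = on_upload("asset-42")
print(can_generate_hifi_page("asset-42"))  # False: analysis still running
worker.join()
print(can_generate_hifi_page("asset-42"))  # True: metadata is now available
```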