compdemocracy / polis

Open Source AI for large scale open ended feedback
https://pol.is
GNU Affero General Public License v3.0

Document the data used to generate the visualization #487

Open patcon opened 4 years ago

patcon commented 4 years ago

The pca key in the participationInit endpoint's response is a large escaped JSON string holding all the data that the client-participation visualization renders. (The codebase sometimes refers to this data as the "preload data" for a conversation.)

It comes from the math_main database table, whose data column holds this JSON string. The math worker populates this field with all of its processed data for a conversation, and the server later sends it to the client when initializing or updating a conversation.
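A minimal sketch of that round trip, assuming an in-memory SQLite table as a stand-in for the real database. Only the math_main table name, its data column, and the pca key come from this issue; the conversation_id column, the function names, and the payload keys are hypothetical. It shows why the client has to decode twice: the server embeds an already-serialized JSON string, so serializing the response escapes it again.

```python
import json
import sqlite3

# Stand-in database. "math_main" and its "data" column come from the issue;
# "conversation_id" is a hypothetical key for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE math_main (conversation_id INTEGER PRIMARY KEY, data TEXT)")


def store_math_results(conv_id: int, results: dict) -> None:
    """Math-worker side (hypothetical): store processed results as a JSON string."""
    conn.execute(
        "INSERT OR REPLACE INTO math_main (conversation_id, data) VALUES (?, ?)",
        (conv_id, json.dumps(results)),
    )


def participation_init_response(conv_id: int) -> str:
    """Server side (hypothetical): put the stored JSON *string* under "pca",
    then serialize the whole response, which escapes the inner JSON."""
    row = conn.execute(
        "SELECT data FROM math_main WHERE conversation_id = ?", (conv_id,)
    ).fetchone()
    return json.dumps({"pca": row[0] if row else None})


def read_pca(response_body: str) -> dict:
    """Client side: decode the response, then decode the pca string inside it."""
    response = json.loads(response_body)
    return json.loads(response["pca"])


# Hypothetical payload; the real keys are what this issue asks to document.
store_math_results(1, {"example_key": [1, 2, 3]})
body = participation_init_response(1)
print(body)
print(read_pca(body))
```

Storing the payload as an opaque string lets the server pass it through without knowing its structure, at the cost of the double decode on the client.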

We should document the structure of this data, since anyone who wants to work on the visualization needs to understand it.


patcon commented 4 years ago

Here are some rough notes from a conversation with @metasoarous about the pca data and its origins in the math subcomponent:

  • pca key in the conversation response object (preload data)
    • pca key is an escaped JSON string
  • looked to the R data frame paradigm (as in pandas)
    • only works well if the whole library works that way, so it's a poor design pattern outside pandas
  • organized by column
  • everything is designed around arrays in R
  • one object is a vector of one frame
  • structures can have row names
  • original idea of the Polis team: "let's copy that!" (assuming pol.is would reach big-data scale)
  • chose zippable arrays instead of a list of objects, but data compression means this doesn't really matter
  • "dict of arrays" instead of "array of dicts"
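To make the "dict of arrays" vs. "array of dicts" distinction concrete, here is a minimal sketch of converting between the two layouts. The field names (id, x, y) are hypothetical and are not the actual keys in the pca payload; the point is only that aligned column arrays can be zipped into per-row objects and back.

```python
from typing import Any

# Column-oriented ("dict of arrays"): one array per field, aligned by index,
# like the columns of an R data frame. Field names are hypothetical.
columns: dict[str, list[Any]] = {
    "id": [0, 1, 2],
    "x": [0.5, -1.2, 0.3],
    "y": [1.1, 0.0, -0.7],
}


def to_rows(cols: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Zip aligned column arrays into a list of per-row dicts ("array of dicts")."""
    if len({len(v) for v in cols.values()}) > 1:
        raise ValueError("all column arrays must have the same length")
    keys = list(cols)
    return [dict(zip(keys, values)) for values in zip(*cols.values())]


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Inverse transform: gather each field across rows into one array."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


rows = to_rows(columns)
print(rows[1])
print(to_columns(rows) == columns)
```

The column layout writes each key once instead of once per row, which is the size saving the original design aimed for; as the notes point out, compression largely erases that difference, so the choice mainly affects how the client code reads the data.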