benjaffe / chrome-okc-plugin

OkCupid Poly Plugin
MIT License

Update question format #12

Closed benjaffe closed 10 years ago

benjaffe commented 10 years ago

@lukemadera @instantgreen @cowboy

Starting a new thread for this discussion. I'd like to revamp the way questions are stored. Here's the original

{
    "qid":"1128",
    "text":"Would you date someone who was already in a committed relationship with someone else?",
    "category": "poly",
    "wrongAnswers":["Yes, even in secret.","No, it's wrong.","No, but I don't think it's inherently wrong."]
}

-- and my first pass at a new one, where the questions are in order, the category number array refers to the question number, 1 means perfect match, 0 is imperfect match, and -1 means not applicable, trash the question for that category --

{
    "qid":"1128",
    "text":"Would you date someone who was already in a committed relationship with someone else?",
    "answers": ["Yes, even in secret.", "Yes, but only if everybody knew", "No, but I don't think it's inherently wrong.", "No, it's wrong."],
    "categories": {
        "non-monogamous": [1, 1, 0.3, 0],
        "non-monogamous-open": [-1, -1, 0.6, 0],
        "communication": [0, 1, -1, -1]
    }
}
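
A minimal sketch of how the parallel arrays in this first-pass format could be read: the position of the user's answer in `answers` picks the score out of each category array, and -1 is treated as "not applicable". The helper name `scoreForAnswer` is hypothetical and not part of the plugin.

```js
// Question in the first-pass format (0..1 scale, -1 means "not applicable").
const question = {
  qid: "1128",
  text: "Would you date someone who was already in a committed relationship with someone else?",
  answers: [
    "Yes, even in secret.",
    "Yes, but only if everybody knew",
    "No, but I don't think it's inherently wrong.",
    "No, it's wrong."
  ],
  categories: {
    "non-monogamous": [1, 1, 0.3, 0],
    "non-monogamous-open": [-1, -1, 0.6, 0],
    "communication": [0, 1, -1, -1]
  }
};

// Hypothetical helper: returns the score for an answer in a category,
// or null when the answer or category is unknown or the score is -1 (not applicable).
function scoreForAnswer(q, category, answerText) {
  const index = q.answers.indexOf(answerText);
  const scores = q.categories[category];
  if (index === -1 || !scores) return null;
  const score = scores[index];
  return score === -1 ? null : score;
}
```

Because the answer text is stored next to the numbers, the lookup does not depend on the order OkCupid happens to show the answers in.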

Thoughts? Improvements? It'd now be easier to connect broad questions with categories, it un-binary-izes the plugin, and it isn't too big. If they change the order of the answers for some reason, they're stored with the questions in our data file.

Anyone have any suggestions for improvements? I want to do this right, so it's flexible enough for the future and we won't have to change it again anytime soon. :)

cowboy commented 10 years ago

FWIW, you can fence your code with triple-backticks and a language abbreviation (js, css, json, html, etc) like this:

// code goes here

To render it like this:

{
  "foo": "bar"
}

benjaffe commented 10 years ago

Thanks @cowboy!

lukemadera commented 10 years ago

@benjaffe good first pass and I like the idea of "doing it right" this time. So I'm not sure I understand exactly what you mean by the category number arrays - each one refers to a particular question answer? And we'd display a percent match for each category rather than a number of questions "correct"?

Either way, I really like the idea of weighting answers (this is kind of what I was trying to do by separating "poly" and "polyOpen" but weighting is much cleaner) and I'd recommend a 1 to -1 scale where +1 means perfect match, -1 means bad match and 0 means neutral (so your -1 would become a 0), i.e.

{
  "qid":"1128",
  "text":"Would you date someone who was already in a committed relationship with someone else?",
  "answers": ["Yes, even in secret.", "Yes, but only if everybody knew", "No, but I don't think it's inherently wrong.", "No, it's wrong."],
  "categories": {
    "non-monogamous": [0.5, 1, -0.1, -1],
    "communication": [-1, 1, 0, 0]
  }
}

That would allow a full spectrum, remove the need for "in-between" categories, and allow for question and answer differentiation (which I think is what you're going for). It also prevents duplicate questions being stored in multiple categories by including the categories within each question.

In terms of structure, I think we need to base any decision off:

  1. ease and robustness of user input

     I'd like to see the ability for the user (i.e. via options.html) to be able to enter one of:

       1. array of question IDs (this gives complete flexibility and customization since they pick exactly the questions they want and aren't limited to our pre-selected ones BUT they are limited to our pre-selected categories and weighting for that question)
       2. array of categories
       3. array (or just one) "super category" - this would be like preset groups of categories (i.e. "default" would be what you've selected currently in defaultQuestions.js line 26)

  2. performance

  3. robustness / ease of coding / flexibility

The last two are related - putting categories inside questions prevents duplicating questions but "hardcodes" the categories and weights - maybe that's okay but for "true flexibility" what matters is BOTH what questions are selected AND what categories and weights are set for that particular question - different users will care about different things and we don't necessarily want to include EVERY possible combination as that would be bad for performance.

Realistically the way I see it being used is that certain people will create presets and then should be able to share those. Then the majority of users will just type an array of categories or a "preset" that is a grouping of questions/categories. While allowing JSON uploading does allow full flexibility, I'd rather just use crowd sourced presets - they can create their questions and categories/weightings and submit them as a preset.

But I still don't know the best way to organize it and handle all the combinations.. Typical schema design question - do we nest categories in questions OR questions in categories OR keep them both separate and link them together as needed?

Just "thinking out loud".. thoughts?

benjaffe commented 10 years ago

Forgot to say: The downside to this method that I see, btw, is that it'd be harder to manually skim the questions list for all the communication questions, as an example. I think that's an okay sacrifice though for the benefits.

Yes, the 0th number corresponds to the 0th answer, the 1th to the 1th, etc. I'm not sure how to display it yet, but I have a lot of ideas!!

I like the 1 to -1, but we also need a way to say not applicable... maybe NaN.

"communication": [-1, 1, NaN, NaN]
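
One caveat worth noting: `NaN` is not a valid JSON literal, so `JSON.parse` rejects it and `JSON.stringify` writes it out as `null`. A possible workaround, sketched here as an assumption rather than anything the plugin does, is to store `null` in the data file and convert it to `NaN` on load. The name `loadWeights` is hypothetical.

```js
// Hypothetical loader: the file stores null for "not applicable",
// and the in-memory arrays use NaN as proposed above.
function loadWeights(jsonText) {
  const categories = JSON.parse(jsonText);
  const result = {};
  for (const name of Object.keys(categories)) {
    result[name] = categories[name].map(w => (w === null ? NaN : w));
  }
  return result;
}

const loaded = loadWeights('{"communication": [-1, 1, null, null]}');
// Use Number.isNaN() to test entries such as loaded.communication[2],
// since NaN === NaN is false.
```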

Re: user input, I also want users to be able to make their own JSON which will get merged in, or even specify a URL to a JSON file of their own. That will override any of our question choices, giving the hardcore users a way of specifying anything they want. Plus, they can make up their own category names. Maybe there could even be a way, down the road, of disabling our question set entirely.
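
A rough sketch of what "merged in" with override semantics might look like, assuming both the built-in and the user-supplied data are arrays of question objects keyed by `qid`. The function name `mergeQuestions` and the replace-the-whole-question behavior are assumptions for illustration.

```js
// Hypothetical merge: a user-supplied question replaces the built-in one
// with the same qid; user questions with new qids are appended at the end.
function mergeQuestions(builtIn, userSupplied) {
  const byId = new Map();
  for (const q of builtIn) byId.set(q.qid, q);
  for (const q of userSupplied) byId.set(q.qid, q);
  return Array.from(byId.values());
}
```

Replacing whole questions keeps the rule simple; merging per category inside a question would also be possible but needs a decision on how conflicting arrays are resolved.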

I'm not as concerned about performance when choosing the data structure since the http requests are at least an order of magnitude slower than everything else, and they have the most variability. Of course we'll optimize where we can though.

I'm thinking that it's more typing and more file size to have duplicate questions, since they each have question text and answers, and we want the number arrays to be near the answer list, so it's easy to make choices. I think my schema, but adapted with numbers between -1 and 1, and NaN, is my current preference.

This'll mean a bunch of work looking up questions to get the database in order. In case you don't know, you can access a question by its qid with this url: http://www.okcupid.com/questions?rqid=1128

instantgreen commented 10 years ago

@benjaffe

I haven't gotten through all of the new questions, yet. This is what I can say so far:

The category 'communication' probably should be called 'cheater'. Somebody who violates the trust of people he has intimate sexual or non-sexual relations with is just that. I have to admit though, that not every relationship whatsoever is as serious for every partner to be kept in the loop, i.e. having multiple booty calls. I just wanted to raise awareness, that it's not just bad communication.

I dislike the unaggressive category because the questions really don't imply that somebody is aggressive towards others. If there was a question like 'Have you punched somebody in the face in the last year?' this would be different. But even then you would probably have to ask, whether it really wasn't in self defense and if it wasn't, whether they got so angry that they punched them. This is just difficult and people probably would lie anyway.

The only question that makes sense to me is the one about damaging property. But then this category should be called civilized.

The 'quietly angry' question I would put in the 'happy' category because, again, it doesn't predict whether somebody really acts on their feelings. Only acts are punishable, not feelings or thoughts.

I don't know why you commented the question about 'worrying about things that you have no control over' out. I think it's a pretty good indicator of how carefree and therefore happy a person is.

I'd like to talk about those categories now: science, logic and nonReligious.

I don't think that putting all the quiz questions into the category 'logic' is a good idea, mainly because those questions have nothing to do with logic. This is what a logic question looks like: "Half of all policemen are thieves and half of all policemen are murderers. Does it follow logically that all policemen are criminals?" The questions I formerly grouped under 'intelligent' have almost nothing to do with this. Those questions are more diverse and test for some general knowledge about the world and some brains. I vote for giving them an extra category, ideally called intelligent again, even if it's quite judgemental.

I also think the 'science' category has some questions in it that are quite debatable. I.e. "Did America really put a man on the moon?" This has got nothing to do with science but people who believe in conspiracy theories. I certainly wouldn't date somebody who answers 'No' to this question. But it isn't a science question.

This question doesn't seem to say a lot: "Are you annoyed by people who are super logical?" In my opinion there are too many other reasons to say 'Yes'.

I don't think these two questions, "Should evolution and creationism be taught side-by-side in schools?" and "Do you put more weight in science or faith?", should be put under 'science' either. Or maybe just the first one. I would much rather know how religious a person is than how scientific a person is. I think the 'science' category can be fully dissolved and its questions can be put in 'intelligent' and 'nonReligious'.

Now to the last category, nonReligious. I think this category actually comprises three subcategories: religiousness, spirituality and believing in some kind of higher being, mostly called god. You can believe in god without being religious. Therefore nonReligious is a bit incorrect. I don't know what this could be called, because nonAtheist or nonAgnostic are two different things again... Maybe somebody can come up with a better name for it. My thought was that we probably don't want extra categories for those three subcategories because it basically seems all the same to us... Correct me if I'm wrong. ;)

I'm not sure how much I like the new 0 to 1 answering system. It needs to prove itself in the real world first. ;) I don't know how it is right now, but please keep displaying all the answers to the questions that were asked so I can check what's happening.

Thanks for your work so far! :)

benjaffe commented 10 years ago

Generally, I tend to want to avoid applying labels to people that they wouldn't want. Like "cheater", or "racist". Actually, especially as a white male, I'm sensitive to this around race and gender issues. So I tend to like labeling categories, and users can decide for themselves what it means. "Race issues" instead of "racist". Especially since some of the questions vary so much in what they mean. The example that my partner and I discussed was the question about wanting to marry inside your racial group. That could be because of racism, but it could also be because of parental pressure in traditional family situations. That's why I chose "communication" instead of "cheater". Thoughts?

I agree with you about unaggressive, but I don't want to label people aggressive. There's definitely a better word for it out there, and I haven't given it a lot of thought. I'm up for changing it as long as the new category name is a category and not a label.

'quietly angry', agreed.

worrying: It's in my genes to worry. My grandma worried ALL the time (nobody told her when I'd go on international trips until I was back already), and my Mom worries too. And I feel an inability not to. The difference is whether you worry so much that it affects your happiness. In my grandmother's case, it affected her happiness a lot. In my case, no. I worry, but I'm a very happy person.

Science/logic/religion.... oh boy, yeah, this is a huge can of worms.

I want to avoid being judgemental, so I want to avoid the category name "intelligent".

Agreed about the moon question, but I'd argue that there is a correlation between believing the moon landing was faked and a lack of scientific literacy. Especially since all the arguments for the conspiracy are dismissible with a touch of skepticism, or a decent amount of scientific literacy. I also think skepticism/critical thinking and scientific literacy are generally connected.

So I guess "science" should be "valuesScience". I think down the road, we can make aliases to the short names, so the long names (and maybe even descriptions) show up in the interface for the user, but the names stay short for us question modifiers. :)

The super logical question, yes, I've been thinking of removing that as well. I'll do that now.

I disagree about dissolving the science category. I definitely want people who I date to be scientifically literate, so I'd like that to be measurable easily.

Regarding the nonReligious category, yeah, I'm super unsure about what to do with that. I think with the new question storage schema, we'll be able to have way more categories, with questions covering multiple categories at once. So we could have religious, spiritual, agnostic, atheist, nonReligious, nonSpiritual, etc... Pretty much any category that people want to give me pull requests for! Maybe we'll solve this one by sheer brute force, adding a whole bunch of categories for spirituality.

If you want to add categories of questions so you can see more results, run this in the Console while on okcupid.com

okcp = JSON.parse(localStorage.okcp);
okcp.questionCategories = ["unaggressive","happy","poly","polyOpen","notPossessive","science","cuddling","sexPositive"];
localStorage.okcp = JSON.stringify(okcp);

Add or remove strings in the array on the second line to add or remove categories that show up!
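
For adding one category at a time without retyping the whole array, the console snippet above could be wrapped in a small helper. This is only a sketch: `addCategory` is a hypothetical name, and it assumes `storage.okcp` holds the same stringified settings object with a `questionCategories` array.

```js
// Hypothetical helper around the console snippet above. `storage` is expected
// to behave like localStorage (a plain object with an `okcp` string also works).
function addCategory(storage, category) {
  const okcp = JSON.parse(storage.okcp);
  // Only append the category if it is not already in the list.
  if (!okcp.questionCategories.includes(category)) {
    okcp.questionCategories.push(category);
  }
  storage.okcp = JSON.stringify(okcp);
  return okcp.questionCategories;
}
```

In the Console on okcupid.com this would be called as `addCategory(localStorage, "science")`.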

Thank you for your thoughts. I really appreciate having other people going over this too.

cowboy commented 10 years ago

Is there any way to simply ignore questions whose answers the logged-in user has categorized as irrelevant? That way, you don't have to guess what they care about. They've already told you.

cowboy commented 10 years ago

In fact, why not forgo all the questions and simply scrape this? That's all I care about, anyways.

http://www.okcupid.com/profile/USER/questions?disagree=1

And guess what? It's useful for everyone, not just poly people :D

benjaffe commented 10 years ago

The plugin is not likely to get many irrelevant questions since it loads pages that the plugin user cares about.

And if I only get the disagree page, I won't be able to distinguish between users that I have little in disagreement with, and users that haven't answered many questions. I'd like to see why they're considered poly by the plugin, not just where they might not be poly, for example. Also, if I add the disagree page, then I'll need to filter for questions I find irrelevant, whereas I don't have to with the plugin fetching what it currently fetches.

cowboy commented 10 years ago

You can mark a question as irrelevant. If you do so, that question won't appear in the "disagree" page, unless the potential match marks it as anything other than irrelevant. Which, if they do, you should care about, right?

benjaffe commented 10 years ago

Ah yes, that's true. So ignore my last sentence. ;)

I'll think about scraping the disagree page. Something doesn't seem right about it though, like... users with a lot of disagreeing answers from you, but all low priority, will end up looking disproportionately bad. And users with a few very important disagreeing questions might end up looking better than them.

lukemadera commented 10 years ago

@instantgreen You make some good points about particular questions but to me the whole point is to move away from having to have these discussions and putting the control in the hands of the user - there's no one "correct" answer to any of this anyway so if we can build a system that allows users to select the questions, categories, and weights they want AND be able to leverage work done by others, then that makes discussing particular questions moot - you can make yours look however you think makes most sense and not have it affect or be affected by others.

@cowboy Good idea with the 'disagree' page but I agree with @benjaffe that while it can be an option or feature, it's not comprehensive enough and doesn't save much time from just going to the disagree page itself. I love the plugin as it is now because it allows me to select the questions I care about and quickly show me BOTH positives and negatives for those questions (not just the negatives) - without me having to manually click through 5-20 pages sorted by the ones I care about most.

@benjaffe

benjaffe commented 10 years ago

Hmm, I don't see how one method gives us more merge conflicts than the other. If someone changes how a question applies to a category, won't it either show up as an added question (if storing questions in categories), or a changed line in the question (if storing categories in questions)?

I'm planning to switch to chrome.storage very soon, and we'd want settings and questions synced via their google account if available. We have a max storage of 4096 bytes per synced item with a max total storage of 102400 bytes. So storage size does matter. The relationship_model.js is already at 5519 characters... We could ditch the question text and answer text in each item and just have them in a non-synced answer key of sorts, but that'd make understanding what's going on really hard while reading that file. Unless we built a tool for integrating suggested changes...

We're already at ~25 KB total. We could store the question and answer text in non-synced storage (since that doesn't need to be synced, and can always be fetched), and only store the numbers in synced storage. If we went that route, we could store it either way.
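
To get a feel for whether a given layout fits, the size of each synced item can be estimated before writing it. The sketch below uses the limits quoted in this comment (4096 bytes per item, 102400 bytes total) and approximates an item's size as its key plus the JSON-stringified value in UTF-8 bytes. The function names are hypothetical, and the estimate is not a substitute for the errors the storage API itself reports.

```js
// Limits as quoted in the comment above (assumed here, not read from the API).
const MAX_BYTES_PER_ITEM = 4096;
const MAX_BYTES_TOTAL = 102400;

// Approximate size of one synced item: key plus JSON-stringified value, in UTF-8 bytes.
function itemSize(key, value) {
  return new TextEncoder().encode(key + JSON.stringify(value)).length;
}

// Hypothetical check: returns the keys that exceed the per-item limit,
// the estimated total size, and whether the total fits under the overall limit.
function checkSyncBudget(items) {
  const tooBig = [];
  let total = 0;
  for (const key of Object.keys(items)) {
    const size = itemSize(key, items[key]);
    total += size;
    if (size > MAX_BYTES_PER_ITEM) tooBig.push(key);
  }
  return { tooBig, total, fitsTotal: total <= MAX_BYTES_TOTAL };
}
```

Splitting the data into one item per category, with only qids and numbers in each, is one way to stay under the per-item limit.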

Blarg. Tough choices. If any of you have any other thoughts, let me know. I'll make a choice sometime soon, but I want to let it sit here and marinate for a bit longer. I'm leaning towards keeping categories in questions, even though that makes it harder to edit everything in a single category. We can always build an interface for that.

To sum things up, the question is still whether we duplicate questions and store them in category objects, or add the applicable categories inside each question.

lukemadera commented 10 years ago

Storing by question:

{
  "qid":"111",
  "text":"question text here",
  "answers": ["answer 1", "answer 2", "answer 3", "answer 4"],
  "categories": {
    "cat-1": [0.5, 1, -0.1, -1],
    "cat-2": [-1, 1, 0, NaN]
  }
}

Storing by categories:

cat1 file:

[
  {
    "qid":"111",
    "text":"question text here",
    "answersText": ["answer 1", "answer 2", "answer 3", "answer 4"],
    "answers": [0.5, 1, -0.1, -1]
  }
]

cat2 file:

[
  {
    "qid":"111",
    "text":"question text here",
    "answersText": ["answer 1", "answer 2", "answer 3", "answer 4"],
    "answers": [-1, 1, 0, NaN]
  }
]

Adding new category

Now let's say I want to add cat3

By questions:

{
  "qid":"111",
  "text":"question text here",
  "answers": ["answer 1", "answer 2", "answer 3", "answer 4"],
  "categories": {
    "cat-1": [0.5, 1, -0.1, -1],
    "cat-2": [-1, 1, 0, NaN],
    "cat-3": [0, -1, NaN, 0.5]
  }
}

By categories:

NEW cat3 file:

[
  {
    "qid":"111",
    "text":"question text here",
    "answersText": ["answer 1", "answer 2", "answer 3", "answer 4"],
    "answers": [0, -1, NaN, 0.5]
  }
]

By questions there's a merge conflict since we're editing an existing file. By categories there's a new file that's added; no merge conflict.

Editing existing category answer weight

Let's say I want to edit cat2's answer weight

By questions:

{
  "qid":"111",
  "text":"question text here",
  "answers": ["answer 1", "answer 2", "answer 3", "answer 4"],
  "categories": {
    "cat-1": [0.5, 1, -0.1, -1],
    "cat-2": [-1, 0, 0, NaN]
  }
}

By categories:

cat2 file:

[
  {
    "qid":"111",
    "text":"question text here",
    "answersText": ["answer 1", "answer 2", "answer 3", "answer 4"],
    "answers": [-1, 0, 0, NaN]
  }
]

In this case both scenarios require editing an existing file BUT by categories there's a lower chance of multiple merge conflicts, since only changes made to THAT category would also take place in that file. The bottom line is that by question we'd store EVERYTHING in ONE file, meaning ANY change to anything can create a merge conflict, whereas if we store by categories there are more files and so fewer merge conflicts. Storing by question is definitely cleaner since there's no duplication and the total file size is smaller. But it makes that ONE massive file very brittle - there will need to be a new copy for every difference, even if it's just one character. By category, other people can make changes to categories I haven't edited and I can pull in those changes and re-sync, and I only have to manually merge any changes to categories I've also edited.

For file size considerations for the categories route, we can pull out the question text and answer text to a master question file (since those won't change) and just store the answer weights and question id in each category file. A little harder to match question weights in the proper order since you'll need to reference the master question file for the order, but it will avoid data duplication other than qid.
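
A sketch of that split, assuming a master file keyed by `qid` and category files that hold only the `qid` and answer weights. The object shapes and the name `expandCategory` are illustrative, not an agreed format.

```js
// Hypothetical master file: question and answer text, keyed by qid.
const masterQuestions = {
  "111": {
    text: "question text here",
    answersText: ["answer 1", "answer 2", "answer 3", "answer 4"]
  }
};

// Hypothetical category file: only qid and answer weights.
const cat2 = [{ qid: "111", answers: [-1, 0, 0, NaN] }];

// Rebuild the full per-category entries by looking up text in the master file.
// Entries whose qid is missing from the master file are skipped.
function expandCategory(categoryEntries, master) {
  return categoryEntries
    .filter(entry => Object.prototype.hasOwnProperty.call(master, entry.qid))
    .map(entry => ({
      qid: entry.qid,
      text: master[entry.qid].text,
      answersText: master[entry.qid].answersText,
      answers: entry.answers
    }));
}
```

The weights still have to be written in the same order as `answersText` in the master file, which is the extra lookup cost mentioned above.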

Basically, the whole point is to make this customizable meaning we need to plan for and expect MANY changes so we need to make that easy to do and re-sync with "master" or others' work or additions. Changes will come in the form of:

  1. adding new questions
  2. adding new categories
  3. altering answer weights of existing categories

The question is - how do we accommodate that while also minimizing file size/duplication and keeping it clean? Paralleling your comment about performance, I think we should optimize for changes rather than for file size/duplication if we can't figure out a way to do both.

instantgreen commented 10 years ago

@benjaffe Ok, agreed, I tend to get a bit emotional when it comes to abusing someone's trust. Cheater might not be a good name. I'm coming more and more to the conclusion that we might have to ignore questions that are too broad and could mean anything. The clear and upfront questions are really the better ones.

I actually don't know what aggressive would mean to you. OkC measures for it but I failed to identify the corresponding questions. Would it mean that somebody is violent? How do you distinguish people from different cultures? People from Italy would always be aggressive, people from Japan almost never? Again, I very much doubt that people would answer those questions truthfully. If those questions actually existed in the first place...

There is more than one question in almost all the categories, so being a worrier but having a cheerful outlook on life probably doesn't make you a worrier. ;) Moreover, I am quite aware that these questions would only give you a tendency a person has.

How about changing 'intelligent' to 'quizzes'? The problem is that the science questions are mostly knowledge questions. So I still vote for them being integrated in the 'quizzes' category. I'm unsure with the rest of the science questions because I've heard that there are people who are scientists but are religious. ;) I don't know whether the science questions are really dichotomic (science vs. religious).

Ok, agreeing on the moon question.

Thanks for the hint about the console! :)

Is there really a need for syncing? In my situation I only use OkC on my home computer and my smartphone. It's totally sufficient if the plugin only runs on my computer since I'm using the app on my smartphone anyway.

@lukemadera I've been thinking about how much work we want to demand from the users of the plugin. How difficult will it be for them to configure their own questions and categories? In the beginning I thought there could have been a simple menu with a lot of checkboxes that enable preconfigured categories, but right now it seems to me that it's going to be a lot more difficult. I'd just like to point out that the plugin should be usable by an average user, too. Maybe there should be a mode where you can still use preconfigured categories.

benjaffe commented 10 years ago

@lukemadera @instantgreen

@instantgreen

  1. Yeah, I get emotional too. :)
  2. Aggressive was meant as something that would be interesting for people who have triggers around anger, so they could identify how various people handle anger. I haven't been paying too much attention to that one, so it might not be very consistent at this point. You raise good points.
  3. If someone only shares the worrying question with another user, that user will show up with only that question as indicating how happy they are. But if there are 6 matches, yeah it could be useful.
  4. Quizzes is a good interim name, yes, I like it. And yeah, science vs religious is a false dichotomy, but that'll be solved largely by having questions apply to multiple categories, and by users choosing to add notes in particularly sticky situations.
  5. Re: syncing, I figure, why not? I use Chrome all over the place, and it'd be nice to be able to switch computers without worrying about it. Plus, what about when your computer eventually dies and you need to move over? Your data is toast, since nobody's going to think about my little plugin and the data when 1. there's no guide to migration, and 2. they're worrying about bigger things like photoshop, or their music library. :P

@lukemadera So, I've been thinking about this all day. Here are my thoughts:

  1. Yes, definitely there should be preconfigured categories. I'm personally most interested in supporting those users, and once everything is stable and happy, only then will we add any awesome configuration options. Besides, we only have <2k users, so I don't want us to bust our butts working for the 100 or so who want to super-customize. That puts us in the role of "the deciders", deciding what fits each category, but people who disagree will hopefully contact us about it.
  2. You've convinced me. You're right, having duplicate questions makes the most sense since we'll be integrating changes often. The consequences of that choice are mainly that we'll have to store the questions in chrome.storage.sync without the question text or answers text. That shouldn't be too hard. Every page load, it'll grab from chrome.storage.sync and update the question list, stored in chrome.storage.local (the latter will have the q and a text included, which shouldn't be a problem since we have 5 MB storage in .local). We can even respond to the onChanged event and recalculate the categories, etc. :) If the user updates their question list, we'll strip the text back out and push to .sync.

Profile settings will also be stored in .sync. The profiles cache will be in .local.

  3. I've been thinking about the questions, and I'm thinking there should be another key too... weight. So, for poly, a question that asks specifically about polyamory would have a high weight, and less specific questions would have a lower weight. Those weights should be answer-specific too. Example:

"cat1": [
  {
    "qid":"111",
    "text":"question text here",
    "answerText": ["positive applicable answer", "neutral applicable answer", "negative but not super applicable", "not applicable answer"],
    "answerRelevance": [1, 0, -1, 0],
    "answerApplicability": [1, 1, 0.5, 0]
  }
],
"cat2": [
  {
    "qid":"111",
    "text":"question text here",
    "answerText": ["negative applicable", "neutral semi-applicable", "neutral applicable", "positive applicable"],
    "answerRelevance": [1, 0, 0, 0],
    "answerApplicability": [1, 0.5, 1, 1]
  }
]

Thoughts?

benjaffe commented 10 years ago

...and by the way, the following would get pushed into chrome.storage.sync:

"cat1": [
  {
    "qid":"111",
    "ar": [1, 0, -1, 0],
    "aa": [1, 1, 0.5, 0]
  }
],
"cat2": [
  {
    "qid":"111",
    "ar": [1, 0, 0, 0],
    "aa": [1, 0.5, 1, 1]
  }
]

(but stringified, of course)
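
A minimal sketch of the "strip the text back out" step, assuming the full in-memory structure uses the `answerRelevance` and `answerApplicability` keys from the earlier example and the synced copy uses the short `ar` and `aa` keys shown above. `stripForSync` is a hypothetical name.

```js
// Hypothetical helper: drop question/answer text and shorten key names
// so the result matches the compact shape shown above, then stringify it.
function stripForSync(categories) {
  const compact = {};
  for (const name of Object.keys(categories)) {
    compact[name] = categories[name].map(q => ({
      qid: q.qid,
      ar: q.answerRelevance,
      aa: q.answerApplicability
    }));
  }
  return JSON.stringify(compact);
}
```

The reverse step would look up the text by `qid` from the non-synced copy and restore the long key names.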

lukemadera commented 10 years ago

@benjaffe @instantgreen I definitely want preset categories for easy use and agree we support average users before "power users" (who could just download / fork the plugin as we have if they want uber customization). My goal has always been a simple interface (i.e. options.html) where the user can type in a comma separated list of one or more categories and that's it. And eventually/later that people can submit new question categories (or similar categories but with different questions or answer weights). This would allow for ultimate flexibility but still a simple user interface of typing categories to use.

@benjaffe

{
    "answerText": ["positive applicable answer", "neutral applicable answer", "negative but not super applicable", "not applicable answer"],
    "answerRelevance": [1, 0, -1, 0],
    "answerApplicability": [1, 1, 0.5, 0]
}

would be equivalent to

    "answerText": ["positive applicable answer", "neutral applicable answer", "negative but not super applicable", "not applicable answer"],
    "answers": [1, 0, -0.5, 0]

I agree both should be factored in but this isn't an exact science anyway so I don't see a compelling reason to add in more complexity that will require more explaining (I don't even fully understand what you're getting at so other users likely won't either). I think intuitively people will combine them and that's the point of the 1 to -1 scale. Unless I'm missing something that makes the extra complexity worth it?
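
Read this way, the equivalence is an element-wise product of the two arrays. A short sketch, with `combineScores` as a hypothetical name:

```js
// Multiply the two arrays element by element to get one combined score per answer.
function combineScores(relevance, applicability) {
  return relevance.map((r, i) => r * applicability[i]);
}

const combined = combineScores([1, 0, -1, 0], [1, 1, 0.5, 0]);
// combined is [1, 0, -0.5, 0], matching the single "answers" array above.
```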

Overall though I think we're getting close!

lukemadera commented 10 years ago

@benjaffe So I've been thinking about it some more and I understand why weighting could be useful (I think we should use "score" and "weight" though since "applicable" and "relevance" seem like very similar words to me so that's why I was having a hard time understanding the difference and why we need both). Weighting would prevent a bunch of neutral answers from skewing the score down - i.e. 10 ".2" answers is not the same as 2 "1" answers.

That said, I definitely want to know how many questions have been answered for the category (i.e. a confidence level of sorts) but not necessarily in percentage format. So I guess we need to decide on the algorithm and final display format and then once we have some examples we can hone it further.

Currently though, I still really like the existing system - a simple "x/y" format that shows how many total questions in this category were answered and how many were "correct". Yes it's more simplistic but:

I guess what I'm saying is that while weighting could help reduce categories by combining them (i.e. "poly" vs "polyOpen"), I actually LIKE the separation since it gives me more data to quickly visualize/snapshot - if they're combined into one category with weighting, then I lose the ability to quickly see how many "important / super relevant" questions were answered "correctly" and how many "more neutral" ones were if they're mixed in together. OkCupid already natively combines all this stuff into one final "match percent" anyway so to me the point of this plugin is allowing more differentiation by category. People can group their "very relevant" questions together in one category and their "semi-relevant" ones in another category and essentially achieve the weighting themselves while retaining the ability to quickly differentiate these weights rather than mixing them together into one combined category.

So I guess my question is - while weighting is "more thorough", is it necessary? What specific issue/problem do you have with the current "x/y" system?

The feature I actually would like more than a weighting system is consistent ordering - i.e. the categories always show up in the order they're passed in (right now they move around from top to bottom to middle so you always have to look in a different place to find the same category). Alphabetical sort is an easy solution but I'd prefer a user set order if possible - again whatever order the categories are typed in when they're input - so "cat2, cat1, cat3" would make "cat2" always show up on the top when displayed and "cat3" always on the bottom.
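A minimal sketch of the user-set ordering described above, assuming the user's category list is an array of names in the order they were typed. The function name `sortByUserOrder` and the input shapes are hypothetical and not taken from the plugin.

```js
// Hypothetical helper: not the plugin's actual code.
// `order` is the user's category list, in the order it was entered.
// `categories` is the list of category names about to be displayed.
function sortByUserOrder(order, categories) {
  const rank = new Map(order.map((name, i) => [name, i]));
  // Categories missing from `order` are placed after all known ones.
  const rankOf = (name) => (rank.has(name) ? rank.get(name) : order.length);
  // Copy first so the caller's array is not mutated.
  return [...categories].sort((a, b) => rankOf(a) - rankOf(b));
}

// "cat2" was typed first, so it is always displayed first.
console.log(sortByUserOrder(["cat2", "cat1", "cat3"], ["cat1", "cat3", "cat2"]));
```

Sorting by position in the input list keeps each category in the same place on every profile, which is the behavior asked for here.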

benjaffe commented 10 years ago

I was SO tired when I made up those words. My partner was practicing scales on the guitar, and I was staring bleary eyed at the screen, thinking "What is wrong with me, why can't I think of English words!" And I had dreams last night, culminating in me waking up thinking, "SCORE! Not relevance, silly Ben, score!" True story, actually.

I want to do away with a single indicator. Right now, there's a cross-hatched pattern when only one question in a category has been answered, but I want to do better somehow. Maybe a x/y system, like "5.4 of 6.8", where it adds (((score+1) / 2) * weight) to get the number (resulting in a number between 0 and 1 for each question). The confidence would be 6.8, in the example, calculated by adding the weights of all the questions together. There could be a ? next to the first one, which would pop up an explanation of the scoring system, and a ? next to each category which would pop up the description of the category.
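A minimal sketch of the arithmetic proposed above, assuming each answered question contributes a `score` between -1 and 1 and a `weight` between 0 and 1. The function name `categoryTotals` and the input shape are hypothetical and not taken from the plugin.

```js
// Hypothetical helper: not the plugin's actual code.
// Each entry is one answered question: { score: -1..1, weight: 0..1 }.
function categoryTotals(answered) {
  let earned = 0;     // the first number in "5.4 of 6.8"
  let confidence = 0; // the second number: the sum of all weights
  for (const { score, weight } of answered) {
    // (score + 1) / 2 maps the score onto 0..1, then the weight scales it.
    earned += ((score + 1) / 2) * weight;
    confidence += weight;
  }
  return { earned, confidence };
}

const totals = categoryTotals([
  { score: 1, weight: 1 },    // clear match, full weight
  { score: -1, weight: 0.5 }, // clear mismatch, half weight
  { score: 0, weight: 1 },    // neutral answer, full weight
]);
console.log(totals.earned + " of " + totals.confidence); // prints "1.5 of 2.5"
```

Note that a low-weight mismatch lowers the first number only slightly relative to the second, which is the effect the weighting is meant to have.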

I don't think weighting should reduce categories... I agree. Poly and PolyOpen are separate things to me, despite overlapping questions. But I do think weighting will make things more accurate. There are questions that I'd leave out normally, but if I can only have the question apply with certain answers, and weighted less than clear questions, I'd be inclined to put more questions in. And if the user answers "incorrectly" for the category, their score won't suffer hugely.

For example: "Would you consider being part of a commited polyamorous relationship - ie, three or more people but no sex outside the group?" If they answer yes, that's clear for the non-monogamy category. Weight of 1. If they answer no, that's a no for non-monogamy, but only weighted at 0.2. Just an example I made up.

Category ordering should definitely be set up, and it will currently sort similarly per profile (I added a sorting thing a while back, but it's not great).

benjaffe commented 10 years ago

Maybe if the question doesn't have a weight array, the plugin will assume weight of 1 for all answers. So it can be optional, but I want to have the option.

lukemadera commented 10 years ago

That sounds good @benjaffe - in general I almost always will support more options as long as they're not required and/or have defaults - that way we "both win" in that it can stay simple but people who want to tweak it can do so. And I do agree weighting can definitely be useful and I'd like it too - I just didn't think it was worth the complication. But if we can keep the display simple/as is AND add weighting, I'm all for it.

And yes on category sorting/ordering - awesome!

At this point we may just need to try it and see it in practice and tweak from there? Any other next steps / discussion items or time to code?

benjaffe commented 10 years ago

Time to code! To disable caching (and probably a few other things in the future), run the following in the console:

```js
localStorage.devMode = JSON.stringify(true)
```

Then run the following to filter for our guinea pig categories:

```js
okcp = JSON.parse(localStorage.okcp);
okcp.questionCategories = ["drugs","soft_drugs","hard_drugs","alcohol","420","no_soft_drugs","no_hard_drugs","no_alcohol","no_420"];
localStorage.okcp = JSON.stringify(okcp)
```

We'll deprecate the "drugs" category, and switch up the way the data is stored for those categories, adding the new categories. This way, we can test things out using a category that we haven't "released", and we can switch back and forth by switching localStorage.devMode. (It doesn't matter too much for me since I develop in Chrome Canary, but it's good to have the option).

I'm opening up a Google chat so we can chat about what we're each doing. I'm going to take a stab at it in a few hours, but if you're up and working at the same time, we can collaborate and avoid duplicating each other's work and having a mess of conflicts upon merging.

cowboy commented 10 years ago

FWIW, I have started coding from scratch and am experimenting with a totally different approach. Not sure if it's going to make sense WRT what you guys are doing, but I think I'll be able to get it to work.

As you can see, I'm starting by focusing on Unacceptable Answers, eg. specific incompatibilities between the logged-in user and the current profile being displayed according to the "mandatory", "very_important", "somewhat_important" and "little_important" question relevance levels.

Because the script also collects all of the logged-in user's question answers for those four levels, it should be easy to create "categories" to display (like "3/5 sex positive") given per-category lists of question ids, which will allow the current display to be "pivoted" to display via category instead of relevance levels.
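A minimal sketch of the per-category pivot described above, assuming a list of question ids per category and a lookup that says whether the profile's answer to each question was acceptable. The name `categoryTally` and both data shapes are hypothetical and not taken from cowboy's script.

```js
// Hypothetical data shapes: not taken from the script described above.
// `questionIds` lists the question ids that belong to one category.
// `acceptable` maps question id -> true/false for the profile being viewed;
// questions the profile has not answered are simply absent.
function categoryTally(questionIds, acceptable) {
  let answered = 0;
  let ok = 0;
  for (const qid of questionIds) {
    if (!(qid in acceptable)) continue; // unanswered: does not count either way
    answered += 1;
    if (acceptable[qid]) ok += 1;
  }
  return ok + "/" + answered;
}

const sexPositiveIds = ["1", "2", "3", "4", "5", "6"]; // made-up ids
const profile = { "1": true, "2": true, "3": true, "4": false, "5": false };
console.log(categoryTally(sexPositiveIds, profile) + " sex positive"); // prints "3/5 sex positive"
```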

I hope to make the source available later this week, but I'm going to be super busy, so it might just be something I hack on here or there until I have more time in mid-November.

A very, very bad match (per-level output truncated to 3 questions)

A much better match

:grinning:

benjaffe commented 10 years ago

@cowboy Neat, nice work. Let me/us know if you put it online.

@lukemadera Here's the first category I'm working on. The first and third questions don't have a weight category, so it'd default to weights of 1 for each answer.

//no_smoking
{
    "qid":"501",
    "text":"Have you smoked a cigarette in the last 6 months?",
    "category": "no_cigarettes",
    "answerText": ["Yes", "No"],
    "score": [-1, 1]
},
{
    "qid":"13006",
    "text":"Would you go out with a smoker?",
    "category": "drugs",
    "answerText": ["Yes", "Yes, but only an occasional/social smoker","No"],
    "score": [-1, -1, 1],
    "weight": [0.7, 0.4, 1]
},
{
    "qid":"80621",
    "text":"How often do you smoke cigars?",
    "category": "drugs",
    "answerText": ["Frequently.", "Occasionally.", "Never."],
    "score": [-1, -1, 1]
},
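A minimal sketch of how an entry in this format could be read, assuming the profile's answer arrives as answer text and that a missing `weight` array means a weight of 1 for every answer. The function name `scoreAnswer` is hypothetical and not taken from the plugin.

```js
// Hypothetical helper: not the plugin's actual code.
// Looks up the score and weight for one answer to one question.
function scoreAnswer(question, answerText) {
  const i = question.answerText.indexOf(answerText);
  if (i === -1) return null; // answer text not found in the data file
  // A question without a "weight" array defaults to weight 1 for every answer.
  const weight = question.weight ? question.weight[i] : 1;
  return { score: question.score[i], weight };
}

const smokerQuestion = {
  qid: "13006",
  answerText: ["Yes", "Yes, but only an occasional/social smoker", "No"],
  score: [-1, -1, 1],
  weight: [0.7, 0.4, 1],
};
const cigarQuestion = {
  qid: "80621",
  answerText: ["Frequently.", "Occasionally.", "Never."],
  score: [-1, -1, 1],
};

console.log(scoreAnswer(smokerQuestion, "Yes"));          // weight comes from the array
console.log(scoreAnswer(cigarQuestion, "Occasionally.")); // weight falls back to 1
```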
benjaffe commented 10 years ago

Question schema revamp branch here: https://github.com/benjaffe/chrome-okc-plugin/tree/question-schema-revamp

I think I actually have it working successfully. Now I want to convert a bunch of questions to the new format to test it. Right now, the only categories in the new format are "non_smoker" and "smoker", and they seem to work!

benjaffe commented 10 years ago

All the drugs_smokes.js and veggie_vegan.js questions are converted. Let me know what you think.

benjaffe commented 10 years ago

BTW, I think the way it displays "1.7/3.1" isn't very readable. I'm thinking the background of the little category area could grow like a progress bar, or something similar. Visual, not based on the user reading numbers, but the numbers (maybe rounded, or with the decimal rendering smaller) are still there.

abelr1 commented 10 years ago

I'm not sure if I'm doing something wrong, or if there is an error somewhere, but I'm having issues adding more categories to the displayed questions. In the defaultQuestions.js file, I added "weight" and "love" to the line

storage.questionCategories = storage.questionCategories || ["unaggressive","happy","poly","polyOpen","notPossessive","science","cuddling","sexPositive","weight","love"];

I added this to the discrimination.js file:

{
            "qid":"52682",
            "text":"If one of your potential matches were overweight, would that be a dealbreaker?",
            "category": "weight",
            "wrongAnswers":["Yes, even if they were slightly overweight.","Yes, but only if they were obese."]
},

When I open the profile, the category "Love" does not show up. Weight does, though. "Love" is in the sex.js file and I have confirmed that "cuddling" does work. I have also confirmed that the profile that I was looking at does have 2 of the love questions answered. I have raised the var numQuestionPages to 100 to test. I also changed the cacheEnabled to false. I have also added the fetish category, but it isn't working either. I have tested this on 2 different profiles.

I have also taken "science" out of the defaultQuestions.js file, but that category is still showing up.

Any ideas?

benjaffe commented 10 years ago

@abelr1 If you're using the question-schema-revamp branch, you should be using the data format a few comments above (https://github.com/benjaffe/chrome-okc-plugin/issues/12#issuecomment-26681444)

If you aren't using that branch, the functionality you're looking for might not be implemented, and if it is, it's probably not complete in whatever version you're using. This discussion specifically is for the new question schema, so can you google message me with your question, or better yet, open an issue and mark it as a question?

If you aren't on the experimental branch, I'll ask you to just sit tight for a bit longer. The functionality you want will start to work once this is pushed live, and once it's live, I'll have a lot more brain space to help. :)

edit: You write, "I have also taken "science" out of the defaultQuestions.js file, but that category is still showing up." -- yeah, that's probably cause the functionality isn't in the version you're using. Hopefully that'll be coming very soon!

benjaffe commented 10 years ago

@abelr1 If you find any other questions for the categories, do let me know, either via email, or as an issue with the tag "enhancement". I just added the weight question to the question-schema-revamp branch, and it'll be in the plugin soon. Thank you!!

lukemadera commented 10 years ago

Thanks for the updates @cowboy and @benjaffe. Unfortunately I got sick and will also be busy for the next few days, but hopefully this weekend I can work on stuff some more if there's more to do. It seems like good progress so far, @benjaffe. One question: while I personally don't really have an issue with it, I thought we were going to store only key info for each question (for space reasons)? I.e. we could extract out question text, answer text, and category text if we wanted to save space. Again, I don't have a problem with leaving it in, as it makes things clearer and easier, but I just wanted to make sure we're on the same page. So the only difference from how it was is "score" and "weight" for each answer, with "weight" defaulting to "1" if not set, so it's basically backwards compatible with the existing/first version by changing "wrongAnswers" to "answerText" and adding a "score" field for each?

Also, on the note of removing category - that goes along with further file modularization and I'd recommend doing it - i.e. some people may want "coffee" and some may not so I think it's better to have a "coffee" file that's not bulked in with "drugs/smokes". Basically each category has its own file. We can group them into "super categories" for easy reference but again I think it's important to limit merge conflicts by only having ONE category per file.

benjaffe commented 10 years ago

@lukemadera Sorry you got sick!

I was planning on only storing key info for the questions in chrome.sync (when we implement that to replace localStorage), but the questions file that ships with the app keeps the text and answers, so when we (or anyone else) want to tweak question weights, we don't have to look up the question text or answer choices text.

I don't know what you mean by being backward compatible... I'm planning on entirely ditching the wrongAnswers format just for clarity.

For this version, I was planning on just having the categories I define available, and implementing additional functionality later. You're totally right about removing category, regardless of whether or not we have a separate file for each. I'm working on that right now, and hopefully will get it done tonight.

If every category was its own file, that could lead to potential issues. The main thing I want to avoid is having the user needing to modify manifest.json, and I'm not super into the idea of using a server to host the questions files. At this moment, I can't think of another way to have a bunch of files without adding all of them to manifest.json. Super-categories solve this by keeping the js files finite and known, but you're right, it's much more management. Hmm... actually, I'm starting to be okay with the idea of hosting the questions on my server, but if usage of the plugin skyrockets, that'll cost me a few bucks. Mrph, it's probably fine though. (Sorry, this is my internal monologue...) :)

I was imagining users adding questions in one of two ways. 1. Adding JSON to a field in the plugin, stored ultimately in chrome.storage. 2. Submitting a pull request (or some similar request) with new questions that we could add to the next version.

benjaffe commented 10 years ago

@lukemadera The question format has been updated... I still haven't broken every category into its own file, mainly because I don't know of a sustainable way of doing it aside from hosting files on a server.

Here's the new format, just pushed to the question-schema-revamp branch. (This is an example, multiple categories are in this file.)

fileQuestions.science_spirituality =
    {
        "science_literacy": [
            {
                "qid":"409", //A "shooting star" is a star that...
                "answerText": ["...burned out, and collapsed", "...collided with Earth's atmosphere", "...got sucked into a black hole", "...isn't really a star"],
                "score": [-1, -1, -1, 1],
                "weight": [1, 1, 1, 1]
            },
            {
                "qid":"178", //Which is bigger?
                "answerText": ["The earth", "The sun"],
                "score": [-1, 1],
                "weight": [1, 1]
            }
        ]
    };
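A minimal sketch of one way to consume this per-file format, assuming the plugin needs to find every category a given question id belongs to. The function name `indexByQid` and the shape of its return value are hypothetical and not taken from the plugin.

```js
// Hypothetical helper: not the plugin's actual code.
// Turns { category: [question, ...] } into { qid: [{ category, question }, ...] },
// so one question id can be looked up across every category that uses it.
function indexByQid(fileData) {
  const byQid = {};
  for (const [category, questions] of Object.entries(fileData)) {
    for (const question of questions) {
      if (!byQid[question.qid]) byQid[question.qid] = [];
      byQid[question.qid].push({ category, question });
    }
  }
  return byQid;
}

// Sample data in the format shown above (trimmed to one question).
const scienceSpirituality = {
  science_literacy: [
    {
      qid: "178", // Which is bigger?
      answerText: ["The earth", "The sun"],
      score: [-1, 1],
      weight: [1, 1],
    },
  ],
};

const index = indexByQid(scienceSpirituality);
console.log(index["178"][0].category);
```

Keeping a list per question id allows the same question to appear under several categories with different scores and weights.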
benjaffe commented 10 years ago

@lukemadera K, here's what I typed into the console to add my preferences. Feel free to modify this however you like, for whatever categories you prefer.

okcp = JSON.parse(localStorage.okcp);
okcp.questionCategories = ["non_smoker","not_420_friendly","veg_friendly","generally_happy","not_volatile","race_nondiscriminating","LGBT_nondiscriminating","artist","non-monogamous","communicative","not_possessive","not_wanting_children","sex-positive","cuddling","science_literate","science-friendly","non-religious","non-spiritual"];
localStorage.okcp = JSON.stringify(okcp)

It's very clear that I desperately need to fix the sorting, and find a way to display more categories. I'm thinking of sorting by the order of the questionCategories array, but also have an option to sort by highest-ranking/lowest-ranking categories. Although, I think I can push this live now and just have the same categories as the defaults. Only people reading this thread will know how to add/remove categories until we implement the category-choosing functionality. I'll wait to push for a day or two in case anyone here wants to play with it first.

instantgreen commented 10 years ago

Sorry for not responding so quickly, I've got a faulty mainboard...

The questions look fine so far.

The only thought I have right now is about this question: "Do you know what a 'safeword' is, in a sexual context?" Initially I changed the BDSM category to a "Knowledge of BDSM"-category, because I didn't think that pure knowledge would imply so much. If you want to keep it your way, you should add this question to the "dominant" category, too.

I'm excited to try the new plugin! :)

instantgreen commented 10 years ago

Ok, I've just tested the new plugin. Please don't weigh the question about the safeword so highly! If the person has only answered this one question, they will get a 1.0/1.0 in all the categories that have this question and this is just irritating.

benjaffe commented 10 years ago

That's a really good point, sorry my mistake. I just changed the weight for an answer of "Yes", and also made it so categories with a total weight of <= 0.5 don't get shown at all. That'll be in version 2.3.2
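A minimal sketch of the visibility rule described above, assuming each category already has a total weight computed from its answered questions. The names `MIN_TOTAL_WEIGHT`, `visibleCategories`, and the `confidence` field are hypothetical and not taken from the plugin.

```js
// Hypothetical helper: not the plugin's actual code.
// Categories whose total weight is <= 0.5 are hidden, per the comment above.
const MIN_TOTAL_WEIGHT = 0.5;

// `totalsByCategory` maps category name -> { confidence: <sum of weights> }.
function visibleCategories(totalsByCategory) {
  return Object.keys(totalsByCategory).filter(
    (name) => totalsByCategory[name].confidence > MIN_TOTAL_WEIGHT
  );
}

const shown = visibleCategories({
  dominant: { confidence: 0.3 },      // a single low-weight answer: hidden
  "sex-positive": { confidence: 2.4 }, // several answers: shown
});
console.log(shown);
```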