colonelpanic8 / okcupyd

A Library that enables programmatic interaction with okcupid.com, using okcupid.com's private okcupid JSON API and html scraping when necessary.
MIT License
110 stars 18 forks source link

Support new okcupid.com json API #63

Closed dubiousjim closed 8 years ago

dubiousjim commented 9 years ago

Two days or so ago, search requests were working fine. Then all of a sudden they changed. Now:

profiles = u.search(age_min=33,age_max=33,height_min=60,height_max=70,radius=10,last_online='month',order_by='match')
for p in profiles:
    pass

produces a long exception trace ending with:

.../site-packages/okcupyd/search.pyc in fetch(self, start_at, count)
    279                                          params=search_parameters)
    280         try:
--> 281             search_html = response.json()['html']
    282         except:
    283             log.warning(simplejson.dumps({'failure': response.content}))

.../site-packages/requests/models.pyc in json(self, **kwargs)
    791                     # used.
    792                     pass
--> 793         return json.loads(self.text, **kwargs)
    794 
    795     @property

.../site-packages/simplejson/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    503             parse_constant is None and object_pairs_hook is None
    504             and not use_decimal and not kw):
--> 505         return _default_decoder.decode(s)
    506     if cls is None:
    507         cls = JSONDecoder

.../site-packages/simplejson/decoder.pyc in decode(self, s, _w, _PY3)
    368         if _PY3 and isinstance(s, binary_type):
    369             s = s.decode(self.encoding)
--> 370         obj, end = self.raw_decode(s)
    371         end = _w(s, end).end()
    372         if end != len(s):

.../site-packages/simplejson/decoder.pyc in raw_decode(self, s, idx, _w, _PY3)
    398             elif ord0 == 0xef and s[idx:idx + 3] == '\xef\xbb\xbf':
    399                 idx += 3
--> 400         return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 3 column 1 (char 3)

I haven't been banned; other sorts of requests from the same account and IP still work fine. The normal web interface (accessed in a browser) has started looking different too.

colonelpanic8 commented 9 years ago

Seems like the same issue as #61 . It seems to happen for some accounts and not others.

Do you have any other accounts? Can you verify whether this is happening for those or not?

waffler commented 9 years ago

I'm also getting this; I do also have the new search interface.

colonelpanic8 commented 9 years ago

Hmmm. Seems like they are slowly disabling the private json api for certain segments of users. Other than that I don't really have any guesses.

waffler commented 9 years ago

I dug into this a little more and have something approximating a fix.

Replace search.py line 281 with:

search_html = response.text #WAS: search_html = response.json()['html']

This prevents it throwing a horrible stacktrace on search -- but profile.attractiveness now seems to be None in all cases.

I tried the &filter7 querystring earlier today and it doesn't seem to be doing anything either. Search requests through the browser interface seem to be sending urlencoded JSON in the querystring to specify search parameters, but without A-List I don't know what the attractiveness keys are or if adding them in the request of a non-A-List account would work.

edit:/ My bad, the above makes checking profiles work, but User.search() is returning an exhausted SearchFetchable. This appears to be because of the change to JSON-encoded query parameters as described above.

colonelpanic8 commented 9 years ago

I dug into this a little more and have something approximating a fix.

Replace search.py line 281 with:

search_html = response.text #WAS: search_html = response.json()['html']

lol. looks like they realized that it was super dumb to send html inside of json.

Still, the way I would have done it would have been to send json data and then have the client assemble it into appropriate html. Some of the engineering descisions at okc just leave me shaking my head...

This prevents it throwing a horrible stacktrace on search -- but profile.attractiveness now seems to be None in all cases.

That has already been the case for non alist profiles for quite some time (ever since I made a chrome a chrome extention https://github.com/IvanMalison/alist-query that allowed you to use alist search features). Attractiveness relies on alist search...

I tried the &filter7 querystring earlier today and it doesn't seem to be doing anything either. Search requests through the browser interface seem to be sending urlencoded JSON in the querystring to specify search parameters, but without A-List I don't know what the attractiveness keys are or if adding them in the request of a non-A-List account would work.

lol. urlencoded json in a query string. You have to give it to them, they are creative...

edit:/ My bad, the above makes checking profiles work, but User.search() is returning an exhausted SearchFetchable. This appears to be because of the change to JSON-encoded query parameters as described above.

hmmm. I havent looked at what the suggested fix actually does, but it may just be preventing the traceback because the loaded page IS html. That doesnt mean that it is the html that we are after (I would suggest actually enabling the okcupyd.search logger and looking at it), and if it weren't, that would result in exactly the behavior you describe, because the lxml queries would return no html subtrees that look like profile match cards.

My guess is that they are either a) disabling their private json API b) they changed the uri or the relevant query parameter somehow.

colonelpanic8 commented 9 years ago

Looking in to this today. I'll let you guys know what I find.

magbar commented 9 years ago

Search still returns usable JSON info for users in DeferredJS.addBundles(), inside a bunch of js You can use beautifulsoup to extract the relevant JSON for parsing There is a section at the end called 'paging' that is used for controlling the cursor in the search, it looks like

colonelpanic8 commented 9 years ago

Search still returns usable JSON info for users in DeferredJS.addBundles(), inside a bunch of js You can use beautifulsoup to extract the relevant JSON for parsing

Yeah its probably that the uri/format/params changed. I'm sure its easy to fix, I've just been super busy recently.

colonelpanic8 commented 9 years ago

@magbar @waffler @mosesmc52 @dubiousjim @sagittarian Okay I'm actually looking in to this today.

colonelpanic8 commented 9 years ago

So it looks like okcupid has finally done the sane thing and moved to a real json api. The good news is that this is much easier to deal with than the html they were sending along before! The bad news is that this means pretty significant code changes need to be made, as they have also changed the way that the api is queried.

Here is a sample json response:

{
  "total_matches": 1000,
  "data": [
    {
      "enemy": 0,
      "relative": 50,
      "last_login": 1437897822,
      "gender": 2,
      "location": {
        "country_code": "US",
        "city_name": "Rockville",
        "country_name": "United States",
        "state_name": "Maryland",
        "state_code": "MD"
      },
      "userid": "4327853413343092636",
      "match": 7854,
      "gender_tags": [],
      "liked": false,
      "state_code": "MD",
      "orientation": 3,
      "country_name": "United States",
      "photo": {
        "full_paths": {
          "large": "http://k3.okccdn.com/php/load_okc_image.php/images/0x0/640x560/0x0/0x0/0/966038773518043951.webp?v=2",
          "small": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/60x60/0x0/0x0/0/966038773518043951.webp?v=2",
          "medium": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/120x120/0x0/0x0/0/966038773518043951.webp?v=2",
          "original": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/0x0/0/966038773518043951.webp?v=2"
        },
        "base_path": "0x0/0x0/2/966038773518043951.webp?v=2",
        "original_size": {
          "width": 480,
          "height": 480
        },
        "ordinal": 0,
        "id": "966038773518043951",
        "crop_rect": {
          "height": 0,
          "y": 0,
          "width": 0,
          "x": 0
        },
        "caption": "",
        "thumb_paths": {
          "large": "http://k0.okccdn.com/php/load_okc_image.php/images/640x560/640x560/0x0/0x0/2/966038773518043951.webp?v=2",
          "desktop_match": "http://k0.okccdn.com/php/load_okc_image.php/images/400x400/400x400/0x0/0x0/2/966038773518043951.webp?v=2",
          "small": "http://k0.okccdn.com/php/load_okc_image.php/images/60x60/60x60/0x0/0x0/2/966038773518043951.webp?v=2",
          "medium": "http://k0.okccdn.com/php/load_okc_image.php/images/120x120/120x120/0x0/0x0/2/966038773518043951.webp?v=2"
        }
      },
      "state_name": "Maryland",
      "age": 19,
      "country_code": "US",
      "friend": 4571,
      "is_online": 0,
      "username": "Natasha13131313",
      "city_name": "Rockville",
      "stoplight_color": "red",
      "last_contact_time": [
        0,
        0
      ],
      "orientation_tags": []
    },
    {
      "enemy": 0,
      "relative": 50,
      "last_login": 1437901916,
      "gender": 2,
      "location": {
        "country_code": "US",
        "city_name": "Halethorpe",
        "country_name": "United States",
        "state_name": "Maryland",
        "state_code": "MD"
      },
      "userid": "12909475848387001416",
      "match": 7596,
      "gender_tags": [],
      "liked": false,
      "state_code": "MD",
      "orientation": 1,
      "country_name": "United States",
      "photo": {
        "full_paths": {
          "large": "http://k3.okccdn.com/php/load_okc_image.php/images/0x0/640x560/202x63/603x464/0/14623610972542897148.webp?v=2",
          "small": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/60x60/202x63/603x464/0/14623610972542897148.webp?v=2",
          "medium": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/120x120/202x63/603x464/0/14623610972542897148.webp?v=2",
          "original": "http://k2.okccdn.com/php/load_okc_image.php/images/202x63/603x464/0/14623610972542897148.webp?v=2"
        },
        "base_path": "202x63/603x464/2/14623610972542897148.webp?v=2",
        "original_size": {
          "width": 640,
          "height": 640
        },
        "ordinal": 0,
        "id": "14623610972542897148",
        "crop_rect": {
          "height": 401,
          "y": 63,
          "width": 401,
          "x": 202
        },
        "caption": "",
        "thumb_paths": {
          "large": "http://k0.okccdn.com/php/load_okc_image.php/images/640x560/640x560/202x63/603x464/2/14623610972542897148.webp?v=2",
          "desktop_match": "http://k0.okccdn.com/php/load_okc_image.php/images/400x400/400x400/202x63/603x464/2/14623610972542897148.webp?v=2",
          "small": "http://k0.okccdn.com/php/load_okc_image.php/images/60x60/60x60/202x63/603x464/2/14623610972542897148.webp?v=2",
          "medium": "http://k0.okccdn.com/php/load_okc_image.php/images/120x120/120x120/202x63/603x464/2/14623610972542897148.webp?v=2"
        }
      },
      "state_name": "Maryland",
      "age": 28,
      "country_code": "US",
      "friend": 6000,
      "is_online": 1,
      "username": "ohyesttop",
      "city_name": "Halethorpe",
      "stoplight_color": "red",
      "last_contact_time": [
        0,
        0
      ],
      "orientation_tags": []
    }
  ],
  "paging": {
    "cursors": {
      "before": "OTAyNjcxODk5NTQ0MjYyMDE2NywwLDE4",
      "current": "OTAyNjcxODk5NTQ0MjYyMDE2NywxOCwxOA==",
      "after": "OTAyNjcxODk5NTQ0MjYyMDE2NywzNiwxOA=="
    }
  }
}

Requests look like:

{
  "order_by": "MATCH",
  "debug": 0,
  "gentation": [
    34
  ],
  "gender_tags": null,
  "orientation_tags": null,
  "minimum_age": 18,
  "maximum_age": 99,
  "locid": 4357892,
  "radius": 25,
  "lquery": "",
  "location": {
    "popularity": 0,
    "longitude": -7702879,
    "metro_area": 8840,
    "city_name": "Washington",
    "locid": 4357892,
    "latitude": 3889940,
    "display_state": 1,
    "state_code": "DC",
    "default_radius": 100,
    "density": 0,
    "postal_code": "",
    "country_code": "US"
  },
  "located_anywhere": 0,
  "last_login": 604800,
  "i_want": "women",
  "they_want": "men",
  "minimum_height": 15494,
  "maximum_height": 17272,
  "languages": 0,
  "speaks_my_language": 0,
  "ethnicity": [],
  "religion": [],
  "availability": "single",
  "monogamy": "unknown",
  "looking_for": [],
  "smoking": [],
  "drinking": [],
  "drugs": [],
  "answers": [],
  "education": [],
  "children": [],
  "interest_ids": [],
  "username": "",
  "tagOrder": [
    "availability",
    "minimum_height",
    "maximum_height"
  ],
  "after": "OTAyNjcxODk5NTQ0MjYyMDE2NywxOCwxOA==",
  "limit": 18
}

The uri has also changed from match to search

dubiousjim commented 9 years ago

Well, a pain to fix but it's good that the target is so clean and still accessible. Any idea why this was only affecting some accounts and not others?

colonelpanic8 commented 9 years ago

Maybe some type of A/B testing, where certain users are only allowed to use the new api?

mosesmc52 commented 9 years ago

That was my first thought as well.

dubiousjim commented 9 years ago

Right, so will a fix have to continue to implement the current search interface as well as the one they're gradually replacing it with?

colonelpanic8 commented 9 years ago

@dubiousjim Not necessarily. The only thing we've seen so far is the old interface being disabled (not the new one). I'll do some investigation when I implement the feature, but it is possible that we will have to maintain both interfaces.

colonelpanic8 commented 9 years ago

I would really appreciate some help writing all of the filters all over again. I've built the framework for doing this again, so it really should just be a matter of seeing where stuff has changed and putting it into the new (very declarative framework):

I'm putting everything in https://github.com/IvanMalison/okcupyd/blob/master/okcupyd/search_filters.py for now (its not quite going to work yet, but the faster we write the filters the quicker we can get there).

waffler commented 9 years ago

If you can document what the Filter interface looks like (what transform is supposed to take and return, and what goes into acceptable_values) plus what filters need writing, I might be able to help with this at some point.

There are so many levels of indirection in how the filters are built and used that it's hard for me to wrap my head around what's actually going on.

colonelpanic8 commented 9 years ago

If you can document what the Filter interface looks like (what transform is supposed to take and return, and what goes into acceptable_values)

Take a look at: http://okcupyd.readthedocs.org/en/latest/core.html#okcupyd.filter.Filters.register_filter_builder

but keep in mind that: a) I just made some major tweaks to that class so it tells the story in a sort of strange way b) I just made a new, better (in my opinion) interface to all of this

The better interface that I am talking about is the declarative one used by the GentationFilter in search_filters.py

basically it is the same as what is documented for register_filter_builder except that everything that is a keyword argument for register_filter_builder is either a class level variable or a function on the declared class. and the function parameter is the transform function on the class.

THE ONE REALLY weird thing about that class is that functions that are declared on it WILL NOT be methods. This is just a random fanciness to allow introspection to work and to avoid taking pointless class/instance variables. If you don't understand what this is talking about just imagine that transform is just a normal function that is passed to register_filter_builder.

ALSO: feel free to use the register_filter_builder interface if that feels more comfortable to you.

I will probably add a docstring for the declarative interface in the coming week that explains this in a clearer/more polished way.

acceptable_values, type and description are all only there to help auto-generate the docstring. If you don't want to think about them for now that is fine. Most of the filters might just literally be a transform function. There may be a few that take multiple arguments and need an output key and stuff, but you can just do the simple ones if you want

plus what filters need writing, I might be able to help with this at some point.

All of the filters that are declared in search.py need to be completely rewritten. Looking at the json above can give you a hint about what filters need to be added. you'll have to make requests and look at the network that happens to determine exactly what the format of the parameters passed from client to server are.

There are so many levels of indirection in how the filters are built and used that it's hard for me to wrap my head around what's actually going on.

Haha yeah, I probably got a little carried away with the fanciness in a lot of okcupyd. I kind of use it to test out random ideas and code patterns that I have kicking around in my head.

I do think that the interface to the filters is pretty nice if you can wrap your head around it. It ends up being a lot nicer this way that if you have all the filters in one function on a class it has no business being on cough https://github.com/evfredericksen/pyokc/blob/master/pyokc/pyokc.py#L133 cough but I guess that type of thing may be easier to understand for newcomers. Seriously though, that pyokc code was total garbage.

colonelpanic8 commented 9 years ago

@waffler if you find the filter stuff too confusing, you could also try writing a class that parses the new result json.

I like to keep classes and functions in okcupyd pretty small, so writing literally a function/class that takes

{
      "enemy": 0,
      "relative": 50,
      "last_login": 1437897822,
      "gender": 2,
      "location": {
        "country_code": "US",
        "city_name": "Rockville",
        "country_name": "United States",
        "state_name": "Maryland",
        "state_code": "MD"
      },
      "userid": "4327853413343092636",
      "match": 7854,
      "gender_tags": [],
      "liked": false,
      "state_code": "MD",
      "orientation": 3,
      "country_name": "United States",
      "photo": {
        "full_paths": {
          "large": "http://k3.okccdn.com/php/load_okc_image.php/images/0x0/640x560/0x0/0x0/0/966038773518043951.webp?v=2",
          "small": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/60x60/0x0/0x0/0/966038773518043951.webp?v=2",
          "medium": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/120x120/0x0/0x0/0/966038773518043951.webp?v=2",
          "original": "http://k2.okccdn.com/php/load_okc_image.php/images/0x0/0x0/0/966038773518043951.webp?v=2"
        },
        "base_path": "0x0/0x0/2/966038773518043951.webp?v=2",
        "original_size": {
          "width": 480,
          "height": 480
        },
        "ordinal": 0,
        "id": "966038773518043951",
        "crop_rect": {
          "height": 0,
          "y": 0,
          "width": 0,
          "x": 0
        },
        "caption": "",
        "thumb_paths": {
          "large": "http://k0.okccdn.com/php/load_okc_image.php/images/640x560/640x560/0x0/0x0/2/966038773518043951.webp?v=2",
          "desktop_match": "http://k0.okccdn.com/php/load_okc_image.php/images/400x400/400x400/0x0/0x0/2/966038773518043951.webp?v=2",
          "small": "http://k0.okccdn.com/php/load_okc_image.php/images/60x60/60x60/0x0/0x0/2/966038773518043951.webp?v=2",
          "medium": "http://k0.okccdn.com/php/load_okc_image.php/images/120x120/120x120/0x0/0x0/2/966038773518043951.webp?v=2"
        }
      },
      "state_name": "Maryland",
      "age": 19,
      "country_code": "US",
      "friend": 4571,
      "is_online": 0,
      "username": "Natasha13131313",
      "city_name": "Rockville",
      "stoplight_color": "red",
      "last_contact_time": [
        0,
        0
      ],
      "orientation_tags": []
    }

and spits out a Profile object might be a good place to start.

With all of that said, there is more labor to be done with the filter stuff. And your help would be most appreciated there.

colonelpanic8 commented 9 years ago

@dubiousjim sorry to call you out specifically but any interest in helping?

dubiousjim commented 9 years ago

Sure, I will try to contribute.

dubiousjim@gmail.com

colonelpanic8 commented 9 years ago

@waffler Another point of clarification: There are a bunch of existing filter functions that take in an argument, translate that argument, and then output it along with a prefixed number. Here is an example:

def get_language_query(language):
    return '22,{0}'.format(language_map[language.lower()])

The prefix number (22) is basically the way that okcupid used to identify certain filters as query params. Because they actually all used to come in as the params '?filter1=22,451&filter2=...' With the order being unimportant. Now they've moved to a json api, so it seems that language just needs to be passed in (remapped as it already is to a number by this function), and an output key added. Basically, the language filter probably should look something like this now:

class LanguageFilter(search_filters.filter_class):

    def transform(language):
        return language_map[language.lower()]

    output_key = "languages"
    acceptable_values = list(magicnumbers.language_map.keys())

Others might have changed in slightly more significant ways. Take max height and min height for example. We used to have to deal with both in one function because there was a single filter for both

def get_height_filter(height_min=None, height_max=None):
    min_int = 0
    max_int = 99999
    if isinstance(height_min, six.string_types):
        min_int = 100 * parse_height_string(height_min)
    elif isinstance(height_min, int):
        min_int = 100 * inches_to_centimeters(height_min)
    if isinstance(height_max, six.string_types):
        max_int = 100 * parse_height_string(height_max)
    elif isinstance(height_max, int):
        max_int = 100 * inches_to_centimeters(height_max)
    return '10,{0},{1}'.format(min_int, max_int)

search_filters.register_filter_builder(
    magicnumbers.get_height_filter,
    descriptions=["The minimum height of returned search results.",

                  "The maximum height of returned search results."],
    acceptable_values=[["A height int in inches",
                       "An imperial height string e.g. 5'4\"",
                       "A metric height string e.g. 1.54m"] for _ in range(2)],
    decider=search_filters.any_not_none_decider
)

Now they can probably be broken up so that each one looks like:

class MinHeightFilter(search_filters.filter_class):
    output_key = 'minimum_height'
    def transform(height_min):
        min_int = 0
        if isinstance(height_min, six.string_types):
            min_int = 100 * parse_height_string(height_min)
        elif isinstance(height_min, int):
            min_int = 100 * inches_to_centimeters(height_min)
       return min_int

I'm just ascertaining all of this by looking at the request json. Let me know if you have any questions about how any of this is determined.

dubiousjim commented 9 years ago

Have looked over the docs for Core Obects/Filter, the changeset "start refactor of filters...", and the above. I have some understanding of what's going on, though like waffler, parts are still fuzzy for me. Here are some questions that might help, if they're easy to answer. Perhaps I'll stumble upon the answers to some of these myself.

  1. At first, I was wondering, wtf is package six? But I looked it up: http://pythonhosted.org/six/
  2. The changeset "start refactor of filters..." added the parm output_key to filters, and the docs for it currently say "Will default to the only value in keys if keys has length 1". But the current implementation defaults to keys[0] regardless of the length of keys. Just remarking, that's all.
  3. Why in the class interface for filters do we expect the function to be named decide, though for backwards compatibility register_filter_builder accepts the kwarg decider. Just a matter of taste, or is there something requiring that the class function/method have a different name than what is passed in kwargs?
  4. The role of the decide function is confusing for me, it might help to have this explained more. Maybe as I browse the code for search.py I'll see examples that will clarify. But one thing that confused me was how in filter.py, I think lines 122-3 and 140 after the changeset "start refactor of filters...", decide gets called with just the argument kwargs, whereas other times (and in its docstring), it seems to be called with the arguments:

    the_builders_transform_fun, kwargs, the_builders_keys_list

    Is this because the decide function gets curried and at the mentioned sites it has already been partially applied beforehand?

  5. In the comment above, Ivan has at one place output_key = "languages", and at another place output_key="minimum_height". Now those are just quick off-the-cuff examples, maybe not meant to be taken to be working code. But if they were to be working code, I'd be confused by where those keys come from. I got the impression that output_key had to be one of the items provided in a keys list, which defaults to the argument names of the transform function if not provided. But in the examples above, no keys list is provided, and the transform function doesn't have arguments with those names. This suggests to me I might not yet be understanding something important about how the filters, or output_key, works.

Finally, if I've understood what I've read so far correctly, then a filter class will in general look like this, with ? before a parameter meaning it's optional:

class XYZFilter(search_filters.filter_class):
    def transform(key1, ...):
        return something of the type expected by OKCupid's search API,
        e.g. a list of numbers
    def decide(own_transform_fn, kwargs, own_keys_list):
        return True or False
    ? keys = a list of names to replace the argument names in `transform`,
                  to be used as keys in OKCupid's search API
    ? output_keys = normally will be keys[0], but could be another of the 
                  values in list keys
    ? descriptions = list of docstrings for the arguments of transform (or a 
                  single docstring, if there's only one argument)
    ? types = list of strings naming types for the arguments of transform 
                  (or a single string...)
    ? acceptable_values = list of strings describing acceptable values for 
                  the arguments of transform (or a single string...)
colonelpanic8 commented 9 years ago

Great questions, things are definitely in flux/in a weird state right now, so its understandable that things are confusing to you.

The changeset "start refactor of filters..." added the parm output_key to filters, and the docs for it currently say "Will default to the only value in keys if keys has length 1". But the current implementation defaults to keys[0] regardless of the length of keys. Just remarking, that's all.

Yeah, I guess I was a little sloppy there. If you look above, the actual correct behavior is there but commented out. I think I was experiencing an issues and just forgot to change it back.

Why in the class interface for filters do we expect the function to be named decide, though for backwards compatibility register_filter_builder accepts the kwarg decider. Just a matter of taste, or is there something requiring that the class function/method have a different name than what is passed in kwargs?

yeah, just taste.

The role of the decide function is confusing for me, it might help to have this explained more. Maybe as I browse the code for search.py I'll see examples that will clarify. But one thing that confused me was how in filter.py, I think lines 122-3 and 140 after the changeset "start refactor of filters...", decide gets called with just the argument kwargs, whereas other times (and in its docstring), it seems to be called with the arguments:

the_builders_transform_fun, kwargs, the_builders_keys_list Is this because the decide function gets curried and at the mentioned sites it has already been partially applied beforehand?

decide is there to determine when a particular filter should be active. The main use case for this is when you have a filter that takes more than one argument. A filter could require ALL of its keys to be present or ANY of its keys to be present or some other arbitary condition based on the supplied keys. For the most part, you can use the default value here without worrying about it

The difference in signature is simple a holdover from the legacy system. I didnt wan't to break all the legacy code just yet since it does still work for some people/we might actually need to keep it around if the new stuff isn't supported on certain accounts (I still haven't investigated that at all). All of the new stuff is required to used the one argument format. You can get accepted keys from the and function from the class you are on.

In the comment above, Ivan has at one place output_key = "languages", and at another place output_key="minimum_height". Now those are just quick off-the-cuff examples, maybe not meant to be taken to be working code. But if they were to be working code, I'd be confused by where those keys come from. I got the impression that output_key had to be one of the items provided in a keys list, which defaults to the argument names of the transform function if not provided. But in the examples above, no keys list is provided, and the transform function doesn't have arguments with those names. This suggests to me I might not yet be understanding something important about how the filters, or output_key, works.

output_key does NOT have to be one of keys/arguments of the transform function. It does default to that, but it exists as an option precisely because okcupyd's interface won't exactly mirror all of the naming choices of okcupid.com.

Finally, if I've understood what I've read so far correctly, then a filter class will in general look like this, with ? before a parameter meaning it's optional:

That is mostly correct (except for the previously mentioned misunderstanding about output_key). The one thing that may change is whether the transform and decide methods are automatically decorated as classmethods or staticmethods. They are currently decorated as static methods, but it may make more sense to have them be classmethods so that they can access the attributes on themselves in their bodies.

dubiousjim commented 9 years ago

OK, have read more code and have more questions.

First, Ivan makes the following remark above:

The difference in signature is simple a holdover from the legacy system. I didnt wan't to break all the legacy code just yet since it does still work for some people/we might actually need to keep it around if the new stuff isn't supported on certain accounts (I still haven't investigated that at all).

This confuses me a bit, because it suggests that there is some connection between the move to the class interface for filters and the new search API. But if I understand correctly, we don't yet have any code for accessing the new search API. Ivan, can you clarify what you are envisaging the connection between these to be?

Second, it looks like what's needed to make new filters is to go through all the places in search.py where register_filter_builders is invoked, and in its place supply a subclass of search_filters.filter_class, as Ivan did in https://github.com/IvanMalison/okcupyd/blob/master/okcupyd/search_filters.py with GentationFilter. This looks straightforward enough, and also looks like it could be done just as a transformation of the existing code. I don't see what coupling we should expect there to be between this and the data sent/received via the new JSON search API. Ivan, is your idea that we should leave all the old filters in place, using the data we know has been working so far with the old search API, but also now engage in a project of re-verifying all the magic numbers by inspecting the data sent/received when submitting searches in a web browser, and ensure that the replacement filters built with the new class interface do conform to OKCupid's current encodings?

Third, after we've generated class versions for all the filters, we're still going to need to write replacements for MatchCardExtractor, SearchFetchable, SearchHTMLFetcher in search.py, that use the new JSON API. Plus logic to attempt to use the new search API but to fall back to the old API if that's ever appropriate. Right?

The rest of these observations/questions are just miscellaneous things that really don't belong in this thread, but I'm including them here because I don't want to just ignore them and am too lazy to make separate issues or pull requests for them right now.

I didn't read all the code, and wasn't attempting a systematic code review. These are just things I happened to notice in what I did look at (often looking only very casually).

colonelpanic8 commented 9 years ago

This confuses me a bit, because it suggests that there is some connection between the move to the class interface for filters and the new search API. But if I understand correctly, we don't yet have any code for accessing the new search API. Ivan, can you clarify what you are envisaging the connection between these to be?

Yeah, you are right, there isn't really a significant connection between the two except that as I was looking at the code that needed to be rewritten I wanted to try out this class based approach because I thought the interface would be easy to explain. It's starting to seem like that was a bit of a mistake because of all of the confusion that its causing. I wasn't expecting anyone to probe so deeply into the code, I was just hoping people would take the interface and run with it. That's not to say that I don't appreciate that you are attempting to comprehend all of this -- in fact I think it's great.

Second, it looks like what's needed to make new filters is to go through all the places in search.py where register_filter_builders is invoked, and in its place supply a subclass of search_filters.filter_class, as Ivan did in https://github.com/IvanMalison/okcupyd/blob/master/okcupyd/search_filters.py with GentationFilter.

Yep.

This looks straightforward enough, and also looks like it could be done just as a transformation of the existing code. I don't see what coupling we should expect there to be between this and the data sent/received via the new JSON search API. Ivan, is your idea that we should leave all the old filters in place, using the data we know has been working so far with the old search API, but also now engage in a project of re-verifying all the magic numbers by inspecting the data sent/received when submitting searches in a web browser, and ensure that the replacement filters built with the new class interface do conform to OKCupid's current encodings?

Yep.

Third, after we've generated class versions for all the filters, we're still going to need to write replacements for MatchCardExtractor, SearchFetchable, SearchHTMLFetcher in search.py, that use the new JSON API. Plus logic to attempt to use the new search API but to fall back to the old API if that's ever appropriate. Right?

Yeah. I'm almost done with all of this. I haven't written any logic to fall back to the old api yet.

The rest of these observations/questions are just miscellaneous things that really don't belong in this thread, but I'm including them here because I don't want to just ignore them and am too lazy to make separate issues or pull requests for them right now.

Why are gentation_filter, location_filter, and age_filter located in filter.py? The implementation for most of the other filters is in search.py or magicnumbers,py. Is there some reason for those three to be separated out and put in filter.py?

There was actually a uniformity in the way that search filters are expressed and the way that looking_for stuff works. Those filters are also used in looking_for.py. To be honest I don't really remember the details.

In magicnumbers.py, shouldn't job also include slashed regexes, like ('banking', 'finance', 'banking/finance')? In magicnumbers.py, perhaps we should change the 'ph.d program' regex in education_level to be 'ph.?d.? program' to be more resistant to future alterations to this text

In magicnumbers.py, in the gentation_to_number dictionary, everybody occurs twice (with the same value), also women should have the value 42 not 34, also men: 21 should be added. In helpers.py, function format_last_online should perhaps include an assert(isinstance(last_online, int)) in the case that last_online is not a str. Also, perhaps we should use isinstance(last_online, six.string_types) at the top of the function. The docstring for util.REMap neglects to specify the default optional parameter. The docstring for util.currying.curry is not clear on whether each invocation of a curried function can supply only a single argument, or several (without yet fully applying the function).

An arbitrary number can be passed in each invovation.

The documentation for session.get_current_user_profile is funny. For util/fetchable.py, the docstring for Fetchable makes it seem like there are two optional keyword arguments, one with the key nice_repr and the other with the key kwargs. In truth, the constructor accepts any number of keyword arguments, and we are here just documenting the special behavior associated with nice_repr. The others all get treated in the way now documented under kwargs. For util/fetchable.py, the refresh method on Fetchable: why indeed do we retain self._original_iterable? Wouldn't it be best to just make this a local variable? For tasks/site.py, I'm not sure if or where this file is even used, but perhaps rate_attractive_girls should get a more gender-neutral name...?

haha yeah probably. I've thought about that before actually but I was too lazy to change it.

tasks/ defines invoke tasks (https://github.com/pyinvoke/invoke)

I didn't read all the code, and wasn't attempting a systematic code review. These are just things I happened to notice in what I did look at (often looking only very casually).

Yeah, I'm sure you are right about all of these. Feel free to submit pull requests if you want.

dubiousjim commented 9 years ago

Ok, I forked the repo, and have locally made patches for most of the little things above. (Also for the round of patches I emailed you directly on June 29.) I'll submit pull requests for these sometime.

I've also locally made proto-versions of class implementations of all the old filters. I haven't yet gone through and made sure that the output and keys conform to the new search API. I realized as I was doing this that it actually makes sense to have the new implementations of filters coupled to the new search API, as the old API and the new API expect different outputs. In fact, the interfaces are likely to diveege even more radically than that.

With respect to:

Plus logic to attempt to use the new search API but to fall back to the old API if that's ever appropriate. Right?

Yeah. I'm almost done with all of this. I haven't written any logic to fall back to the old api yet.

One idea is to just make a boolean flag on the Session, governing whether to use the new or old API. We can probably muddle by with setting this manually for the time being.

Question: how do you inspect the network traffic, to pull the JSON request and response you posted above? In previous days I could just use WebDeveloper in Firefox, submit a request and then examine it under the Network tab. But when I adjust the search parameters (I only changed the max_age from the default search parameters), and try to load the new search, all I see in the Network list is a html GET request to http://www.okcupid.com/match, and there are no additional search parameters listed for that request (because it's a GET not a POST). Also a json GET request to http://www.okcupid.com/apitun/clientstats?&access_token=...&statnames=desktop+-+matchsearch+-+page+01.

Also a bunch of other activity that doesn't look relevant (and none of it to www.okcupid.com, though some is to other okcupid.com subdomains).

I don't see any other json requests, and in particular none to www.okcupid.com/search.

So I can't see how to easily extract what json request is getting sent, if this way of poking the site even does manage to generate a json request. I hope there's some easy way to inspect this activity using just Firefox, maybe supplemented with Firebug. I hope I don't have to use wireshark.

I actually tried using wireshark once on my Android okc app, but that app seemed to initiate a secure telnet session and communicate through that, making it impenetrable to my skills.

colonelpanic8 commented 9 years ago

I've also locally made proto-versions of class implementations of all the old filters. I haven't yet gone through and made sure that the output and keys conform to the new search API. I realized as I was doing this that it actually makes sense to have the new implementations of filters coupled to the new search API, as the old API and the new API expect different outputs. In fact, the interfaces are likely to diveege even more radically than that.

Yeah. I think we should probably just have a totally separate Filters instance for the new filters.

With respect to:

Plus logic to attempt to use the new search API but to fall back to the old API if that's ever appropriate. Right? Yeah. I'm almost done with all of this. I haven't written any logic to fall back to the old api yet. One idea is to just make a boolean flag on the Session, governing whether to use the new or old API. We can probably muddle by with setting this manually for the time being.

That's not totally unreasonable, but I was thinking that we would just try except or something like that to start out with.

Question: how do you inspect the network traffic, to pull the JSON request and response you posted above? In previous days I could just use WebDeveloper in Firefox, submit a request and then examine it under the Network tab. But when I adjust the search parameters (I only changed the max_age from the default search parameters), and try to load the new search, all I see in the Network list is a html GET request to http://www.okcupid.com/match, and there are no additional search parameters listed for that request (because it's a GET not a POST). Also a json GET request to http://www.okcupid.com/apitun/clientstats?&access_token=...&statnames=desktop+-+matchsearch+-+page+01.

Also a bunch of other activity that doesn't look relevant (and none of it to www.okcupid.com, though some is to other okcupid.com subdomains).

I use chrome developers tools. There is definitely a lot of noise that you have to sift through.

I don't see any other json requests, and in particular none to www.okcupid.com/search.

That means that your account is on the old api. You'll need an account that has been moved to the new api to test this.

So I can't see how to easily extract what json request is getting sent, if this way of poking the site even does manage to generate a json request.

It definitely does. Even if you are the old match api you should get a json response that has only one key and a bunch of partial html in that key. I haven't really done stuff like this much in firefox, but im sure that there is a way. If you install screenhero or something we can screenshare and I can show you how i look at requests if you want.

I hope there's some easy way to inspect this activity using just Firefox, maybe supplemented with Firebug. I hope I don't have to use wireshark.

I actually tried using wireshark once on my Android okc app, but that app seemed to initiate a secure telnet session and communicate through that, making it impenetrable to my skills.

Yea -- dealing with SSL is not worth the trouble here. Doing it in browser has the advantage that the requests have been encrypted and decrypted before you see them.

dubiousjim commented 9 years ago

Figured out how to sniff traffic. You're right, I needed to use a different account. I'll post a comprehensive survey of the new json request API, at least those parts of it I can access without an A-list account. Some of the search criteria available in the old API are now no longer provided by the web interface; it's possible that they're still reachable by sticking other values into the request, but I can't observe the web interface doing it. So we'd have to experiment. Also, for some of the request types, I think the current class interface for filters is inadequate. As I understand it, the transform function of a filter can take more than one input parameter, but only has the ability to affect a single output key. But for some of the search criteria, multiple keys have to be adjusted in the request in a coordinated way. So we're going to have to expand the filter class to permit this. (Or perhaps subclass it, it's only two search criteria, essentially the gender of the target and their sexual orientation, where this comes into play.)

dubiousjim commented 9 years ago

The basic fields look something like this:

WOMEN who are interested in MEN ages 30-50 located within 10 miles of SAN FRANCISCO, CA online WITHIN THE LAST MONTH

Next there are a bunch of optional fields.

There are also a number of fields that can only be searched from an account with A-list, which I didn't have active. These include:

In the submitted json object on a non-A-list account, the only key that seems relevant to any of these is "answers", which was always given the value [] in my observations.

For all of the above optional fields, like "minimum_height", "languages", and so on, there is a field "tagOrder" in the submitted json object, which contains a list of these keys in the order the web interfaces displays them. Perhaps we can ignore this; we'll have to experiment.

Finally, the json request object contains fields like these:

colonelpanic8 commented 9 years ago

@dubiousjim Awesome. Thanks for doing all of that.

Also, for some of the request types, I think the current class interface for filters is inadequate. As I understand it, the transform function of a filter can take more than one input parameter, but only has the ability to affect a single output key. But for some of the search criteria, multiple keys have to be adjusted in the request in a coordinated way. So we're going to have to expand the filter class to permit this. (Or perhaps subclass it, it's only two search criteria, essentially the gender of the target and their sexual orientation, where this comes into play.)

This is true, but I think that the current interface could actually support this, because you CAN define multiple filter_builders that take the same key.

With that said, I think it might just be better to change the interface as you suggested.

dubiousjim commented 9 years ago

Oh good, maybe just having multiple filters taking overlapping keys as arguments, but each dedicated to its own output key, would be a good solution. I didn't think of that.

colonelpanic8 commented 9 years ago

So I guess we should just get cracking on this then. Do you know how to use vcrpy/use_cassette and rerecord cassettes. We're going to want to do that as we restore functionality.

colonelpanic8 commented 9 years ago

We might need to rewrite some of the tests as well.

dubiousjim commented 9 years ago

Do you know how to use vcrpy/use_cassette and rerecord cassettes.

Nope

colonelpanic8 commented 9 years ago

It's actually really simple. I made invoke tasks to rerecord the requests. rerecording the tests will actually make live requests to okcupid.com using provided credentials. The tests will be executed a second time after that where all of the occurrences of username and password in the requests will be scrubbed. You just need to make sure that both runs of the tests go through so you dont accidentally commit your credentials in the cassettes

colonelpanic8 commented 9 years ago

I finished writing the fetcher and the request builder, although I can't get the tests to pass yet because it need to send some authorization token along with the request.

All that's left now is to rewrite all the filters.

dubiousjim commented 9 years ago

I've drafted the filters, but it's just a sketch at this point, not code I expect to run much less pass tests. Currently am still working on digesting the code in util, filter, *_search. I'm getting it, but all the magic really does make it take some effort. Have some questions below. Am hoping someone else will write tests, I really can't spare enough time to figure out how the test system works.

Questions:

That's all for now, I expect there will be more later.

dubiousjim commented 9 years ago

The above questions were edited/expanded after the initial posting. Here are some more questions, all at least loosely associated with the new file json_search.py.

With respect to the "contentful" arguments, some of these are presented as optional in the web browser interface, but the requests I observed nonetheless provided default values. These include (together with their default values):

The answers argument is presumably related to the ability to match by answers to specific questions, which is only available in A-list, but nonetheless this field is supplied in requests from a non-A-list account. There are other match criteria available only by A-list (attractiveness, bodytype, personality), but there don't seem to be any fields relevant to these in requests from a non-A-list account.

Perhaps some of the above fields can be omitted from requests we send, though we'd have to experiment to determine that, and OKCupid as far as I know has made no commitment to respond consistently to any published API.

There are also "contentful" fields that the web browser presents as mandatory, but some of them nonetheless have default values:

There are also the fields gentation, locid, radius, location, and last_login, which as far as I know don't have default values. As with the fields described above, whether OKCupid will accept and respond appropriately to requests missing these fields is something we'd have to verify through experimentation. The requests submitted from my web browser always explicitly specified these fields.

colonelpanic8 commented 9 years ago

I've drafted the filters, but it's just a sketch at this point, not code I expect to run much less pass tests. Currently am still working on digesting the code in util, filter, *_search. I'm getting it, but all the magic really does make it take some effort.

Haha. I'm sorry that the magic is so confusing. The idea is that it makes the interfaces nice, but it does make the code somewhat complicated. If you have suggestions about how to make the documentation better or make the code easier to understand, I'm all ears.

Have some questions below. Am hoping someone else will write tests, I really can't spare enough time to figure out how the test system works.

That's fine, but you can always write unit tests on the filter instances pretty easily (as in tests/search_filters_test.py).

Questions:

In filters.py, Filters.filter_class is declared as a method (decorated with @util.cached_property) but then accessed later as an attribute. Is this really a clear and good way to initiate this attribute, given that there is only going to be one or two instances of Filters in the whole system? Is there some good reason to initiate the attribute in this way?

cached_property is just a small peice of syntactic sugar that I really like for giving a complicated dynamic definition for a property that should only be defined once per instance. If a normal property were used instead, things like isinstance(obj, filter_instance.filter_class) wouldn't work.

Also in filters.py, why is Filters.filter_meta decorated as @util.cached_property? Isn't this attribute only ever accessed once, in the declaration inside the method Filters.filter_class, which is itself only executed once? Couldn't the filter_meta class be a local of the method Filters.filter_class? I'm not necessarily advocating that, just trying to see if I understand things.

Yes, you are mostly right here, but you could also theoretically used the metaclass directly as in filters_instance.filter_meta('name', [object], **kwargs) or something like that.

The same method/attribute filter_meta returns a class that derives from util.decorate_all(staticmethod). If I understand rightly, this makes new instances of that (meta)class have all its methods automatically promoted to staticmethods, without the programmer needing to (nor should they) using @staticmethod decorations. Given this, it would in fact be awkward to make the decide methods on filters be declared as classmethods instead. Nonetheless, we do explicitly declare and invoke decide as though it were a classmethod, passing it the filter class as its first argument.

If I understand your question correctly, you are basically asking: why is it that decide has the following definition

            def decide(cls, kwargs):
                return all_not_none_decider(cls.transform, kwargs, cls.keys)

even though we applied util.decorate_all(staticmethod).

I understand your confusion, but the reason that this does not apply is that the decoration will only be applied to methods on the class, not methods on the metaclass and the methods that are decorated would belong to instances of the metaclass (which are themselves classes).

To apply this to a class inline you would write the follwing code:

class SomeClass(object):
    __metaclass__ = util.decorate_all(staticmethod)

    def this_will_be_a_static_method(argument, other_argument):
        return argument

To clarify even further, our case looks like this


class SomeClass(object):
    class __metaclass__(util.decorate_all(staticmethod)):
        def this_wont_be_a_static_method(cls):
            return cls

    def this_will_be_a_static_method(argument, other_argument):
        return argument

But notice in filters.py's function _handle_decide, in the case where the builder.decide method doesn't take 3 arguments (that the old interface), we call builder.decide(kwargs). Isn't this a mistake? Don't we need to explicitly pass in builder's class, since builder.decide is a staticmethod not a classmethod? I'd guess it should be builder.decide(builder.class, kwargs), right?

Not exactly. decide is technically an instance method, but since it happens to belong to a metaclass, it is more like a classmethod than anything else.

There are actually tests for all of this that currently pass, so I'm pretty sure that I am correct on this point. Take a look at test_gentation in tests/search_filters_test.py. Things seem to work properly there.

In util/init.py, the class CallableMap never seems to be used anywhere. Is this leftover from the past or preparing for the future?

Probably the latter, but I honestly don't remember. I would guess that I had imagined some use case for this but never ended up needing it or something like that.

Are you trying to audit the entire code base? I would advise against that if you don't like reading magical crazy code. As I mentioned before, I kinda went crazy with trying a bunch of random shit when I wrote this, and relied on testing to make sure it all works.

dubiousjim commented 9 years ago

No, not trying to audit everything. Just the parts I felt I needed to overcome confusion about to write the new filters. There's a lot of interdependencies.

dubiousjim commented 9 years ago

So if we declare new decide functions (these are needed), and perhaps change the old decide functions to the new API (not needed, but might make the code more comprehensible in the future), they should be declared without the cls parameter?

colonelpanic8 commented 9 years ago

In html_search.py there's a function search(session, count=1, **kwargs) that starts everything going. But there doesn't seem to be any parallel in json_search.py. Should there be?

That is actually not used directly by anything except maybe the attractiveness finder, I believe. Everything should really use SearchFetchable.

As far as I can see, we now are creating three instances of class Filters from filters.py: one in html_search.py, one in json_search.py, and one in looking_for.py. Right? The looking_for.py logic will have to be split between the html and json APIs (unless we determine that the html API can just be deleted, then it merely needs to be updated to json-style).

Yeah, we will have to look in to whether or not this part of the private api changed. There are a few other places where I think they might have made additional changes to the private API. I've created #67 to track the process of investigating this.

in json_search.py's SearchHTMLFetcher, the _query_params method sets up some defaults for the request. But I think we may need to set up some more defaults. There are what I'll call the "contentful" defaults, to be described below. And also some "administrative" defaults, which we wouldn't expect the programmer to supply. Beyond the after and limit arguments already given default values, some other "administrative" defaults I observed in the requests sent from my web browser were: debug: 0, username: "", save_search: 1. Also there's order_by: "MATCH", although perhaps we do want programmers to be able to change that, in which case it'd be a "contentful" default, as others listed below. Also there are the keys i_want, they_want, and tagOrder. How these should be set depend on what "contentful" choices the programmer specifies. Perhaps we can omit them from the request without harm; or perhaps not; we'll have to experiment.

Yeah I kind of imagined that this would be the case. I sort of expect that these are actually optional but we should investigate.

With respect to the "contentful" arguments, some of these are presented as optional in the web browser interface, but the requests I observed nonetheless provided default values. These include (together with their default values):

minimum_height: null maximum_height: null languages: 0 speaks_my_language: 0 # though perhaps a more reasonable default for us to use would be 1 ethnicity, religion, smoking, drinking, drugs, education, children, interest_ids, and looking_for: all have default [] availability: "any" monogamy: "unknown" answers: [] The answers argument is presumably related to the ability to match by answers to specific questions, which is only available in A-list, but nonetheless this field is supplied in requests from a non-A-list account. There are other match criteria available only by A-list (attractiveness, bodytype, personality), but there don't seem to be any fields relevant to these in requests from a non-A-list account.

Perhaps some of the above fields can be omitted from requests we send, though we'd have to experiment to determine that, and OKCupid as far as I know has made no commitment to respond consistently to any published API.

There are also "contentful" fields that the web browser presents as mandatory, but some of them nonetheless have default values:

gender_tags: null orientation_tags: null minimum_age: 18 maximum_age: 99 lquery: "" located_anywhere: 0 There are also the fields gentation, locid, radius, location, and last_login, which as far as I know don't have default values. As with the fields described above, whether OKCupid will accept and respond appropriately to requests missing these fields is something we'd have to verify through experimentation. The requests submitted from my web browser always explicitly specified these fields.

yeah I'd just use POSTMAN (or if you use emacs like me restclient.el) to investigate this by basically making raw curl commands.

dubiousjim commented 9 years ago

Will it work to have a filter say, e.g.:

class SomethingFilter(search_filters.filter_class):
    ...
    decide = search_filters.any_not_none_decider

or will that get screwed up by the implicit @staticmethod decoration in the filter, getting applied to any_not_none_decider which has already been so decorated?

colonelpanic8 commented 9 years ago

So if we declare new decide functions (these are needed), and perhaps change the old decide functions to the new API (not needed, but might make the code more comprehensible in the future), they should be declared without the cls parameter?

As the code is currently written, yes. You can still explicitly make things classmethods if you want (if you needed access some of the other parameters in either transform or decide. We'll need to change the way we inspect signatures to do this but that should be easy. The following test illustrates this.

def test_classmethods_still_classmethods_with_staticmethod_decorate_all():
    class Test(six.with_metaclass(util.decorate_all(staticmethod))):
        @classmethod
        def a_classmethod(cls):
            return 2

        def test():
            return 1

    assert Test.test() == 1
    assert Test.a_classmethod() == 2
colonelpanic8 commented 9 years ago

Will it work to have a filter say, e.g.:

class SomethingFilter(search_filters.filter_class): ... decide = search_filters.any_not_none_decider or will that get screwed up by the implicit @staticmethod decoration in the filter, getting applied to any_not_none_decider which has already been so decorated?

That would work (because search_filters.any_not_none_decider is actually a staticmethod type of the filters instance, not the something filter) except that the build method assumes that all deciders are modernized (i.e. don't take three arguments but just one). Since the decider would need access to the keys of that class, there should probably be some static definition of a classmethod version of the anynot none decider:

@classmethod
def any_not_none_decider(cls, kwargs):
    return any(kwargs.get(key) is not None for key in cls.accepted_keys)

class SomethingFilter(search_filters.filter_class):
    decider = any_not_none_decider
colonelpanic8 commented 9 years ago

@dubiousjim does the util.decorate_all subtlety make sense to you now?

dubiousjim commented 9 years ago

Thanks Ivan, I grok decorate_all itself, and this is helping me with the decide methods, which are some of the trickiest parts of the code to master (and I do need to invoke non-default ones, and define some new ones). So would it make sense for us to declare all the factory blah_decider methods on class Filters as @classmethods? Would it also make sense for us to modify Filters._handle_decide in filter.py to look like this:

def _handle_decide(self, builder, kwargs):
    if len(inspect.getargspec(builder.decide).args) == 3:
        return builder.decide(builder.transform, kwargs, builder.keys)
    else:
        return builder.decide(kwargs)

Formerly the clauses were reversed, and the test was for whether the argspec(...).args == 2. (Why was it 2, when decide is a staticmethod? is this because of some metaclass voodoo?)

Thanks for answering all the questions above. It helped a lot. Some things are still fuzzy, but this should be enough for me to at least push some code to my fork. Any further misunderstandings on my part will probably show themselves when you audit it.

colonelpanic8 commented 9 years ago

Thanks Ivan, I grok decorate_all itself, and this is helping me with the decide methods, which are some of the trickiest parts of the code to master (and I do need to invoke non-default ones, and define some new ones). So would it make sense for us to declare all the factory blah_decider methods on class Filters as @classmethods?

They can't be declared on Filters as classmethods, because when you access them like

Filters.any_not_none or filters_instance.any_not_none they will already be bound to the filters class at that point. We want the cls variable to get bound to the Filter (not Filters) instance, so we need the classmethod decorated function to still be in raw descriptor form.

I know you arent the biggest fan of cached_properties, but that would be one way to handle this. Alternatively we could write a simple wrapper descriptor decorator that would prevent these things from getting decorated, or we could just define them at the top level of the filter.py file. All of these are strange in their own way. I havent decided which one is the least bad yet.

Would it also make sense for us to modify Filters._handle_decide in filter.py to look like this:

def _handle_decide(self, builder, kwargs): if len(inspect.getargspec(builder.decide).args) == 3: return builder.decide(builder.transform, kwargs, builder.keys) else: return builder.decide(kwargs) Formerly the clauses were reversed, and the test was for whether the argspec(...).args == 2. (Why was it 2, when decide is a staticmethod? is this because of some metaclass voodoo?)

Hmmm. yeah good point. This is where the classmethod/staticmethod thing gets confusing. The reason the check was two does have to do with the metaclass magic. I think the real answer is just to get rid of all of the legacy deciders.

We may need to write a function that wraps getargspec and doesnt return the bound parameter...

Thanks for answering all the questions above. It helped a lot. Some things are still fuzzy, but this should be enough for me to at least push some code to my fork. Any further misunderstandings on my part will probably show themselves when you audit it.

It seems like you have a pretty good grasp of things now. Let me see if I can get search actually working, so we can start testing the filters.

dubiousjim commented 9 years ago

OK, have a look at https://github.com/dubiousjim/okcupyd/commits/filters