cloudant / python-cloudant

A Python library for Cloudant and CouchDB
Apache License 2.0

halves memory usage Result iterator side #439

Closed aogier closed 5 years ago

aogier commented 5 years ago

Description

While working on #437 I noticed this change, which is trivial but effectively halves memory utilization during iteration. Because we release the RAM before the yield, users can use that memory for whatever they need (I've seen comments in the wild about documents larger than a hundred MB).

We could go further and implement a FIFO on the iterator three lines below, e.g. with a deque that pops while iterating, but this change alone already effectively halves RAM usage.
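A minimal sketch of the release-before-yield idea combined with the deque variant, not the library's actual code. `fetch_page`, `iter_rows` and `fake_fetch` are hypothetical names, and each page is assumed to be an already decoded dict with a `rows` list.

```python
from collections import deque


def iter_rows(fetch_page, page_size):
    """Yield rows page by page, holding as few references as possible.

    `fetch_page(limit, skip)` is a hypothetical callable returning a dict
    shaped like {"rows": [...]}; it is not part of python-cloudant.
    """
    skip = 0
    while True:
        page = fetch_page(limit=page_size, skip=skip)
        # Move the rows into a FIFO and drop the page itself, so the
        # decoded response is not kept alive while the caller works.
        pending = deque(page["rows"])
        del page
        fetched = len(pending)
        while pending:
            # popleft() removes the row from the queue before it is
            # yielded, so the queue holds no reference to it afterwards.
            yield pending.popleft()
        if fetched < page_size:
            # A short page means there is nothing left to fetch.
            break
        skip += page_size


def fake_fetch(limit, skip):
    # Stand-in data source for the example: 5 small rows in total.
    data = [{"id": str(i)} for i in range(5)]
    return {"rows": data[skip:skip + limit]}


rows = list(iter_rows(fake_fetch, page_size=2))
```

With large documents, the point is that a row already handed to the caller is no longer referenced by the iterator, so it can be reclaimed as soon as the caller drops it.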

smithsz commented 5 years ago

I think we should really be streaming the HTTP response using stream=True (see docs):

    r_session.get(url, headers=headers, params=f_params, stream=True)

Then, we'd read the data like this:

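    # Requires `import json` at module level.

    @staticmethod
    def __iter_rows(response):
        # Parse one row per line instead of loading the whole body.
        for line in response.iter_lines():
            line = line.decode('utf-8')
            if line.startswith('{"id":'):
                yield json.loads(line.rstrip(','))

    def __iter__(self):
        ...

        skip = 0
        while True:
            response = self._ref(
                limit=self._page_size,
                skip=skip,
                stream=True,
                **self.options
            )

            skip += self._page_size
            rows = 0
            for row in self.__iter_rows(response):
                rows += 1
                yield row

            # A short page means there is nothing left to fetch;
            # without this check the loop would never terminate.
            if rows < self._page_size:
                break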

That would avoid reading the entire response into memory. One for another day though!
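To illustrate the parsing trick in isolation, here is a small self-contained sketch. `FakeResponse` is a hypothetical stand-in for a streamed `requests.Response`, and the sample body assumes the server writes each row on its own line, which is what the `startswith` check relies on.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for a streamed requests.Response."""

    def __init__(self, body):
        self._body = body

    def iter_lines(self):
        # Like requests, yield bytes lines without the line terminator.
        for line in self._body.splitlines():
            yield line


def iter_rows(response):
    for line in response.iter_lines():
        text = line.decode("utf-8")
        # Only row lines start with {"id": ...; the opening and closing
        # lines of the JSON envelope are skipped.
        if text.startswith('{"id":'):
            # Rows are comma-separated, so strip the trailing comma
            # before decoding each one as a standalone JSON object.
            yield json.loads(text.rstrip(","))


body = (
    b'{"total_rows":2,"offset":0,"rows":[\r\n'
    b'{"id":"a","key":"a","value":1},\r\n'
    b'{"id":"b","key":"b","value":2}\r\n'
    b']}\r\n'
)
parsed = list(iter_rows(FakeResponse(body)))
```

Note that the prefix check only matches rows whose first field is `id`, so rows of a different shape would be skipped silently by this sketch.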

aogier commented 5 years ago

That would be great, and I like the parsing trick. The only issue I can think of is when a relatively long operation runs on each received item: could the request time out during the download? That should be investigated (at least by me, since I don't know the answer). It's a nice idea anyway, and I'd like to have the feature in some of my jobs!