eloidieme / DD2480_Coverage

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

refactoring plan for _get_serialized_fields #9

Closed eloidieme closed 8 months ago

eloidieme commented 8 months ago

Is the high complexity you identified really necessary? Is it possible to split up the code (in the five complex functions you have identified) into smaller units to reduce complexity? If so, how would you go about this? Document your plan.

eloidieme commented 8 months ago

Step 1 : identify functional blocks

In the _get_serialized_fields function, the two main functional blocks are:

eloidieme commented 8 months ago

Step 2 : simplify the function using helper methods

Having identified the functional blocks, we can break down the function into more manageable chunks that each deal with a different aspect of the functionality. Furthermore, it improves separation of concerns.

eloidieme commented 8 months ago

Step 3 : update the _get_serialized_fields function

Now we can make use of the helper methods in the main method to achieve the same functionality but with a less complex function. It would now look like this:

def _get_serialized_fields(self, item, default_value=None, include_empty=None):
    item = ItemAdapter(item)
    include_empty = self.export_empty_fields if include_empty is None else include_empty
    field_iter = self.determine_field_iter(item, include_empty)
    for field_name in field_iter:
        yield self.process_field(item, field_name, default_value)

It is much shorter and the complexity is now reduced to 3 (count by hand).

eloidieme commented 8 months ago

Step 4 : update unit tests

We would now need to update the test suite to cover the newly created methods individually, in order to ensure that each unit works as expected. Then we would conduct thorough testing to ensure that the refactored code behaves identically to the original code.

eloidieme commented 8 months ago

Step 5 : documentation and code review

Finally, we would need to document the refactoring (like I am doing right now), explaining why it was done and how, but also include relevant docstrings and comments in the newly created functions. Then, before merging the new version of the code, it should go through peer review as usual.

eloidieme commented 8 months ago

First helper function:

def _determine_field_iter(self, item, include_empty):
    if self.fields_to_export is None:
        return item.field_names() if include_empty else item.keys()
    elif isinstance(self.fields_to_export, Mapping):
        return self.fields_to_export.items() if include_empty else (
            (x, y) for x, y in self.fields_to_export.items() if x in item
        )
    else:
        return self.fields_to_export if include_empty else (
            x for x in self.fields_to_export if x in item
        )

Second helper function:

def _process_field(self, item, field_name, default_value):
    if isinstance(field_name, str):
        item_field, output_field = field_name, field_name
    else:
        item_field, output_field = field_name

    if item_field in item:
        field_meta = item.get_field_meta(item_field)
        value = self.serialize_field(field_meta, output_field, item[item_field])
    else:
        value = default_value

    return output_field, value