Clinical-Genomics / housekeeper

File data orchestrator
MIT License
2 stars 0 forks source link

Refactor long methods and parameters #105

Closed seallard closed 1 month ago

seallard commented 1 year ago

It is common with functions of the following pattern throughout housekeeper:

    def files(
        self, *, bundle: str = None, tags: List[str] = None, version: int = None, path: str = None
    ) -> Iterable[File]:
        """Fetch files from the store."""
        query = self.File.query
        if bundle:
            LOG.info(f"Fetching files from bundle {bundle}")
            query = query.join(self.File.version, self.Version.bundle).filter(
                self.Bundle.name == bundle
            )

        if tags:
            formatted_tags = ",".join(tags)
            LOG.info(f"Fetching files with tags in [{formatted_tags}]")
            # require records to match ALL tags
            query = (
                query.join(self.File.tags)
                .filter(self.Tag.name.in_(tags))
                .group_by(File.id)
                .having(sqlalchemy_func.count(Tag.name) == len(tags))
            )

        if version:
            LOG.info(f"Fetching files from version {version}")
            query = query.join(self.File.version).filter(self.Version.id == version)

        if path:
            LOG.info(f"Fetching files with path {path}")
            query = query.filter_by(path=path)

        return query

This pattern of using multiple if statements to check whether different arguments are passed or not is a bad idea. It makes the code harder to test, read, modify and use. It violates the single responsibility principle and causes the following code smells:

Vince-janv commented 1 month ago

Closing due to inactivity. Reopen and answer the question below if you want this prioritised.

Concerning the proposed feature: