elixir-cloud-aai / foca

Opinionated Flask microservice archetype for quick OpenAPI-based microservice development
Apache License 2.0
16 stars 11 forks source link

Rethink storage of modified specs #158

Closed uniqueg closed 11 months ago

uniqueg commented 2 years ago

FOCA allows some modification of the API specs it processes. The modified specs are then written back to the location where the original specs were found, with some filename modification (in fact, for reasons of consistency, files are created even if the specs aren't modified at all).

This pattern creates issues for deployment, because the directory where spec files reside need to be writable. It might be better to (1) avoid storing the modified specs on disk at all, if possible, or (2) store them at a writable location like $TMPDIR.

Rahuljagwani commented 11 months ago

Hi @uniqueg, I think we can proceed with 2nd method you suggested. Maybe some part of the code from register_openapi.py has to be changed. Instead of:

try:
            with open(spec.path_out, 'w') as out_file:  # type: ignore
                yaml.safe_dump(spec_parsed, out_file)

we can use

try:
            root_dir = os.path.dirname(os.path.abspath(spec.path_out))
            modified_dir = os.path.join(root_dir, 'modified_specs')
            os.makedirs(modified_dir, exist_ok=True)
            file_name = os.path.basename(spec.path_out)
            file_path = os.path.join(modified_dir, file_name)
            with open(file_path, 'w') as out_file:  # type: ignore
                yaml.safe_dump(spec_parsed, out_file)

or similar code, which we can test. One more thing is that we will have to change tests also. Let me know if I understood the problem correctly or not.

uniqueg commented 11 months ago

Thanks @Rahuljagwani. A few thoughts:

Does this make more sense now?

Rahuljagwani commented 11 months ago

I got your point @uniqueg.

Copy local code and data into the builder stage

WORKDIR /app COPY tmp_data/ $TMPDIR

Set Permissions

RUN chmod 1777 $TMPDIR


- Finally, in the code, we have to set `specs.path_out` to a file in a directory(`tmp_data` mentioned in the Dockerfile that is present in the root folder) that is pointed by `TMPDIR` of the Dockerfile.

**Few questions**
- Do we have to make sure that `tmp_data` is always present in the root folder??

> We would need to catch the case where, for some reason, TMPDIR isn't set and probably document somewhere that TMPDIR needs to be defined and writable (in case people don't make use of the FOCA image or manually unset or modify TMPDIR).
- To handle the above case you mentioned, will updating code only in [register_openapi.py](https://github.com/elixir-cloud-aai/foca/blob/dev/foca/api/register_openapi.py#L91) be sufficient?
- I am also a little confused about how this pointing thing will work. Will it work normally like how 'COPY` works in DOCKER?

Please correct me at places where I am wrong. Also apologies for not understanding the problem correctly or solution is misunderstood by me :)
uniqueg commented 11 months ago

Hi @Rahuljagwani:

Hope this helps :)

Rahuljagwani commented 11 months ago

Thanks a lot, @uniqueg for clarification. Actually, after carefully reading what you wrote in the previous comment, I changed the following code:

New Dockerfile- (Added just 2 lines)

#...prev code
# Install Python dependencies
COPY requirements.txt ./
RUN pip install \
        --no-warn-script-location \
        --prefix="/install" \
        -r requirements.txt

ENV TMPDIR=specs
RUN mkdir -p ${TMPDIR} && chmod 1777 ${TMPDIR}

# Install FOCA
COPY setup.py README.md ./
COPY foca/ ./foca/
RUN pip install . \
        --no-warn-script-location \
        --prefix="/install"
#...code continues

As you mentioned I kept specs as a more reasonable name for the TMPDIR, and a simple mkdir command is executed. I successfully built an image from this Dockerfile locally.

Change in register_openapi.py-

# Write modified specs
        try:
            output_path = Path(spec.path_out)
            root_dir = output_path.parent
            file_name = output_path.name
            temp_dir = Path(os.environ.get('TMPDIR', 'specs'))
            modified_dir = root_dir / temp_dir
            modified_dir.mkdir(parents=True, exist_ok=True)
            spec.path_out = modified_dir / file_name
            with open(spec.path_out, 'w') as out_file:  # type: ignore
                yaml.safe_dump(spec_parsed, out_file)
        except OSError as e:
            raise OSError(
                "Modified specification could not be written to file "
                f"'{spec.path_out}'"
            ) from e
        except yaml.YAMLError as e:
            raise yaml.YAMLError(
                "Could not encode modified specification"
            ) from e
        logger.debug(f"Wrote specs to file: {spec.path_out}")

The above code forms a new directory named specs at the parent directory of spec.path_out. Everything is working fine except for one problem that I am facing which is a unit test error: Test

I think it is because when I run tests, a new directory is formed inside test_files named does/not/specs/exists.yaml, but the test expects that when path_out does not exist, an exception should be there because of simple try-catch block in register_openapi.py which throws an OsException when executed unlike my code which makes a directory named specs/ at the location of path_out.

What would be your call about this? If you want I can make a demo PR for more clarity on the code, but that particular unit test will fail.

uniqueg commented 11 months ago

Very nice - thanks a lot @Rahuljagwani!

As for the test: We use automated testing to guard against code regression. In particular, we want to make sure that, when changing the codebase, the app still behaves as expected. However, if we actually want the behavior of our app to change (like in this case), unit or integration tests may well be expected to fail. In such cases, we should update the tests to reflect the new expected behavior, as well as add new tests, if applicable.

So please prepare the PR according to the changes you made, but be sure to update (or remove) the failed test, and instead provide new tests for the new behavior (see my previous comment on what conditions to test for).

Some advice on this: At the very least, we should have all the new statements in the code covered by tests. However, code coverage in itself is not the main thing to consider. Instead, we should reason about our code and think of the common use cases, as well as any common edge cases, and then write tests to account for these. If we do that thoroughly, all code statements will naturally be covered by one or more tests in the process.

In case you have problems writing the tests: I suggest you do as much as you can and raise a PR. It's much easier to comment and discuss in a code review than here in this issue :)