FOCA allows some modification of the API specs it processes. The modified specs are then written back to the location where the original specs were found, with some filename modification (in fact, for reasons of consistency, files are created even if the specs aren't modified at all).

This pattern creates issues for deployment, because the directory where the spec files reside needs to be writable. It might be better to (1) avoid storing the modified specs on disk at all, if possible, or (2) store them at a writable location like $TMPDIR.

Hi @uniqueg, I think we can proceed with the 2nd method you suggested. Some of the code in register_openapi.py may have to be changed. Instead of:

try:
            with open(spec.path_out, 'w') as out_file:  # type: ignore
                yaml.safe_dump(spec_parsed, out_file)

we can use

try:
            root_dir = os.path.dirname(os.path.abspath(spec.path_out))
            modified_dir = os.path.join(root_dir, 'modified_specs')
            os.makedirs(modified_dir, exist_ok=True)
            file_name = os.path.basename(spec.path_out)
            file_path = os.path.join(modified_dir, file_name)
            with open(file_path, 'w') as out_file:  # type: ignore
                yaml.safe_dump(spec_parsed, out_file)

or similar code, which we can test. One more thing is that we will have to change tests also. Let me know if I understood the problem correctly or not.

Thanks @Rahuljagwani. A few thoughts:

You are proposing to add code that adds a subdirectory modified_specs into the output file name specified by specs.path_out.
The code you propose looks fine, but I think this could be more elegantly solved with the help of the built-in pathlib module.
More importantly, the approach you suggest does not address the problem: The location will still need to be made available for writing explicitly in the Dockerfile for any application built based on the FOCA archetype. See, e.g., here
To avoid that problem, we could either try to remove the need to write out the modified specs at all (by keeping them in memory and feeding them to Connexion as an object, if that is possible) or solve this issue by setting a generic TMPDIR inside the FOCA Dockerfile (with the ENV instruction so that it is propagated to the app's container image inheriting from the FOCA container), e.g., to a directory /tmp, and then create that directory and make it readable and writable for everyone (with the appropriate RUN instructions). Then, in the code, we can set specs.path_out to a file inside the directory that TMPDIR points to (we can keep the base name of the original specs). And we would need to catch the case where, for some reason, TMPDIR isn't set and probably document somewhere that TMPDIR needs to be defined and writable (in case people don't make use of the FOCA image or manually unset or modify TMPDIR).
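The second option can be sketched roughly as follows. This is only an illustration: `redirect_to_tmpdir` is a hypothetical helper, not part of FOCA, and the file names are made up. It keeps the base name of the original output path, places it in the directory that `TMPDIR` names, and raises if `TMPDIR` is unset or empty so the caller can report the problem.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def redirect_to_tmpdir(
    path_out: str,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Return a path with the base name of `path_out` inside $TMPDIR.

    Hypothetical helper for illustration; not part of FOCA.
    """
    env = os.environ if env is None else env
    tmp_dir = env.get("TMPDIR")
    if not tmp_dir:
        # Covers both an unset and an empty TMPDIR
        raise RuntimeError("TMPDIR is not set; cannot place modified specs")
    return Path(tmp_dir) / Path(path_out).name


# Example: only the file name of the original location is kept
print(redirect_to_tmpdir("/app/api/petstore.modified.yaml", {"TMPDIR": "/tmp"}))
```

Passing the environment as a parameter is a design choice made here so the function can be exercised in tests without touching the real `os.environ`.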

Does this make more sense now?

I got your point @uniqueg.

The first 2 points are completely clear. pathlib can be used to improve the code.
I understand the problem that my proposal did not address.
For the solution part, I understood that a temporary directory has to be set up in the Dockerfile using ENV. Below is sample code that could be added (if it makes sense) to the builder image stage of the Dockerfile.
```
# Set a generic TMPDIR
ENV TMPDIR=/tmp

# Copy local code and data into the builder stage
WORKDIR /app
COPY tmp_data/ $TMPDIR

# Set Permissions
RUN chmod 1777 $TMPDIR
```


- Finally, in the code, we have to set `specs.path_out` to a file in the directory (`tmp_data`, mentioned in the Dockerfile and present in the root folder) that `TMPDIR` in the Dockerfile points to.

**A few questions**
- Do we have to make sure that `tmp_data` is always present in the root folder?

> We would need to catch the case where, for some reason, TMPDIR isn't set and probably document somewhere that TMPDIR needs to be defined and writable (in case people don't make use of the FOCA image or manually unset or modify TMPDIR).
- To handle the above case you mentioned, will updating code only in [register_openapi.py](https://github.com/elixir-cloud-aai/foca/blob/dev/foca/api/register_openapi.py#L91) be sufficient?
- I am also a little confused about how this pointing thing will work. Will it work the same way `COPY` works in Docker?

Please correct me wherever I am wrong. Apologies, too, if I have misunderstood the problem or the solution :)

Hi @Rahuljagwani:

I don't know in detail how the COPY and ENV instructions work in Dockerfiles. Your code snippet looks good at first glance, but I guess the best thing is to try it out.
I'm not sure whether we need to create a directory tmp_data inside /tmp - it feels kinda redundant. However, in case we end up using /tmp for other files, it might be useful to keep a little order in there. But perhaps pick a more descriptive name for a subdirectory then, maybe specs/ or, if you wanna keep it more generic, app/.
As for whether the directory (/tmp/tmp_data or whatever you end up calling it) should always be present: I don't think we can ensure that it will always be there. If someone intentionally or inadvertently removes it, manually or through some piece of code, it will be gone - we can't fully guard against that. If we make sure that the directory is created and readable/writable, and that $TMPDIR is set, I guess we have done our due diligence. What I meant by my comment is that in the app, we should put any code that accesses $TMPDIR in a try block, and catch I/O errors with an explicit error message before shutting down the service (because I suppose we can't really recover from being unable to write out the modified specs).
Indeed I believe that changing the Dockerfile and the code in register_openapi.py should be sufficient. And of course the corresponding unit tests must be adapted (and tests added for situations where $TMPDIR is unset, points to an unavailable directory or points to a directory that is not readable or writable). I think that should be it, but I may be forgetting something.
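The three conditions listed above (unset, unavailable, not readable/writable) could be checked along these lines. This is a sketch under assumptions: `check_tmpdir` is a hypothetical helper, not part of FOCA, and the error messages are placeholders.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def check_tmpdir(env: Mapping[str, str]) -> Optional[str]:
    """Return an error message if $TMPDIR is unusable, else None.

    Hypothetical helper for illustration; not part of FOCA.
    """
    value = env.get("TMPDIR")
    if not value:
        return "TMPDIR is not set"
    path = Path(value)
    if not path.is_dir():
        return f"TMPDIR '{path}' is not an existing directory"
    # os.access reports permissions for the current user; note that it
    # typically returns True for root regardless of the mode bits
    if not os.access(path, os.R_OK | os.W_OK):
        return f"TMPDIR '{path}' is not readable and writable"
    return None


with tempfile.TemporaryDirectory() as tmp:
    print(check_tmpdir({}))                                   # unset
    print(check_tmpdir({"TMPDIR": str(Path(tmp) / "nope")}))  # unavailable
    print(check_tmpdir({"TMPDIR": tmp}))                      # usable
```

Each branch corresponds to one of the test situations mentioned above. A check like this is advisory only, since the directory can still disappear between the check and the write, so the write itself should stay inside a try block.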
Not really sure what you mean by "this pointing thing". In principle, all we need is to set an environment variable. That is pretty standard practice. You create a pathlib.Path out of it, construct your filename and try writing. Something like this:
```
import os
from pathlib import Path

specs_mod = ...  # put code to construct filename of modified specs file
spec.path_out = Path(os.environ["TMPDIR"]) / "specs" / specs_mod

try:
    ...  # put code to write/touch file
except OSError as exc:
    ...  # put code here to exit app with a critical, descriptive error message
```
You could also use os.environ.get("TMPDIR", "/tmp") instead, which _might_ work even if $TMPDIR is not set, assuming `/tmp/specs` is available and writable.
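As an alternative to hard-coding the fallback, the standard library's `tempfile.gettempdir()` already consults `TMPDIR` (then `TEMP` and `TMP`) before falling back to platform defaults. A small sketch, where the helper name `specs_out_dir` is an assumption made for illustration and the `specs` subdirectory is the name suggested above:

```python
import tempfile
from pathlib import Path


def specs_out_dir(subdir: str = "specs") -> Path:
    """Return the directory in which modified specs would be placed.

    Hypothetical helper; it only computes the path and leaves creating
    the subdirectory to the caller.
    """
    # gettempdir() checks $TMPDIR, $TEMP and $TMP, in that order, before
    # trying platform-specific default locations
    return Path(tempfile.gettempdir()) / subdir


print(specs_out_dir())
```

One caveat: `gettempdir()` caches its result after the first call, so changing `TMPDIR` later in the same process has no effect on it.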

Hope this helps :)

Thanks a lot, @uniqueg for clarification. Actually, after carefully reading what you wrote in the previous comment, I changed the following code:

New Dockerfile- (Added just 2 lines)

#...prev code
# Install Python dependencies
COPY requirements.txt ./
RUN pip install \
        --no-warn-script-location \
        --prefix="/install" \
        -r requirements.txt

ENV TMPDIR=specs
RUN mkdir -p ${TMPDIR} && chmod 1777 ${TMPDIR}

# Install FOCA
COPY setup.py README.md ./
COPY foca/ ./foca/
RUN pip install . \
        --no-warn-script-location \
        --prefix="/install"
#...code continues

As you mentioned I kept specs as a more reasonable name for the TMPDIR, and a simple mkdir command is executed. I successfully built an image from this Dockerfile locally.

Change in register_openapi.py-

# Write modified specs
        try:
            output_path = Path(spec.path_out)
            root_dir = output_path.parent
            file_name = output_path.name
            temp_dir = Path(os.environ.get('TMPDIR', 'specs'))
            modified_dir = root_dir / temp_dir
            modified_dir.mkdir(parents=True, exist_ok=True)
            spec.path_out = modified_dir / file_name
            with open(spec.path_out, 'w') as out_file:  # type: ignore
                yaml.safe_dump(spec_parsed, out_file)
        except OSError as e:
            raise OSError(
                "Modified specification could not be written to file "
                f"'{spec.path_out}'"
            ) from e
        except yaml.YAMLError as e:
            raise yaml.YAMLError(
                "Could not encode modified specification"
            ) from e
        logger.debug(f"Wrote specs to file: {spec.path_out}")

The above code forms a new directory named specs at the parent directory of spec.path_out. Everything is working fine except for one problem that I am facing which is a unit test error: Test

I think it is because, when I run the tests, a new directory is created inside test_files, giving the path does/not/specs/exists.yaml. The test, however, expects an exception when path_out does not exist: the simple try-except block in register_openapi.py raises an OSError in that case, whereas my code creates a directory named specs/ at the location of path_out.
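The behavior described here can be reproduced in isolation. The sketch below uses a throwaway temporary directory and made-up path components; it is not the FOCA test itself. It shows two things: `mkdir(parents=True, exist_ok=True)` silently creates every missing parent, so no `OSError` is raised for a non-existent output location, and joining `root_dir` with an absolute `TMPDIR` value discards `root_dir` entirely.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    root_dir = Path(base) / "does" / "not"  # parent of path_out, missing
    file_name = "exists.yaml"

    # Relative TMPDIR value: the directory is created next to path_out,
    # including all missing parents, so no OSError is raised
    modified_dir = root_dir / Path("specs")
    modified_dir.mkdir(parents=True, exist_ok=True)
    created = modified_dir.is_dir()
    print(created, (modified_dir / file_name).relative_to(base))

    # Absolute TMPDIR value: joining discards root_dir entirely
    absolute = Path(base) / "tmp"
    joined = root_dir / absolute
    print(joined == absolute)
```

So with a relative value such as `specs`, the output lands beside the original specs, while an absolute value such as `/tmp` would redirect it away from that location.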

What would be your call about this? If you want I can make a demo PR for more clarity on the code, but that particular unit test will fail.

Very nice - thanks a lot @Rahuljagwani!

As for the test: We use automated testing to guard against code regression. In particular, we want to make sure that, when changing the codebase, the app still behaves as expected. However, if we actually want the behavior of our app to change (like in this case), unit or integration tests may well be expected to fail. In such cases, we should update the tests to reflect the new expected behavior, as well as add new tests, if applicable.

So please prepare the PR according to the changes you made, but be sure to update (or remove) the failed test, and instead provide new tests for the new behavior (see my previous comment on what conditions to test for).

Some advice on this: At the very least, we should have all the new statements in the code covered by tests. However, code coverage in itself is not the main thing to consider. Instead, we should reason about our code and think of the common use cases, as well as any common edge cases, and then write tests to account for these. If we do that thoroughly, all code statements will naturally be covered by one or more tests in the process.

In case you have problems writing the tests: I suggest you do as much as you can and raise a PR. It's much easier to comment and discuss in a code review than here in this issue :)

elixir-cloud-aai / foca

Rethink storage of modified specs #158
