Closed uniqueg closed 11 months ago
Hi @uniqueg, I think we can proceed with 2nd method you suggested. Maybe some part of the code from register_openapi.py has to be changed. Instead of:
try:
with open(spec.path_out, 'w') as out_file: # type: ignore
yaml.safe_dump(spec_parsed, out_file)
we can use
try:
root_dir = os.path.dirname(os.path.abspath(spec.path_out))
modified_dir = os.path.join(root_dir, 'modified_specs')
os.makedirs(modified_dir, exist_ok=True)
file_name = os.path.basename(spec.path_out)
file_path = os.path.join(modified_dir, file_name)
with open(file_path, 'w') as out_file: # type: ignore
yaml.safe_dump(spec_parsed, out_file)
or similar code, which we can test. One more thing is that we will have to change tests also. Let me know if I understood the problem correctly or not.
Thanks @Rahuljagwani. A few thoughts:
modified_specs
into the output file name specified by specs.path_out
.pathlib
module.Dockerfile
for any application built based on the FOCA archetype. See, e.g., hereTMPDIR
inside the FOCA Dockerfile (with the ENV
instruction so that it is propagated to the app's container image inheriting from the FOCA container), e.g., to a directory /tmp
, and then create and make that directory accessible for reading and writing for everyone (with the appropriate RUN
instructions). Then, in the code, we can set specs.path_out
to a file inside the directory that TMPDIR
points to (we can keep the base name of the original specs). And we would need to catch the case where, for some reason, TMPDIR
isn't set and probably document somewhere that TMPDIR
needs to be defined and writable (in case people don't make use of the FOCA image or manually unset or modify TMPDIR
).Does this make more sense now?
I got your point @uniqueg.
patlib
can be used to improve code.ENV
. I am mentioning sample code to be added(if it makes sense) during the builder image execution phase of the Dockerfile.
# Set a generic TMPDIR
ENV TMPDIR=/tmp
WORKDIR /app COPY tmp_data/ $TMPDIR
RUN chmod 1777 $TMPDIR
- Finally, in the code, we have to set `specs.path_out` to a file in a directory(`tmp_data` mentioned in the Dockerfile that is present in the root folder) that is pointed by `TMPDIR` of the Dockerfile.
**Few questions**
- Do we have to make sure that `tmp_data` is always present in the root folder??
> We would need to catch the case where, for some reason, TMPDIR isn't set and probably document somewhere that TMPDIR needs to be defined and writable (in case people don't make use of the FOCA image or manually unset or modify TMPDIR).
- To handle the above case you mentioned, will updating code only in [register_openapi.py](https://github.com/elixir-cloud-aai/foca/blob/dev/foca/api/register_openapi.py#L91) be sufficient?
- I am also a little confused about how this pointing thing will work. Will it work normally like how 'COPY` works in DOCKER?
Please correct me at places where I am wrong. Also apologies for not understanding the problem correctly or solution is misunderstood by me :)
Hi @Rahuljagwani:
COPY
and ENV
instructions work in Dockerfile
s. Your code snipped looks good at first glance, but I guess the best is to try it out.tmp_dir
inside /tmp
- it feels kinda redundant. However, in case we end up using /tmp
for other files, it might be useful to keep a little order in there. But perhaps pick a more descriptive name for a subdirectory then, maybe specs/
or, if you wanna keep it more generic, app/
./tmp/tmp_dir
or whatever you end up calling it) should always be present: I don't think we can ensure that it will always be there. If someone intentionally or inadvertently remove it, manually or through some piece of code, it will be gone - we can't fully guard against that. I think if we make sure that the directory is created, readable/writable and $TMPDIR is set, I guess we have done our due diligence. What I meant by my comment is that in the app, we should put any code that accesses $TMPDIR in a try
block, and catch
I/O errors with an explicit error message before shutting down the service (because I suppose we can't really recover from being unable to write out the modified specs).Dockerfile
and the code in register_openapi.py
should be sufficient. And of course the corresponding unit tests must be adapted (and tests added for situations where $TMPDIR is unset, points to an unavailable directory or points to a directory that is not readable or writable). I think that should be it, but I may be forgetting something.Not really sure what you mean by "this pointing thing". In pinciple, what we need is set an environment variable. That is pretty standard practice. You create a pathlib.Path
out of it, construct your filename and try writing. Something like this:
import os
from pathlib import Path
specs_mod = ... # put code to construct filename of modified specs file
spec.path_out = Path(os.environ["TMPDIR"]) / "specs" / specs_mod
try:
# put code to write/touch file
catch IOError as exc:
# put code here to exit app with a critical, descriptive error message
You could also use os.environ.get("TMPDIR", "/tmp") instead, which _might_ work even if $TMPDIR is not set, assuming
/tmp/specs` is available and writable.
Hope this helps :)
Thanks a lot, @uniqueg for clarification. Actually, after carefully reading what you wrote in the previous comment, I changed the following code:
New Dockerfile- (Added just 2 lines)
#...prev code
# Install Python dependencies
COPY requirements.txt ./
RUN pip install \
--no-warn-script-location \
--prefix="/install" \
-r requirements.txt
ENV TMPDIR=specs
RUN mkdir -p ${TMPDIR} && chmod 1777 ${TMPDIR}
# Install FOCA
COPY setup.py README.md ./
COPY foca/ ./foca/
RUN pip install . \
--no-warn-script-location \
--prefix="/install"
#...code continues
As you mentioned I kept specs
as a more reasonable name for the TMPDIR
, and a simple mkdir
command is executed.
I successfully built an image from this Dockerfile locally.
Change in register_openapi.py-
# Write modified specs
try:
output_path = Path(spec.path_out)
root_dir = output_path.parent
file_name = output_path.name
temp_dir = Path(os.environ.get('TMPDIR', 'specs'))
modified_dir = root_dir / temp_dir
modified_dir.mkdir(parents=True, exist_ok=True)
spec.path_out = modified_dir / file_name
with open(spec.path_out, 'w') as out_file: # type: ignore
yaml.safe_dump(spec_parsed, out_file)
except OSError as e:
raise OSError(
"Modified specification could not be written to file "
f"'{spec.path_out}'"
) from e
except yaml.YAMLError as e:
raise yaml.YAMLError(
"Could not encode modified specification"
) from e
logger.debug(f"Wrote specs to file: {spec.path_out}")
The above code forms a new directory named specs
at the parent directory of spec.path_out
. Everything is working fine except for one problem that I am facing which is a unit test error: Test
I think it is because when I run tests, a new directory is formed inside test_files named does/not/specs/exists.yaml
, but the test expects that when path_out
does not exist, an exception should be there because of simple try-catch
block in register_openapi.py which throws an OsException
when executed unlike my code which makes a directory named specs/
at the location of path_out
.
What would be your call about this? If you want I can make a demo PR for more clarity on the code, but that particular unit test will fail.
Very nice - thanks a lot @Rahuljagwani!
As for the test: We use automated testing to guard against code regression. In particular, we want to make sure that, when changing the codebase, the app still behaves as expected. However, if we actually want the behavior of our app to change (like in this case), unit or integration tests may well be expected to fail. In such cases, we should update the tests to reflect the new expected behavior, as well as add new tests, if applicable.
So please prepare the PR according to the changes you made, but be sure to update (or remove) the failed test, and instead provide new tests for the new behavior (see my previous comment on what conditions to test for).
Some advice on this: At the very least, we should have all the new statements in the code covered by tests. However, code coverage in itself is not the main thing to consider. Instead, we should reason about our code and think of the common use cases, as well as any common edge cases, and then write tests to account for these. If we do that thoroughly, all code statements will naturally be covered by one or more tests in the process.
In case you have problems writing the tests: I suggest you do as much as you can and raise a PR. It's much easier to comment and discuss in a code review than here in this issue :)
FOCA allows some modification of the API specs it processes. The modified specs are then written back to the location where the original specs were found, with some filename modification (in fact, for reasons of consistency, files are created even if the specs aren't modified at all).
This pattern creates issues for deployment, because the directory where spec files reside need to be writable. It might be better to (1) avoid storing the modified specs on disk at all, if possible, or (2) store them at a writable location like $TMPDIR.