Vlad-Shcherbina / icfpc2018-tbd


Filename-based APIs should be banned #19

Open Vlad-Shcherbina opened 6 years ago

Vlad-Shcherbina commented 6 years ago

Write this:

def process(input: InputData) -> OutputData:  # good

Don't write this:

def process(input_filename, output_filename):  # bad

Why is this bad?

Inconvenient to compose

Instead of step2(step1(input)) you have

<come up with a name for the temporary file>
step1(input_filename, '../mess/../../stuff')
step2('../mess/../../stuff', output_filename)
<do I delete the temporary file now or make more mess?>
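With data-based APIs the composition collapses back into ordinary function calls. A minimal sketch of the idea (the `InputData`/`OutputData` fields and the step bodies are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class InputData:
    text: str  # hypothetical payload; a real project defines its own fields

@dataclass
class OutputData:
    text: str

def step1(data: InputData) -> InputData:
    # each step takes data in and returns data out; no filenames anywhere
    return InputData(data.text.upper())

def step2(data: InputData) -> OutputData:
    return OutputData(data.text + "!")

# composes directly, no temporary files to name or clean up
result = step2(step1(InputData("hello")))
```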

Depends on your local filesystem state

To run your program that reads from '../../mess/../../stuff' I have to guess what current directory you used at the time and what you expect to find in this location. Should it be placed there manually, or is it left behind by some other program? Will everything break if I run git clean -xdf?

Other things that, unlike the above, are acceptable

Files as machine->human interface

It is okay to write to a file that is meant to be used by humans. For example, log files. Or a visualizer that produces hundreds of files like FR042.png (in a separate directory). Or a script to prepare submission.

These output files should be in .gitignore.

Reading from committed files

Examples include pyjs_emulator that runs committed js source, or data_files.py that extracts from committed zip archives.
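The pattern here is to confine the file access to a single loader so that everything downstream deals in plain data. A sketch of reading a member out of a zip archive (done in memory so the example is self-contained; a real loader like data_files.py would open the committed archive instead):

```python
import io
import zipfile

def load_member(archive: bytes, member: str) -> bytes:
    # one place touches the archive; callers receive bytes, not paths
    with zipfile.ZipFile(io.BytesIO(archive)) as z:
        return z.read(member)

# build a tiny archive in memory to demonstrate the loader
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("problem.txt", "payload")

data = load_member(buf.getvalue(), "problem.txt")
```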

As a general rule, don't commit generated files.

Files as an implementation detail

For example, pyjs writes data and traces to files to pass them to nodejs. High-level interface abstracts this away.
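A sketch of that shape of API: the function below shells out through a temporary file, but its signature is source-in, stdout-out, so callers never see a filename. (This is a guess at the pattern, not pyjs's actual code; it is demonstrated with the Python interpreter, but nodejs would work the same way.)

```python
import os
import subprocess
import sys
import tempfile

def run_script(interpreter: str, source: str) -> str:
    # the temp file is purely an implementation detail of talking
    # to the subprocess; the caller passes data and gets data back
    with tempfile.NamedTemporaryFile("w", suffix=".tmp", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        return subprocess.run(
            [interpreter, path], capture_output=True, text=True, check=True
        ).stdout
    finally:
        os.remove(path)

output = run_script(sys.executable, "print('hello from a subprocess')")
```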

Files as a transparent local cache

An interesting example would be preparing statistics or visualization of data that is expensive to compute. You still want to write

render(compute_stuff(...))

But if it's too painful to wait when tweaking the renderer, you could add caching to compute_stuff().

Be careful about stale cache entries; they are annoying to debug.

Even though files are okay for this purpose, consider SQLite; it could be a better fit.
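A minimal sketch of such a transparent cache as a decorator (the names and the "expensive" function are made up; a real project would also want some invalidation story for the stale-entry problem above):

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# a throwaway directory for this demo; in a repo this would be a
# dedicated, gitignored cache directory
CACHE_DIR = Path(tempfile.mkdtemp())

def cached(func):
    # callers still write render(compute_stuff(...)); the cache is invisible
    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
        entry = CACHE_DIR / key
        if entry.exists():
            return pickle.loads(entry.read_bytes())
        result = func(*args)
        entry.write_bytes(pickle.dumps(result))
        return result
    return wrapper

@cached
def compute_stuff(n):
    return sum(i * i for i in range(n))  # stand-in for the expensive work
```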

Shared database

If you need parts of the program to communicate across space or time, do it through the database.

Bonus: it works across both space and time. For example, upload_full_problems.py was run by me in the beginning, solver_worker.py was run by many all the time, and make_submission.py was run by manpages in the end. It all just worked.

Another advantage is that the state is visible to everybody, so anybody can troubleshoot.

In addition, arguably, it's harder to make a mess in a relational schema than in a filesystem. There is more structure.
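The shared-state idea can be sketched with the standard sqlite3 module (in-memory here to keep the example self-contained; the table and values are invented, and the real project presumably used a server database so many people could connect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE solutions (problem TEXT, score INTEGER)")

# one part of the program (say, a solver worker) writes...
conn.execute("INSERT INTO solutions VALUES (?, ?)", ("FR042", 17))
conn.commit()

# ...and another part (say, a submission script), possibly much later,
# reads the same structured state
best = conn.execute(
    "SELECT MIN(score) FROM solutions WHERE problem = ?", ("FR042",)
).fetchone()[0]
```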


By the way, reading from a file is already implemented and it's not that hard: path.read_bytes(). Prefer orthogonal APIs.

There is also the io module to bridge between file-object-based interfaces and data interfaces.
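For example, a file-object-based API like csv.reader can be driven from plain in-memory data via io.StringIO, so no file on disk is needed:

```python
import csv
import io

def parse_rows(data: str) -> list:
    # io.StringIO makes a string look like an open text file,
    # bridging the data world and the file-object world
    return list(csv.reader(io.StringIO(data)))
```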

When having to deal with files for valid reasons, use utils.project_root() so you don't have to worry about current directory and relative paths.
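One common way such a helper is implemented is to walk upward from a known file until a repository marker is found, so paths never depend on the current working directory. This is only a guess at what utils.project_root() does; the real implementation may differ:

```python
from pathlib import Path

def find_project_root(start: Path, marker: str = ".git") -> Path:
    # walk up from a known location (typically Path(__file__)) until
    # we hit the repo marker; hypothetical sketch, not the project's code
    for parent in [start, *start.parents]:
        if (parent / marker).exists():
            return parent
    raise FileNotFoundError(f"no {marker} above {start}")
```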