cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
27 stars 1 forks source link

Add flit #16

Closed susan-garry closed 1 year ago

susan-garry commented 1 year ago

Overview

This creates a command-line interface for generating node depth accelerators, which can be easily extended to generating accelerators for other odgi subcommands such as node degree. It also refactors the file structure to place all node-depth related files into a single directory and facilitate the addition of other odgi subcommands. The Makefile has also been slightly altered to first test the smallest test cases.

I am specifically looking for feedback on the -f flag, which is now unnecessary and probably just creates clutter. Edit: As per Professor Sampson's feedback on the -f flag, it has been removed. Instead, -r and -d now expect an argument. To make my life a bit easier, I also created pollen/argparse_custom.py for custom argparse classes; it currently contains an argparse.Action that stores both a constant and an argument, for easily setting action to run/parse and storing an input file.

CLI

Usage: exine depth [-h] [-n MAX_NODES] [-e MAX_STEPS] [-p MAX_PATHS] [-o OUT] [-a [AUTO_SIZE]] [-g] [-r PARSE-DATA] [-d PARSE-DATA] [-s SUBSET_PATHS] [-x ACCELERATOR] [--pr] [--tmp-dir TMP_DIR]

Relevant optional arguments:     -a [AUTO_SIZE], --auto-size [AUTO_SIZE]     Provide an odgi file that will be used to calculate the hardware     dimensions. If the flag is set with no argument, the argument of --parse-data or --run is     used instead. Specified hardware dimensions take precedence.     -g, --gen Generate an accelerator. Should not be used with --run or --parse-data.     -r ACTION, --run ACTION Run node depth on the .og or .data file. Outputs the node depth table.     Should not be used with --gen or --parser-data.     -d ACTION, --parse-data ACTION Parse the .og file to accelerator input. Should not be used with --gen or --run.     -s SUBSET_PATHS, --subset-paths SUBSET_PATHS     Should only be set if the action is not gen. Specifies a subset of paths     whose node depth to compute.     -x ACCELERATOR, --accelerator ACCELERATOR     Specify a node depth accelerator to run. Should only be set if the --run flag is set.     --pr Print profiling info. Passes the -pr flag to fud if --run is set.     --tmp-dir TMP_DIR Specify a directory to store temporary files in. The files will not be     deleted at the end of execution.

There is currently a separate -f flag to pass a filename if the --run or --parse-data flag is set. This is mostly because in a previous iteration of the CLI, the flag --action=[run/parse/gen] was used to specify which command to execute, so a separate flag was needed to specify the file. The only practical reason for the -f flag that I can think of is to help prevent confusion on whether to pass the accelerator or data file to the -r flag when it is set, but this doesn't seem like a strong enough reason to use

File structure

Flit requires all module-related files to be placed in a single directory named after the module, so all node-depth related files are now located in pollen/depth. pollen/depth/main.py specifies a command-line interface for generating and running node depth accelerators, as well as parsing input for the accelerator, combining the functionality of calyx_depth.py and parse_data.py. pollen/main.py is where the overall command line interface is defined.

The idea is that pollen/depth/main.py combines the functionality of any relevant files in pollen/depth and defines two functions, config_parser and run. config_parser configures a command-line parser to parse input for exine depth, and 'run' takes in the (parsed) command-line input to generate, parse, or run code for node depth. Then pollen/main.py simply calls these functions to create the exine parser. This can be generalized to other odgi subcommands such as degree to prevent pollen/main.py from becoming too cluttered.

sampsyo commented 1 year ago

AWESOME!!!! This is so cool!

Looking now at your explanation of your options, I'm agreeing even more with your suggestion to eliminate -f. That is, it seems like both --run and --parse-data both require a filename to run, right? In that case, it seems very friendly to just make the CLI syntax into --run <FILE> and --parse-data <FILE>, i.e., to have those flags themselves have a filename argument.

Putting everything into a Python package for Flit's sake is a nice consequence here, IMO. (It's no coincidence—Flit was designed to encourage people to follow "best practices" for packaging up Python code!)

susan-garry commented 1 year ago

The -f flag has now been eliminated!

sampsyo commented 1 year ago

Wonderful!!