adebardo commented 3 months ago

Context

The purpose of this ticket is to set up the initial file that will allow xDEM to be executed via a CLI command: xdem configfile.json. Initially, the command will not execute any functionality but will simply display output.

For the Design

[x] Validate the use of TOML
[ ] add a file config/cli_example.toml containing:

path = "path/to/file"

[ ] Add the file xdem_cli.py
[ ] Add the function the argument parser with module argparse
[ ] add main function

def get_parser():
    """
    ArgumentParser for demcompare

    :return: parser
    """
    parser = argparse.ArgumentParser(
        description=("Compare Digital Elevation Models"),
        fromfile_prefix_chars="@",
    )

    parser.add_argument(
        "config",
        metavar="config.toml",
        help=(
            "path to a toml file containing the paths to "
            "input and output files and the algorithm "
            "parameters"
        ),
    )

    parser.add_argument(
        "--version",
        "-v",
        action="version",
        version=f"%(prog)s {demcompare.__version__}",
    )
    return parser

def main():
    """
    Call demcompare's main
    """
    parser = get_parser()
    argcomplete.autocomplete(parser)
    args = parser.parse_args()

    xdem_cli.run(args.config, loglevel=args.loglevel)

if __name__ == "__main__":
    main()

[ ] add imports in setup.py file
[ ] in file xdem_cli_pipeline.py add function run(configfile)

def run(cfg_path: str):  
          # toml case
      config = toml.load(cfg_path)  
          print(config["key"])

[ ] in setup.cfg add

# xdem entry points cli scripts
# xdem : main cli Program
[options.entry_points]
console_scripts =
    xdem = xdem.xdem_cli:main

Les tests

[ ] "Add a test that runs the execution and checks the output string in a file named tests/test_cli.py"

La doc

In folder doc/source/quickstart.md :

[ ] Add a section about the CLI

Estimation

2d

duboise-cnes commented 3 months ago

Ok for me, it is the demcompare simple cli interface (name_of_program_exe configfileinjson.json). Some quick remarks:

I would prefer a dedicated xdem_cli.py explicitely file and so xdem = xdem.xdem_cli:main in the console_scripts for only high level argparse easy part (arguments and call to the xdem pipeline run to be done).
and also change from demcompare historic choice from init.py to xdem_pipeline.py or something explicit like that (glaciohack validation needed of course).

As said this week, I would be watchful on this new xdem (inspired demcompare) run pipeline architecture (for following issues from the story to add cli): demcompare is not perfect because it cannot be easily run only in memory. So I would add constraint to be able to test directly in memory only and not with data structure on disk (which must exist also but with on option when directory_output is set for the pipeline and all of its substeps to define (typically coreg substep, dem_processing substep, stats substep, report....). I would say we will need a face to face meeting for technical convergence on this pipeline at some point before coding too far on the pipeline...

This would ease also testing to not be mandatory on getting generated on disk but directly from in memory object. I think it could be important at the beginning reporting the CLI from demcompare.

Typo: you have kept demcompare in the main doctype --> xdem ;)

adehecq commented 2 months ago

Some comments from my side:

I agree with Emmanuel, we should not name the file xdem.py but something different like xdem/cli/main.py or xdem/xdem_cli.py
for the parser, I now tend to use click, which makes it much easier to add parsers around functions compared to argparse which forces you to duplicate the argument list. It's also easier to add callbacks to check the arguments and create subcommands. An example in a recent repository I worked on is here.
a question: why did you choose the JSON format? I imagine that's the format you're most used to at CNES? Otherwise, I find yaml or toml more readable (much fewer brackets 😉). I don't know JSON very well, but a quick search raises also a few concerns about JSON: it does not support comments (I like to put comments in my config files), support fewer data types (integer/floats/lists). I would say that for xdem users, toml or yaml files would be easier, but I could understand if CNES users are so used to JSON that we would lose them with this format.

adebardo commented 2 months ago

No problem, thanks.

I put here the TOML et Yaml version of a config_file from demcompare to show the diff and discuss about it.

{
  "output_dir": "./test_output_full/",
  "input_ref": {
    "path": "./srtm_ref.tif",
    "zunit": "m"
  },
  "input_sec": {
    "path": "./srtm_blurred_and_shifted.tif",
    "zunit": "m",
    "nodata": -32768
  },
  "coregistration": {
    "method_name": "nuth_kaab_internal",
    "number_of_iterations": 6,
    "estimated_initial_shift_x": 0,
    "estimated_initial_shift_y": 0
  },
  "statistics": {
    "alti-diff": {
      "classification_layers": {
        "Slope0": {
          "type": "slope",
          "ranges": [
            0,
            5,
            10,
            25,
            45
          ]
        }
      },
      "remove_outliers": true
    },
    "alti-diff-slope-norm": {
      "classification_layers": {
        "Slope0": {
          "type": "slope",
          "ranges": [
            0,
            5,
            10,
            25,
            45
          ]
        }
      },
      "remove_outliers": true
    },
    "angular-diff": {
      "classification_layers": {
        "Slope0": {
          "type": "slope",
          "ranges": [
            0,
            5,
            10,
            25,
            45
          ]
        }
      },
      "remove_outliers": true
    },
    "sec-curvature": {},
    "ref-curvature": {},
    "sec": {
      "metrics": [
        {
          "slope-orientation-histogram": {
            "output_plot_path": "./test_output_full/sec-slope-orientation-histogram"
          }
        }
      ]
    },
    "ref": {
      "metrics": [
        {
          "slope-orientation-histogram": {
            "output_plot_path": "./test_output_full/ref-slope-orientation-histogram"
          }
        }
      ]
    }
  },
  "report": "default"
}

output_dir = "./test_output_full/"
report = "default"

[input_ref]
path = "./srtm_ref.tif"
zunit = "m"

[input_sec]
path = "./srtm_blurred_and_shifted.tif"
zunit = "m"
nodata = -32_768

[coregistration]
method_name = "nuth_kaab_internal"
number_of_iterations = 6
estimated_initial_shift_x = 0
estimated_initial_shift_y = 0

[statistics]
sec-curvature = { }
ref-curvature = { }

  [statistics.alti-diff]
  remove_outliers = true

[statistics.alti-diff.classification_layers.Slope0]
type = "slope"
ranges = [ 0, 5, 10, 25, 45 ]

  [statistics.alti-diff-slope-norm]
  remove_outliers = true

[statistics.alti-diff-slope-norm.classification_layers.Slope0]
type = "slope"
ranges = [ 0, 5, 10, 25, 45 ]

  [statistics.angular-diff]
  remove_outliers = true

[statistics.angular-diff.classification_layers.Slope0]
type = "slope"
ranges = [ 0, 5, 10, 25, 45 ]

[[statistics.sec.metrics]]
[statistics.sec.metrics.slope-orientation-histogram]
output_plot_path = "./test_output_full/sec-slope-orientation-histogram"

[[statistics.ref.metrics]]
[statistics.ref.metrics.slope-orientation-histogram]
output_plot_path = "./test_output_full/ref-slope-orientation-histogram"

output_dir: ./test_output_full/
input_ref:
  path: ./srtm_ref.tif
  zunit: m
input_sec:
  path: ./srtm_blurred_and_shifted.tif
  zunit: m
  nodata: -32768
coregistration:
  method_name: nuth_kaab_internal
  number_of_iterations: 6
  estimated_initial_shift_x: 0
  estimated_initial_shift_y: 0
statistics:
  alti-diff:
    classification_layers:
      Slope0:
        type: slope
        ranges:
          - 0
          - 5
          - 10
          - 25
          - 45
    remove_outliers: true
  alti-diff-slope-norm:
    classification_layers:
      Slope0:
        type: slope
        ranges:
          - 0
          - 5
          - 10
          - 25
          - 45
    remove_outliers: true
  angular-diff:
    classification_layers:
      Slope0:
        type: slope
        ranges:
          - 0
          - 5
          - 10
          - 25
          - 45
    remove_outliers: true
  sec-curvature: {}
  ref-curvature: {}
  sec:
    metrics:
      - slope-orientation-histogram:
          output_plot_path: ./test_output_full/sec-slope-orientation-histogram
  ref:
    metrics:
      - slope-orientation-histogram:
          output_plot_path: ./test_output_full/ref-slope-orientation-histogram
report: default

adebardo commented 2 months ago

@duboise-cnes qu'en penses-tu ?

NB : je ne trouve pas d'outil équivalent à json_checker

duboise-cnes commented 2 months ago

Hello,

for the parser, i agree that click is a good tool but only if we plan to have the parameters in the command line directly. For our demcompare cli, it was not necessary because the cli was only "demcompare configfile.json" with only version and loglevel. Click felt like a hammer in this case ;) The question is : do we want to keep a simple cli call to a configuration file or do we want to expose the parameters to the CLI ? I would keep simple as demcompare for now, but open to discussion if the cli has also a lot of parameters.
for the config file, json was used before i arrive on the tool so we kept it. All our 3D tool have a json configuration. But i am open to another approach, I don't think it would be a no go for CNES users.

here a resume of what I think on configuration after some searching on web:

Json is great for a simple standard, universal support (in python directly) : great choice for a simple configuration to maintain. Easy to serialize, deserialize for instance. But not so easy for human and no comments.
YAML for complex possible configuration, with comments available (complex data structure available, ...), so it is possible to have evolutivity to complex configuration. This can be also too complex for simple configuraition with difficulty of indentation for humans, error difficult to detect, ... But very used in many projects
TOML is a compromise between json, YAML. It is more simple than yaml (but has less possibility also) but have a better readibility and usage for users (comments available). The drawback is that it is not so developed compared of JSON and YAML.

My preference would be to keep JSON because of the historic of demcompare (less work and I don't know if it is the core part for users) but if not with readibility and comments arguments, I would prefer YAML more developed than TOML. Anyway, I would be ok with any of the 3 because they all are possible. But the choice is important has an important API principle for end users and once published, always difficult and heavy to change to another approach (documentation, habits, code, ...).

Some other elements:

I have seen dynaconf project (https://www.dynaconf.com) but seems useful for bigger and complex configurations than us.
I think now pydantic is the way to go to validate the data model (seems the reference now) : https://docs.pydantic.dev/latest/

To be discussed and chosen if possible in next meeting.

adehecq commented 2 months ago

Hi @duboise-cnes,

some quick reaction to your comment:

* for the parser, i agree that click is a good tool but only if we plan to have the parameters in the command line directly. For our demcompare cli, it was not necessary because the cli was only "demcompare configfile.json" with only version and loglevel. Click felt like a hammer in this case ;) The question is : do we want to keep a simple cli call to a configuration file or do we want to expose the parameters to the CLI ? I would keep simple as demcompare for now, but open to discussion if the cli has also a lot of parameters.
Ok that's a good point. If we only have one CLI command with one argument, then there is absolutely no need to use click. For now we can rely on argparse and if we need more complex CLI in the future, we will keep click in mind. Note that currently we only have one CLI command in geoutils (geoviewer.py) which can be used to display raster images and which relies on argparse.

My preference would be to keep JSON because of the historic of demcompare (less work and I don't know if it is the core part for users) but if not with readibility and comments arguments, I would prefer YAML more developed than TOML. Anyway, I would be ok with any of the 3 because they all are possible. But the choice is important has an important API principle for end users and once published, always difficult and heavy to change to another approach (documentation, habits, code, ...).

I'm fine with keeping JSON the main choice. However as discussed with @erikmannerfelt, it could be very simple to accept other formats like TOML (Erik had a strong preference for toml compared to yaml because the specs are much simpler) and simply have one line at the beginning that converts the toml file to json format and internally everything is handled in json format. Of course, we don't want to make it a burden on tests and doc. So we could say JSON is the default, but TOML is supported without providing examples (or a single one) and having maybe a single test for TOML files.

rhugonnet commented 2 months ago

Agree for click being overkill! :hammer: :smile:

On Json/TOML/YAML: Thanks for the nice overview @duboise-cnes. I'm also open to any option. I find the readibility aspect important in some instances, so would tend more towards supporting those if possible (but not necessarily being default). From experience on config files for other software, I feel that having comments usually makes it easier to learn from example configs (from colleagues, or from the documentation) for new users.

If there is indeed strict equivalence that can be done by conversion between formats, as mentioned by @adehecq, that'd be amazing to allow users to work on their format of choice. We could then have one example in the documentation that shows the flexibility with the three format types, and then stick to a single format of our choice for the rest of the doc for consistency.

Would there be any drawbacks from this approach of "converting" several types of input files? (I can't think of anything right now, but I don't have enough perspective/expertise to answer this)

duboise-cnes commented 2 months ago

My feeling is that if we keep several way of configuration, it would be complex to deal. The conversion add complexity and is never perfect because of the intrinsic capacity of each format, there is not a strict equivalence. This can be lowered by keeping a simple data model compatible of json, toml and yaml but could limit evolutivity for more complex configuration scenarios in the future. I would be cautious to keep different type (for instance, dynaconf permit to "merge" different format of input (env, yaml, toml, json, ini, ...) but when i have tried it seems not so stable in reality. And again dynaconf is overkill for us i think. 🔨 So, from my experience and feeling, it would be easier and less "risk" to converge to only one, toml if it is the best and some prefer, json to keep historic or yaml if we prefer.
To be discussed 😄 I would prefer to not keep json and converge to one, than keeping a maintenance-heavy choice. My 2cts

rhugonnet commented 2 months ago

OK, good to know! If there is no strict equivalence, I agree it would rapidly complicate things and I would also be in favor to converge to one format only.

GlacioHack / xdem

[POC] add cli step 1 : setup projects #552

Context

For the Design

Les tests

La doc

Estimation