GlacioHack / xdem

Analysis of digital elevation models (DEMs)
https://xdem.readthedocs.io/
MIT License
142 stars 41 forks source link

[POC] add cli step 1 : setup projects #552

Open adebardo opened 3 months ago

adebardo commented 3 months ago

Context

The purpose of this ticket is to set up the initial file that will allow xDEM to be executed via a CLI command: xdem configfile.json. Initially, the command will not execute any functionality but will simply display output.

For the Design

path = "path/to/file"
def get_parser():
    """
    ArgumentParser for demcompare

    :return: parser
    """
    parser = argparse.ArgumentParser(
        description=("Compare Digital Elevation Models"),
        fromfile_prefix_chars="@",
    )

    parser.add_argument(
        "config",
        metavar="config.toml",
        help=(
            "path to a toml file containing the paths to "
            "input and output files and the algorithm "
            "parameters"
        ),
    )

    parser.add_argument(
        "--version",
        "-v",
        action="version",
        version=f"%(prog)s {demcompare.__version__}",
    )
    return parser

def main():
    """
    Call demcompare's main
    """
    parser = get_parser()
    argcomplete.autocomplete(parser)
    args = parser.parse_args()

    xdem_cli.run(args.config, loglevel=args.loglevel)

if __name__ == "__main__":
    main()
def run(cfg_path: str):  
          # toml case
      config = toml.load(cfg_path)  
          print(config["key"])
# xdem entry points cli scripts
# xdem : main cli Program
[options.entry_points]
console_scripts =
    xdem = xdem.xdem_cli:main

Les tests

La doc

In folder doc/source/quickstart.md :

Estimation

2d

duboise-cnes commented 3 months ago

Ok for me, it is the demcompare simple cli interface (name_of_program_exe configfileinjson.json). Some quick remarks:

As said this week, I would be watchful on this new xdem (inspired demcompare) run pipeline architecture (for following issues from the story to add cli): demcompare is not perfect because it cannot be easily run only in memory. So I would add constraint to be able to test directly in memory only and not with data structure on disk (which must exist also but with on option when directory_output is set for the pipeline and all of its substeps to define (typically coreg substep, dem_processing substep, stats substep, report....). I would say we will need a face to face meeting for technical convergence on this pipeline at some point before coding too far on the pipeline...

This would ease also testing to not be mandatory on getting generated on disk but directly from in memory object. I think it could be important at the beginning reporting the CLI from demcompare.

Typo: you have kept demcompare in the main doctype --> xdem ;)

adehecq commented 2 months ago

Some comments from my side:

adebardo commented 2 months ago

No problem, thanks.

I put here the TOML et Yaml version of a config_file from demcompare to show the diff and discuss about it.

{
  "output_dir": "./test_output_full/",
  "input_ref": {
    "path": "./srtm_ref.tif",
    "zunit": "m"
  },
  "input_sec": {
    "path": "./srtm_blurred_and_shifted.tif",
    "zunit": "m",
    "nodata": -32768
  },
  "coregistration": {
    "method_name": "nuth_kaab_internal",
    "number_of_iterations": 6,
    "estimated_initial_shift_x": 0,
    "estimated_initial_shift_y": 0
  },
  "statistics": {
    "alti-diff": {
      "classification_layers": {
        "Slope0": {
          "type": "slope",
          "ranges": [
            0,
            5,
            10,
            25,
            45
          ]
        }
      },
      "remove_outliers": true
    },
    "alti-diff-slope-norm": {
      "classification_layers": {
        "Slope0": {
          "type": "slope",
          "ranges": [
            0,
            5,
            10,
            25,
            45
          ]
        }
      },
      "remove_outliers": true
    },
    "angular-diff": {
      "classification_layers": {
        "Slope0": {
          "type": "slope",
          "ranges": [
            0,
            5,
            10,
            25,
            45
          ]
        }
      },
      "remove_outliers": true
    },
    "sec-curvature": {},
    "ref-curvature": {},
    "sec": {
      "metrics": [
        {
          "slope-orientation-histogram": {
            "output_plot_path": "./test_output_full/sec-slope-orientation-histogram"
          }
        }
      ]
    },
    "ref": {
      "metrics": [
        {
          "slope-orientation-histogram": {
            "output_plot_path": "./test_output_full/ref-slope-orientation-histogram"
          }
        }
      ]
    }
  },
  "report": "default"
}
output_dir = "./test_output_full/"
report = "default"

[input_ref]
path = "./srtm_ref.tif"
zunit = "m"

[input_sec]
path = "./srtm_blurred_and_shifted.tif"
zunit = "m"
nodata = -32_768

[coregistration]
method_name = "nuth_kaab_internal"
number_of_iterations = 6
estimated_initial_shift_x = 0
estimated_initial_shift_y = 0

[statistics]
sec-curvature = { }
ref-curvature = { }

  [statistics.alti-diff]
  remove_outliers = true

[statistics.alti-diff.classification_layers.Slope0]
type = "slope"
ranges = [ 0, 5, 10, 25, 45 ]

  [statistics.alti-diff-slope-norm]
  remove_outliers = true

[statistics.alti-diff-slope-norm.classification_layers.Slope0]
type = "slope"
ranges = [ 0, 5, 10, 25, 45 ]

  [statistics.angular-diff]
  remove_outliers = true

[statistics.angular-diff.classification_layers.Slope0]
type = "slope"
ranges = [ 0, 5, 10, 25, 45 ]

[[statistics.sec.metrics]]
[statistics.sec.metrics.slope-orientation-histogram]
output_plot_path = "./test_output_full/sec-slope-orientation-histogram"

[[statistics.ref.metrics]]
[statistics.ref.metrics.slope-orientation-histogram]
output_plot_path = "./test_output_full/ref-slope-orientation-histogram"
output_dir: ./test_output_full/
input_ref:
  path: ./srtm_ref.tif
  zunit: m
input_sec:
  path: ./srtm_blurred_and_shifted.tif
  zunit: m
  nodata: -32768
coregistration:
  method_name: nuth_kaab_internal
  number_of_iterations: 6
  estimated_initial_shift_x: 0
  estimated_initial_shift_y: 0
statistics:
  alti-diff:
    classification_layers:
      Slope0:
        type: slope
        ranges:
          - 0
          - 5
          - 10
          - 25
          - 45
    remove_outliers: true
  alti-diff-slope-norm:
    classification_layers:
      Slope0:
        type: slope
        ranges:
          - 0
          - 5
          - 10
          - 25
          - 45
    remove_outliers: true
  angular-diff:
    classification_layers:
      Slope0:
        type: slope
        ranges:
          - 0
          - 5
          - 10
          - 25
          - 45
    remove_outliers: true
  sec-curvature: {}
  ref-curvature: {}
  sec:
    metrics:
      - slope-orientation-histogram:
          output_plot_path: ./test_output_full/sec-slope-orientation-histogram
  ref:
    metrics:
      - slope-orientation-histogram:
          output_plot_path: ./test_output_full/ref-slope-orientation-histogram
report: default
adebardo commented 2 months ago

@duboise-cnes qu'en penses-tu ?

NB : je ne trouve pas d'outil équivalent à json_checker

duboise-cnes commented 2 months ago

Hello,

here a resume of what I think on configuration after some searching on web:

My preference would be to keep JSON because of the historic of demcompare (less work and I don't know if it is the core part for users) but if not with readibility and comments arguments, I would prefer YAML more developed than TOML. Anyway, I would be ok with any of the 3 because they all are possible. But the choice is important has an important API principle for end users and once published, always difficult and heavy to change to another approach (documentation, habits, code, ...).

Some other elements:

To be discussed and chosen if possible in next meeting.

adehecq commented 2 months ago

Hi @duboise-cnes,

some quick reaction to your comment:

* for the parser, i agree that click is a good tool but only if we plan to have the parameters in the command line directly. For our demcompare cli, it was not necessary because the cli was only "demcompare configfile.json" with only version and loglevel. Click felt like a hammer in this case ;) The question is : do we want to keep a simple cli call to a configuration file or do we want to expose the parameters to the CLI ? I would keep simple as demcompare for now, but open to discussion if the cli has also a lot of parameters.

Ok that's a good point. If we only have one CLI command with one argument, then there is absolutely no need to use click. For now we can rely on argparse and if we need more complex CLI in the future, we will keep click in mind. Note that currently we only have one CLI command in geoutils (geoviewer.py) which can be used to display raster images and which relies on argparse.

My preference would be to keep JSON because of the historic of demcompare (less work and I don't know if it is the core part for users) but if not with readibility and comments arguments, I would prefer YAML more developed than TOML. Anyway, I would be ok with any of the 3 because they all are possible. But the choice is important has an important API principle for end users and once published, always difficult and heavy to change to another approach (documentation, habits, code, ...).

I'm fine with keeping JSON the main choice. However as discussed with @erikmannerfelt, it could be very simple to accept other formats like TOML (Erik had a strong preference for toml compared to yaml because the specs are much simpler) and simply have one line at the beginning that converts the toml file to json format and internally everything is handled in json format. Of course, we don't want to make it a burden on tests and doc. So we could say JSON is the default, but TOML is supported without providing examples (or a single one) and having maybe a single test for TOML files.

rhugonnet commented 2 months ago

Agree for click being overkill! :hammer: :smile:

On Json/TOML/YAML: Thanks for the nice overview @duboise-cnes. I'm also open to any option. I find the readibility aspect important in some instances, so would tend more towards supporting those if possible (but not necessarily being default). From experience on config files for other software, I feel that having comments usually makes it easier to learn from example configs (from colleagues, or from the documentation) for new users.

If there is indeed strict equivalence that can be done by conversion between formats, as mentioned by @adehecq, that'd be amazing to allow users to work on their format of choice. We could then have one example in the documentation that shows the flexibility with the three format types, and then stick to a single format of our choice for the rest of the doc for consistency.

Would there be any drawbacks from this approach of "converting" several types of input files? (I can't think of anything right now, but I don't have enough perspective/expertise to answer this)

duboise-cnes commented 2 months ago

My feeling is that if we keep several way of configuration, it would be complex to deal. The conversion add complexity and is never perfect because of the intrinsic capacity of each format, there is not a strict equivalence. This can be lowered by keeping a simple data model compatible of json, toml and yaml but could limit evolutivity for more complex configuration scenarios in the future. I would be cautious to keep different type (for instance, dynaconf permit to "merge" different format of input (env, yaml, toml, json, ini, ...) but when i have tried it seems not so stable in reality. And again dynaconf is overkill for us i think. 🔨 So, from my experience and feeling, it would be easier and less "risk" to converge to only one, toml if it is the best and some prefer, json to keep historic or yaml if we prefer.
To be discussed 😄 I would prefer to not keep json and converge to one, than keeping a maintenance-heavy choice. My 2cts

rhugonnet commented 2 months ago

OK, good to know! If there is no strict equivalence, I agree it would rapidly complicate things and I would also be in favor to converge to one format only.