connormanning / entwine

Entwine - point cloud organization for massive datasets
https://entwine.io

Configuration File vs. CLI #318

Closed jlaura closed 10 months ago

jlaura commented 10 months ago

The following fails to process via just a configuration file, but works via the CLI. I suspect I am misspecifying something?

Failures:

{
    "input": "/path/to/dirs/*/with/*.laz",
    "reprojection": { "out": "+proj=cart +a=1737400 +b=1737400" },
    "threads": 256,
    "schema": [
        { "name": "X", "type": "uint32" },
        { "name": "Y", "type": "uint32" },
        { "name": "Z", "type": "uint32" },
        { "name": "Intensity", "type": "uint32" }
    ],
    "scale": 0.01,
    "output": "/home/jlaura/caldera/kaguyatc_dtms/entwine/test_output"
}

When I run: entwine build -c test.json -I /path/to/dirs/*/with/*.laz I do not get bounds errors. All LAZ files in my test seem to process, and then I get the following: Encountered an error: [json.exception.out_of_range.403] key 'size' not found

I of course removed the "input" line from the config for the CLI test.

Using v3.0.0 installed off Conda-forge.

connormanning commented 10 months ago

There is currently an issue with specifying the schema with the compound type: instead can you try the type/size combo described here?
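
For anyone converting an existing config, the following is a minimal Python sketch of rewriting compound type names into a type/size pair. It assumes the type/size form follows the EPT schema convention, where "type" is one of "signed", "unsigned", or "float" and "size" is a byte count; the helper name split_compound_type is hypothetical and not part of Entwine.

```python
import re

# Assumed mapping from compound-type prefixes to EPT-style type names.
_PREFIX_TO_TYPE = {"uint": "unsigned", "int": "signed", "float": "float"}


def split_compound_type(dim):
    """Return a copy of a schema entry with a compound type such as
    "uint32" split into a type name plus a size in bytes."""
    match = re.fullmatch(r"(uint|int|float)(8|16|32|64)", dim["type"])
    if match is None:
        # Not a recognized compound name: leave the entry unchanged.
        return dict(dim)
    prefix, bits = match.groups()
    return {**dim, "type": _PREFIX_TO_TYPE[prefix], "size": int(bits) // 8}


def convert_schema(schema):
    """Apply split_compound_type to every entry of a schema list."""
    return [split_compound_type(dim) for dim in schema]
```

Under those assumptions, the entry { "name": "X", "type": "uint32" } would become { "name": "X", "type": "unsigned", "size": 4 }.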

jlaura commented 10 months ago

I just found #317. Testing now and sorry for the noise on that.

Also, is that size in bytes? I can put a PR in to update the docs if you like.

Any insight on the input key?

connormanning commented 10 months ago

I think so - when you use * on the command line, the shell does the wildcard expansion and passes the result (a list of real file paths) to Entwine. When you use the equivalent path as a string within the configuration file, it is treated as a literal path rather than a wildcard, as if the path itself contained the * characters, which is of course incorrect. To get this type of shell wildcard expansion, you need to use the command line so the shell expands it for you.

Alternatively you would have to use input: ["whatever/a.laz", "whatever/b.laz", ...] to explicitly list the files in the configuration. I believe the only exception is a trailing /* (or trailing /** for a recursive list), which Entwine will try to expand out. Of course, since you are trying to limit to wildcard *.laz only, this won't work for you. This is because Entwine explicitly supports cloud formats like S3/Azure/etc, which don't tend to natively support intermediate wildcards via their API and only allow you to list directories.

Note that, if you have files other than *.laz that Entwine does not recognize as point cloud files, there should be no issue there as Entwine will simply ignore them. So maybe /path/to/dirs/** would be reasonable for you, depending on your use-case.
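
If you want to stay with a configuration file, one option is to do the wildcard expansion yourself and write the explicit list into "input". The following is a rough Python sketch of that idea using the standard glob module; expand_input is a hypothetical helper, not something Entwine provides, and it only makes sense for local filesystem paths.

```python
import glob


def expand_input(config, recursive=True):
    """Return a copy of a config dict whose "input" wildcard string is
    replaced by a sorted list of the local files it matches."""
    pattern = config.get("input")
    if not isinstance(pattern, str):
        # Already a list (or missing): nothing to expand.
        return dict(config)
    # glob performs the same kind of expansion the shell would do.
    files = sorted(glob.glob(pattern, recursive=recursive))
    return {**config, "input": files}
```

A possible workflow would be to load test.json with json.load, pass the result through expand_input, write it back out with json.dump, and then run entwine build -c on the expanded file.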

Yes, those sizes are in bytes, feel free to PR and I'll merge it.

jlaura commented 10 months ago

Cool - explanation on the expansion makes a ton of sense. I wrongly assumed the expansion was more robust (and I should have no expectation it is!). Closing with a big old thank you.

PR open for the doc update.