trns1997 opened 4 months ago
I was wondering if you had an example of how to read EPT data from an Azure blob container that is accessible only via a SAS token.

I have a pipeline that looks like the following (based on https://gist.github.com/hobu/ee22084e24ed7e3c0d10600798a94c31):

Result:

Which is completely expected, as I have not provided the reader with the SAS token to access the file. So I tried a very naive approach where I provided the SAS token in the filename directly, as shown below:

Result:

Which I am guessing is completely normal as well, as it can read the `ept.json`, but I think I need to provide the SAS token to the requests in the pipeline via the `header` keyword. Honestly, I do not really understand how to format the pipeline for it to work. I found a release note mentioning #3496 that says the Azure SAS token was added thanks to arbiter. If someone could help out with a comprehensive example it would be great. And as a bonus: how to go about using the C++ API to read EPT files from an Azure blob container using PDAL.
After looking into `vendor/arbiter/arbiter.cpp`, I found that we pass the header as a string, but I still cannot get the formatting right :'). Here is what the JSON looks like:
```json
{
  "pipeline": [
    {
      "bounds": "([-10425171.940, -10423171.940], [5164494.710, 5166494.710])",
      "filename": "az://<AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_EPT>/ept.json",
      "header": "{\"az\": \"{\"account\": \"<AZURE_STORAGE_ACCOUNT>\"}\"}",
      "type": "readers.ept",
      "tag": "readdata"
    },
    {
      "filename": "test.laz",
      "inputs": [ "readdata" ],
      "tag": "writerslas",
      "type": "writers.las"
    }
  ]
}
```
It does not seem to like the header that I am passing it. Any clues as to why?
On the other hand, I looked into the code and noticed that I can set the environment variables `AZURE_STORAGE_ACCOUNT` and `AZURE_SAS_TOKEN` to bypass the JSON parsing problem. After setting these I no longer have a problem with the header (I simply remove it), but I get the following error, which has left me slightly perplexed:
```
pdal pipeline test.json --debug
(PDAL Debug) Debugging...
(pdal pipeline readers.ept Debug) 400: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidResourceName</Code><Message>The specifed resource name contains invalid characters.
RequestId:<ID>
Time:2024-07-19T16:37:46.6214773Z</Message></Error>
PDAL: readers.ept: Could not read from <AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_EPT>/ept.json
```
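For reference, here is roughly how I was setting the variables before invoking the CLI (a sketch; the values are placeholders):

```python
import os
import subprocess

# Equivalent to `export AZURE_STORAGE_ACCOUNT=...` and `export AZURE_SAS_TOKEN=...`
# in the shell; replace the placeholders with real values.
env = dict(
    os.environ,
    AZURE_STORAGE_ACCOUNT="<AZURE_STORAGE_ACCOUNT>",
    AZURE_SAS_TOKEN="<SAS_TOKEN>",
)

# Run the pipeline with the credentials exposed through the environment.
subprocess.run(["pdal", "pipeline", "test.json", "--debug"], env=env, check=True)
```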
Alright, I found why I was facing a problem with the header. Here is what the header should look like in the JSON so that arbiter can parse the string successfully:
```json
{
  "pipeline": [
    {
      "bounds": "([-10425171.940, -10423171.940], [5164494.710, 5166494.710])",
      "filename": "az://<AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_EPT>/ept.json",
      "header": "{\"az\": \"{\\\"account\\\": \\\"<AZURE_STORAGE_ACCOUNT>\\\", \\\"sas\\\": \\\"<SAS_TOKEN>\\\"}\"}",
      "type": "readers.ept",
      "tag": "readdata"
    },
    {
      "filename": "test.laz",
      "inputs": [ "readdata" ],
      "tag": "writerslas",
      "type": "writers.las"
    }
  ]
}
```
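The nested escaping is easiest to get right by serializing twice. Here is a minimal sketch in Python (same placeholders as above) that prints a pipeline with exactly the `header` value shown:

```python
import json

# The az driver configuration is itself a JSON document that arbiter
# receives as a string, so serialize it first...
az_config = json.dumps({
    "account": "<AZURE_STORAGE_ACCOUNT>",
    "sas": "<SAS_TOKEN>",
})

# ...then wrap it in the driver-name -> config-string mapping.
header = json.dumps({"az": az_config})

pipeline = {
    "pipeline": [
        {
            "bounds": "([-10425171.940, -10423171.940], [5164494.710, 5166494.710])",
            "filename": "az://<AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_EPT>/ept.json",
            "header": header,
            "type": "readers.ept",
            "tag": "readdata",
        },
        {
            "filename": "test.laz",
            "inputs": ["readdata"],
            "tag": "writerslas",
            "type": "writers.las",
        },
    ]
}

# Serializing the whole pipeline adds the final layer of backslashes.
print(json.dumps(pipeline, indent=2))
```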
Now I am simply facing:

```
(pdal pipeline readers.ept Debug) 403: <?xml version="1.0" encoding="utf-8"?><Error><Code>AuthenticationFailed</Code><Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
```
Hi @trns1997,

Don't directly put queries in the `filename`. Use the `query` option for `readers.ept`.

For `arbiter` documentation, you can either crawl the code of the different drivers or make a proposal for documenting this.

@hobu, @connormanning, I know there is some arbiter documentation on entwine.io. What would be the best destination for such documentation?
@gui2dev I am not sure that I follow? Isn't the `query` option to be used when I pass an `https` URL to `filename`? In my case I chose to pass via the `az` driver, see below:
```json
{
  "pipeline": [
    {
      "bounds": "([-10425171.940, -10423171.940], [5164494.710, 5166494.710])",
      "filename": "az://<AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_EPT>/ept.json",
      "header": "{\"az\": \"{\\\"account\\\": \\\"<AZURE_STORAGE_ACCOUNT>\\\", \\\"sas\\\": \\\"<SAS_TOKEN>\\\"}\"}",
      "type": "readers.ept",
      "tag": "readdata"
    },
    {
      "filename": "test.laz",
      "inputs": [ "readdata" ],
      "tag": "writerslas",
      "type": "writers.las"
    }
  ]
}
```
As far as documentation is concerned, I have suggested in https://github.com/connormanning/arbiter/issues/53 to maybe add it to arbiter directly. But I think having an example directly in PDAL is probably worth it, preventing users from searching all over for a simple read using Azure / AWS storage :).
When using the `az` scheme, you just need to specify the relative path of the `ept.json` file within the storage account. In your example, that would be `az://<PATH_TO_EPT>/ept.json`.

The `az` configuration should not go in the `header` option, as that option is reserved for headers you want forwarded to your HTTP request.

You can configure the `az` driver with environment variables: `AZURE_STORAGE_ACCOUNT` and `AZURE_SAS_TOKEN` if you're using SAS, or `AZURE_STORAGE_ACCESS_KEY` if you're using a storage account key. Another option is to use a configuration file.
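For the environment-variable route, a minimal sketch (placeholder values):

```python
import os

# Configure the az driver; pick one authentication method.
os.environ["AZURE_STORAGE_ACCOUNT"] = "<AZURE_STORAGE_ACCOUNT>"
os.environ["AZURE_SAS_TOKEN"] = "<SAS_TOKEN>"              # if using SAS
# os.environ["AZURE_STORAGE_ACCESS_KEY"] = "<ACCESS_KEY>"  # if using an account key
```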
@gui2dev omg, this is exactly it. Error on my side: I thought I had to specify `blob.core.windows.net`, but it is not necessary. Well, this is perfect. @hobu @connormanning, bumping regarding the documentation: we're open to creating a small explanation for future users on how to use the `az` driver.
To use the `az` driver, set the `AZURE_STORAGE_ACCOUNT` and `AZURE_SAS_TOKEN` environment variables, then create a `test_az_driver.json` file, modifying the necessary content:
```json
{
  "pipeline": [
    {
      "bounds": "([xmin, xmax], [ymin, ymax])",
      "filename": "az://<PATH_TO_EPT>/ept.json",
      "type": "readers.ept",
      "tag": "readdata"
    },
    {
      "filename": "test.laz",
      "inputs": [ "readdata" ],
      "tag": "writerslas",
      "type": "writers.las"
    }
  ]
}
```
Note: `PATH_TO_EPT` is simply the path within the container; there is no need to specify `blob.core.windows.net` or the Azure storage account.
```
pdal pipeline test_az_driver.json --debug
```
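For completeness, the same pipeline can also be run through the PDAL Python bindings instead of the CLI. A minimal sketch, assuming the `pdal` Python package is installed and using the same placeholders:

```python
import json
import os

import pdal

# Credentials for the az driver, exactly as above (placeholder values).
os.environ["AZURE_STORAGE_ACCOUNT"] = "<AZURE_STORAGE_ACCOUNT>"
os.environ["AZURE_SAS_TOKEN"] = "<SAS_TOKEN>"

pipeline_def = {
    "pipeline": [
        {
            "bounds": "([xmin, xmax], [ymin, ymax])",
            "filename": "az://<PATH_TO_EPT>/ept.json",
            "type": "readers.ept",
            "tag": "readdata",
        },
        {
            "filename": "test.laz",
            "inputs": ["readdata"],
            "tag": "writerslas",
            "type": "writers.las",
        },
    ]
}

pipeline = pdal.Pipeline(json.dumps(pipeline_def))
count = pipeline.execute()  # number of points read and written
print(f"{count} points")
```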
To use the `query` option with an `https` filename instead, take a SAS token such as:

```
sp=r&st=2024-03-03T17:30:06Z&se=2024-03-04T01:30:06Z&sip=86.21.251.79&sv=2022-11-02&sr=b&sig=dQkX7R%2BXHrQLP9qiNdS0zMhYNpmQwLW0D86UUrEgGao%3D
```

Each element of the query is separated by `&`; everything that precedes an `=` is the key and whatever follows it is the associated value. Therefore, in our case, the query will be the following:
"query":{
"sp": "r",
"st": "2024-03-03T17:30:06Z",
.
.
.
"sig": "dQkX7R%2BXHrQLP9qiNdS0zMhYNpmQwLW0D86UUrEgGao%3D"
},
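Splitting the token into that mapping is mechanical; here is a small sketch (a plain string split keeps the values percent-encoded, exactly as they appear in the token):

```python
# The example SAS token from above.
sas = (
    "sp=r&st=2024-03-03T17:30:06Z&se=2024-03-04T01:30:06Z&sip=86.21.251.79"
    "&sv=2022-11-02&sr=b&sig=dQkX7R%2BXHrQLP9qiNdS0zMhYNpmQwLW0D86UUrEgGao%3D"
)

# Split each element on '&', then split key from value on the first '='.
query = dict(part.split("=", 1) for part in sas.split("&"))

print(query["sp"])   # r
print(query["sig"])  # dQkX7R%2BXHrQLP9qiNdS0zMhYNpmQwLW0D86UUrEgGao%3D
```

The resulting mapping is what goes into the `query` option of the reader stage below.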
Create a `test_query.json` file, modifying the necessary content:
```json
{
  "pipeline": [
    {
      "bounds": "([xmin, xmax], [ymin, ymax])",
      "filename": "https://<AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_EPT>/ept.json",
      "query": {
        "sp": "val1",
        "st": "val2",
        ...
        "sig": "valx"
      },
      "type": "readers.ept",
      "tag": "readdata"
    },
    {
      "filename": "test.laz",
      "inputs": [ "readdata" ],
      "tag": "writerslas",
      "type": "writers.las"
    }
  ]
}
```
```
pdal pipeline test_query.json --debug
```
Credits @gui2dev
@gui2dev I have a question which is more or less along the lines of this issue. `readers.ept` has a `query` key which allows us to pass the SAS token with our requests to read EPT data. Looking at https://pdal.io/en/latest/stages/readers.las.html, I noticed that `readers.las` does not have this functionality, which means that I cannot provide `"filename": "https://<AZURE_STORAGE_ACCOUNT>.blob.core.windows.net/<PATH_TO_LAS>.las"` to read LAS data from a remote server. Is there a particular reason why? Or is it just functionality that needs to be developed? Or is the policy to fetch the LAS file from the remote server before reading it?
I don't know your use case precisely, but using `readers.las` with a remote file will just download its content to a temporary file and then read it. Only one connection. I guess you could try to put your query in the filename just to check whether it's supported. But the best approach is to use `az://` as stated before.

`readers.ept` manages a pool of connections that fetch the needed content.
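For example, a minimal remote `readers.las` pipeline over the `az://` scheme might look like this (a sketch using the PDAL Python bindings; the paths are placeholders and the credentials come from the environment variables discussed above):

```python
import json

import pdal

# readers.las with a remote az:// path: arbiter downloads the file to a
# temporary location in a single connection, then PDAL reads it locally.
pipeline = pdal.Pipeline(json.dumps({
    "pipeline": [
        {"type": "readers.las", "filename": "az://<PATH_TO_LAS>.las"},
        {"type": "writers.las", "filename": "local_copy.laz"},
    ]
}))
pipeline.execute()
```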