Closed kcris closed 3 years ago
Hi, Thanks for letting us know about minor typos in the python sample notebook. There is a config.json in https://github.com/Azure/spark-cdm-connector/tree/master/samples/Contacts.
Please read https://github.com/Azure/spark-cdm-connector/blob/master/documentation/overview.md#explicit-write-options. configPath should be an absolute path, e.g. /<containername>/<foldername_where_config.json_resides>
Thanks!
I think I have one more problem though: there is a missing dependency. The CustomerCategory.cdm.json schema is not part of the samples/Contacts folder (it is referenced by _salesimports.cdm.json).
Can you try again? I'm not sure how we missed that.
Thanks!
My next issue is that TrackedEntity is not found.
As a note, I copied the whole contents of samples/Contacts to my datalake container, which includes a TrackedEntity.cdm.json.
: java.util.concurrent.ExecutionException: java.lang.Exception: PersistenceLayer | Could not read '/TrackedEntity.cdm.json' from the 'core' namespace. Reason 'com.microsoft.commondatamodel.objectmodel.storage.StorageAdapterException: Could not read ADLS content at path: /TrackedEntity.cdm.json' | loadDocumentFromPathAsync
"corpusPath": "core:/TrackedEntity.cdm.json"
{
"defaultNamespace" : "adls",
"adapters" : [
{
"type" : "adls",
"config" : {
"hostname" : "srichetastorage.dfs.core.windows.net",
"root" : "/outputsubmanifest/example-public-standards",
"tenant" : "72f988bf-86f1-41af-91ab-2d7cd011db47",
"clientId" : "6c3f525f-bdcb-4677-bed6-24f0b43add13",
"timeout" : 5000,
"maximumTimeout" : 20000,
"numberOfRetries" : 2
},
"namespace" : "core"
}
]
}
So I am not sure what the problem is.
storageAccountName = "<mystorage>.dfs.core.windows.net"
container = "wwi-02"
outputContainer = "wwi-02"
(customerdf.write.format("com.microsoft.cdm")
.option("storage", storageAccountName)
.option("manifestPath", outputContainer + "/test/cdm/customer/default.manifest.cdm.json")
.option("entity", "TestEntity")
.option("entityDefinitionModelRoot", container + "/test/cdm/Models") # fetches Config.json from this location and finds definition of "core" alias, if configPath option is not present
.option("entityDefinitionPath", "/Contacts/Customer.cdm.json/Customer") # Customer.cdm.json has an alias - "core"
.option("configPath", container + "/test/cdm/Models/Contacts") # Add your Config.json to override the above definition
.option("entityDefinitionStorage", storageAccountName) # entityDefinitionModelRoot is contained in this storage account
.option("format", "parquet")
.save())
The core alias inside config.json points to srichetastorage.dfs.core.windows.net/outputsubmanifest/example-public-standards.
So I guess that, when overriding config, that's where TrackedEntity.cdm.json is being looked up.
Is this the source of the problem?
As a note: there is a local copy of TrackedEntity.cdm.json too.
Thank you!
You need to change the location as per your needs, so that it points to the location where TrackedEntity is placed:
"config" : {
"hostname" : "srichetastorage.dfs.core.windows.net",
"root" : "/outputsubmanifest/example-public-standards",
"tenant" : "72f988bf-86f1-41af-91ab-2d7cd011db47",
"clientId" : "6c3f525f-bdcb-4677-bed6-24f0b43add13",
"timeout" : 5000,
"maximumTimeout" : 20000,
"numberOfRetries" : 2
},
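As an illustration of the change described above, here is a minimal Python sketch that repoints the "core" adapter of a config.json at a different storage account and folder. The helper name `retarget_adapter` and the replacement hostname and root are hypothetical placeholders, not part of the connector; only the JSON layout is taken from the config shown in this thread.

```python
import json

def retarget_adapter(config, namespace, hostname, root):
    """Return a copy of a CDM config dict with one adapter pointed at a new location."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    for adapter in updated.get("adapters", []):
        if adapter.get("namespace") == namespace:
            adapter["config"]["hostname"] = hostname
            adapter["config"]["root"] = root
            return updated
    raise KeyError(f"no adapter with namespace {namespace!r}")

# Trimmed copy of the config.json quoted in this thread.
sample_config = {
    "defaultNamespace": "adls",
    "adapters": [
        {
            "type": "adls",
            "config": {
                "hostname": "srichetastorage.dfs.core.windows.net",
                "root": "/outputsubmanifest/example-public-standards",
            },
            "namespace": "core",
        }
    ],
}

# Assumed placeholder values: use your own account and the folder
# that actually holds TrackedEntity.cdm.json.
new_config = retarget_adapter(
    sample_config, "core", "<mystorage>.dfs.core.windows.net", "/wwi-02/test/cdm/Models"
)
print(json.dumps(new_config, indent=2))
```

The function returns a modified copy and leaves the original dict untouched, so the sample values stay available for comparison.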
The TrackedEntity.cdm.json in turn has
"imports": [
{
"corpusPath": "cdm:/foundations.cdm.json"
}
],
You need the CDM foundation files to get this working. The sample file just tells you how to use the options.
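To see which namespaces a schema file depends on (and therefore which locations must be resolvable), a small check like the following may help. It is only a sketch: the function name `import_namespaces` is made up for illustration, and it assumes import paths follow the "<namespace>:/<file>" form seen in the snippets above.

```python
import json

def import_namespaces(cdm_document_text):
    """Collect the namespace prefixes used by the 'imports' of one CDM document."""
    document = json.loads(cdm_document_text)
    namespaces = set()
    for entry in document.get("imports", []):
        corpus_path = entry.get("corpusPath", "")
        prefix, separator, _ = corpus_path.partition(":/")
        if separator:  # only paths of the form "<namespace>:/<file>"
            namespaces.add(prefix)
    return namespaces

# The imports section of TrackedEntity.cdm.json as quoted above.
tracked_entity = '{"imports": [{"corpusPath": "cdm:/foundations.cdm.json"}]}'
print(import_namespaces(tracked_entity))  # {'cdm'}
```

Running this over each copied .cdm.json file would list every prefix (for example cdm or core) whose target files need to be present.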
got it, thanks a lot
Hi, I tried to execute sample [5] (Overriding from configPath) available here: https://github.com/Azure/spark-cdm-connector/blob/master/samples/SparkCDMsamplePython.ipynb
These lines have some issues:
Based on previous examples, I think .option("entityDefinitionModelRoot", "Models") should really be .option("entityDefinitionModelRoot", container+"Models")
There is a missing comma here: .option("configPath" "/config")
A Python comment should start with # here: // entityDefinitionModelRoot contains in this storage account
All these are minor issues. Once fixed, the main problem was that Config.json was not found inside /config. It is not clear if /config is relative to entityDefinitionModelRoot, but if that's the case, the Config.json was present when I tested and yet it was not found. So I was unable to run this sample. I tried both Config.json (according to error messages) and config.json (according to provided sample).
Please take a look.
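For reference, the three minor fixes listed above (container prefix on the model root, the missing comma, and # for comments) can be collected in one place. The sketch below builds the write options as a plain dict; the helper name `config_override_options` is hypothetical, and the paths mirror the snippet posted elsewhere in this thread rather than a confirmed layout.

```python
def config_override_options(storage, container, entity):
    """Build the write options for the configPath override sample as a dict."""
    return {
        "storage": storage,
        "manifestPath": container + "/test/cdm/customer/default.manifest.cdm.json",
        "entity": entity,
        # Model root is prefixed with the container name.
        "entityDefinitionModelRoot": container + "/test/cdm/Models",
        "entityDefinitionPath": "/Contacts/Customer.cdm.json/Customer",
        # Key and value are separate entries (the sample was missing the comma).
        "configPath": container + "/test/cdm/Models/Contacts",
        # entityDefinitionModelRoot lives in this storage account.
        "entityDefinitionStorage": storage,
        "format": "parquet",
    }

options = config_override_options("<mystorage>.dfs.core.windows.net", "wwi-02", "TestEntity")
for key, value in options.items():
    print(key, "=", value)

# With a Spark session and a DataFrame named customerdf available, the dict
# could be applied as:
# customerdf.write.format("com.microsoft.cdm").options(**options).save()
```

Keeping the options in a dict makes it easy to print and inspect the exact paths before the write is attempted.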