Hello Reviewers,
I modified the README and the .ipynb files to extend the content of the pull request I have already sent. Please review and merge.
Thanks,
Hello @ymasaoka
We published a new version. Do you want to check it out?
Hello @Rodrigossz,
Of course I will! Give me a few days, and I will comment here again with the results of my review. Thank you for responding!
Hello @Rodrigossz,
Sorry for the wait. I checked the contents, and I have a few concerns; could you take a look?
> Insert another dataset, but this time using the MongoSpark connector.

Is the MongoSpark connector actually used here? Both data inserts appear to go through the `db` object obtained from the MongoClient, as in the sketch below.
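For reference, a minimal sketch of the two insert styles I am comparing; the connection string and names are placeholders, `spark` is the notebook's session, and the connector usage assumes the pre-v10 MongoDB Spark connector:

```python
from pymongo import MongoClient

# Style 1: pymongo -- every operation goes through MongoClient, not Spark.
client = MongoClient("<cosmos-db-mongo-connection-string>")
db = client["HTAP"]
db["HTAP"].insert_one({"item": "Pizza", "price": 9.99})

# Style 2: MongoSpark connector -- writes a Spark DataFrame directly.
df = spark.createDataFrame([("Pizza", 9.99)], ["item", "price"])
(df.write.format("com.mongodb.spark.sql.DefaultSource")
    .mode("append")
    .option("spark.mongodb.output.uri", "<cosmos-db-mongo-connection-string>")
    .option("database", "HTAP")
    .option("collection", "HTAP")
    .save())
```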
> Create a collection named HTAP with a partition key called item.

The Cosmos DB Data Explorer uses the name "Shard key", and "shard key" is also the term MongoDB itself uses (see the sketch below). I was concerned because the terminology is not consistent here.
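Purely as an illustration of the naming (this is not the tutorial's code, and the connection string is a placeholder): when the collection is created programmatically through Azure Cosmos DB's API for MongoDB, the same concept is spelled `shardKey`, which is why the "partition key" wording stood out:

```python
from pymongo import MongoClient

client = MongoClient("<cosmos-db-mongo-connection-string>")
# Cosmos DB's MongoDB API extension command creates a sharded collection;
# note that the option is named "shardKey", not "partition key".
client["HTAP"].command(
    {"customAction": "CreateCollection", "collection": "HTAP", "shardKey": "item"}
)
```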
Thanks,
Good catches. Will fix it. TKS
Also, some of the PySpark syntax is incorrect; could you please correct it?

cell 12

```python
# df.groupBy(df.item.string).sum().show() # incorrect
df.groupBy(df['item']).sum().show() # correct
```

cell 13 & cell 18

```python
# df.printSchema # We can confirm the schema with this, but the syntax is not accurate.
df.printSchema() # correct
```
I was just now trying out the content, but it seems I ran into a bug in Azure Synapse Link for Azure Cosmos DB. I have not yet been able to confirm the part where the schema information for the new content's timestamp is updated. However, that part is the same as in the previous content, and since I was able to confirm it working last time, I believe there is no problem. (I have already given feedback to the Azure Synapse Link team.)
I fixed the typos and changed the schema command. It was not actually wrong, but your suggestion certainly returns a more elegant view. About the aggregations: you wrote that the existing command is incorrect and suggested another one, but that is not the case; the existing syntax is the correct one. Based on that confusion, though, I added an explanation about it, as sketched below.
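For context, a minimal sketch of why both forms can be valid; it assumes the DataFrame was loaded from a Cosmos DB analytical store collection using the full-fidelity schema, where each leaf value is nested under its type name (the schema shown is illustrative, not the tutorial's actual output):

```python
# Under the full-fidelity schema, "item" is a struct whose child is the
# value's type name, so the nested reference is the intended one:
df.printSchema()
# root
#  |-- item: struct (nullable = true)
#  |    |-- string: string (nullable = true)

df.groupBy(df.item.string).sum().show()  # groups by the nested string value

# Under a flat (well-defined) schema, the plain column reference applies:
df.groupBy(df['item']).sum().show()
```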
The PR is here: https://github.com/Azure-Samples/Synapse/pull/58
Hello,
The "Let's get the environment ready" section of the following document appears to be incorrect.
Creating requirements.txt as instructed and applying it to the Spark pool does not install pymongo. pymongo also does not appear in the output listing the installed libraries shown in the document (see the check below).
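For reference, a minimal sketch of the check I mean, assuming a requirements.txt that simply pins pymongo (the version shown is a placeholder, not the document's):

```python
# requirements.txt applied to the Spark pool (contents assumed):
#
#   pymongo==3.11.4
#
# Listing what is actually installed on the pool, to look for pymongo:
import pkg_resources

for package in sorted(pkg_resources.working_set, key=lambda p: p.project_name.lower()):
    print(package.project_name, package.version)
```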
I have posted a similar question on Microsoft Q&A and am currently investigating the cause.
Please confirm and investigate.
Thanks,