Open alexszym opened 2 years ago
@cloga is anyone working on this issue? I have experience with both SDK versions and would like to contribute by adding more comprehensive examples to the documentation. In my experience, the learning curve for the V2 SDK is a bit steep. For instance, in V1, to upload files using a datastore, we can use the upload_files_to_datastore() method, but in V2 we need to use azure.storage.blob and BlobServiceClient to upload directly to a blob container, which took some time to figure out. I suggest creating notebooks that compare how to perform tasks in each version to help users transition more easily. What are your thoughts on this?
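For anyone landing here with the same question, a rough sketch of the V2-style upload described above. It assumes the azure-storage-blob package and a container client obtained from BlobServiceClient; the helper name upload_folder, the prefix argument, and the container name are hypothetical and not part of either SDK.

```python
from pathlib import Path


def upload_folder(container_client, local_dir, prefix=""):
    """Upload every file under local_dir and return the blob names written.

    container_client is expected to behave like
    azure.storage.blob.ContainerClient (only upload_blob is used).
    """
    root = Path(local_dir)
    prefix = prefix.strip("/")
    uploaded = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Blob names always use forward slashes, regardless of the local OS.
        relative = path.relative_to(root).as_posix()
        blob_name = f"{prefix}/{relative}" if prefix else relative
        with path.open("rb") as handle:
            container_client.upload_blob(name=blob_name, data=handle, overwrite=True)
        uploaded.append(blob_name)
    return uploaded


# Usage sketch (needs `pip install azure-storage-blob`; the connection string
# and the container name "my-container" are placeholders):
#
#   from azure.storage.blob import BlobServiceClient
#   service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
#   container = service.get_container_client("my-container")
#   upload_folder(container, "./data", prefix="raw")
```

Passing the client in, rather than creating it inside the helper, keeps credentials out of the function and makes it easy to try with a stub.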
@alainli0928 for help.
@alexszym I assume you've used the 'append_row' output of your ParallelRunStep as the input of your second transformation step, a PythonScriptStep. If so, the current V1 SDK doesn't support a custom output file format, but we know that users are looking for this capability. In the next release of the V2 SDK, we will add new parallel job attributes that let users 1) customize the append_row output format and 2) predefine append_row output headers. Please note that new parallel job features are not planned for the V1 SDK by default, so please consider trying our V2 experience. Here is our V2 SDK example repository: https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/parallel
Using ParallelRunStep output as an input to another step
This is specific to the Python SDK.
I'm attempting to use the ParallelRunStep output as an input to another step, which I haven't been able to find an example of anywhere. My use case is simple: I want to save the output of the pipeline, with some additional transforms, as a CSV.
The closest example I've been able to find is here: https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-batch-scoring-classification#download-and-review-output, but it has a bug: the delimiter in "parallel_run_step.txt" is whitespace, not a colon.
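To illustrate the delimiter problem, here is a small sketch that splits one row both ways. The sample line is made up for illustration; real rows contain whatever the scoring script's run() function returns.

```python
def split_row(line, delimiter=None):
    """Split one append_row line; delimiter=None means any run of whitespace."""
    return [field.strip() for field in line.strip().split(delimiter)]


# Hypothetical row in the shape "<file name> <prediction>".
sample = "img_001.png tabby"

print(split_row(sample, ":"))  # colon delimiter: the row stays in one piece
print(split_row(sample))       # whitespace delimiter: two separate fields
```

With the colon delimiter the whole row comes back as a single field, which is why the tutorial's download-and-review step does not produce the two expected columns.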
Here is the code that I eventually got working in my transform.py, which runs after the ParallelRunStep:
transform.py
PythonScriptStep
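A minimal sketch of what a transform.py of this kind could look like, using only the standard library. It assumes the ParallelRunStep output folder is mounted or downloaded to a local directory that contains parallel_run_step.txt, and that each row has two whitespace-separated fields. The function name, the --input-dir and --output-csv arguments, and the column headers are hypothetical choices, not part of the SDK.

```python
import argparse
import csv
from pathlib import Path


def convert_append_row_to_csv(input_dir, output_csv, header=("file", "prediction")):
    """Rewrite the whitespace-delimited append_row file as a CSV with a header.

    Returns the number of data rows written.
    """
    source = Path(input_dir) / "parallel_run_step.txt"
    target = Path(output_csv)
    target.parent.mkdir(parents=True, exist_ok=True)

    rows = 0
    with source.open() as fin, target.open("w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(header)
        for line in fin:
            fields = line.split()  # whitespace delimiter, not a colon
            if not fields:
                continue  # skip blank lines
            writer.writerow(fields)  # any extra per-row transforms would go here
            rows += 1
    return rows


def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert append_row output to CSV")
    parser.add_argument("--input-dir", required=True)
    parser.add_argument("--output-csv", required=True)
    args = parser.parse_args(argv)
    count = convert_append_row_to_csv(args.input_dir, args.output_csv)
    print(f"Wrote {count} rows to {args.output_csv}")


# When saved as a script, end the file with a call to main() so that the
# PythonScriptStep arguments are parsed from the command line.
```

The PythonScriptStep would then pass the ParallelRunStep output location as --input-dir and the desired CSV path as --output-csv.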
It would be great to provide a similar example in the documentation and to fix the wrong delimiter in the existing examples.