Open jamescrowley opened 2 years ago
route to CXP team
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @azureml-github.
| Author: | jamescrowley |
|---|---|
| Assignees: | - |
| Labels: | `Service Attention`, `Machine Learning`, `customer-reported`, `feature-request`, `Auto-Assign` |
| Milestone: | - |
Hello @jamescrowley, .gitignore files are respected by the AzureML CLI v2. Can you share your job YAML file and the folder structure of your code folder?
hey @luigiw, sure - info below. Let me know if you need anything else. Many thanks
folder structure:
    pipelines
      components
        drop_band
          __pycache__
          .gitignore <-- works here
          main.py
          test_main.py
        drop_band.yml
        .gitignore <-- doesn't work here
      pipeline.yml
      .gitignore <-- doesn't work here
pipeline yaml:
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: pipeline
experiment_name: default
settings:
  default_datastore: azureml:aml_data_bronze
  default_compute: azureml:aml-cluster-cpu
inputs:
  rgba_input_file:
    type: uri_file
    mode: ro_mount
outputs:
  drop_band_output:
    path: azureml://datastores/aml_data_bronze/paths/azureml/${{name}}/drop_band_output/
    mode: rw_mount
jobs:
  drop_band:
    type: command
    component: file:./components/drop_band.yml
    inputs:
      rgba_input_file: ${{parent.inputs.rgba_input_file}}
    outputs:
      output_folder: ${{parent.outputs.drop_band_output}}
job yaml:
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
name: drop_band
display_name: drop_band
version: 1
inputs:
  rgba_input_file:
    type: uri_file
outputs:
  output_folder:
    type: uri_folder
code: ./drop_band
environment:
  conda_file: ../conda.yml
  image: continuumio/miniconda3
command: >-
  python3 main.py
  --i ${{inputs.rgba_input_file}}
  --o ${{outputs.output_folder}}
git ignore:
test_main.py
__pycache__
results in UI, showing pycache being uploaded
Hello @jamescrowley, thanks for providing the detailed info. As you marked in the folder structure, .gitignore files are respected under the code folder. This is the expected behavior.
The reason is that the AzureML v2 CLI only checks .gitignore files in the folders from which it uploads local files, in this case your code folder. It does not look at .gitignore files in the folders that contain the YAML files. Code (and other local artifact) folders and YAML folders can be in different locations, and it's not always possible to combine the .gitignore files found in them.
@luigiw Thanks for the update :) Totally understood re the YAML file and that it could be completely elsewhere in a file hierarchy.
To clarify, my expectation was that from the code folder itself, it would work up the folder structure in order to find .gitignore rules to apply? (especially as there's a clear 'stop' when you hit the root of the git repo?)
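The lookup described here can be sketched in a few lines of Python. This is only an illustration of the requested behavior, not how the AzureML CLI is implemented: the function names `find_repo_root` and `find_gitignore_chain` are hypothetical, the repo root is assumed to be the first ancestor containing a `.git` entry, and the sketch only discovers the files without evaluating their patterns.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest ancestor of start (inclusive) holding a .git entry, or None."""
    for folder in [start, *start.parents]:
        if (folder / ".git").exists():
            return folder
    return None


def find_gitignore_chain(code_dir):
    """List .gitignore files from the code folder up to the repo root, nearest first.

    If no repo root is found, only the code folder itself is checked,
    which matches the behavior described earlier in this thread.
    """
    start = Path(code_dir).resolve()
    root = find_repo_root(start)
    folders = [start]
    if root is not None:
        # Walk upwards and stop at the repo root (the "clear stop").
        ancestors = [start, *start.parents]
        folders = ancestors[: ancestors.index(root) + 1]
    return [f / ".gitignore" for f in folders if (f / ".gitignore").is_file()]
```

For the layout above, and assuming the repo root sits at or above `pipelines`, calling `find_gitignore_chain("pipelines/components/drop_band")` would return the three .gitignore files in `drop_band`, `components`, and `pipelines`, in that order.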
@jamescrowley , I see your point, it makes sense to me. I'll circle this back to my team as a backlog item.
Any updates on this issue?
@luigiw +1
+1
Related command: `az ml job create`
Is your feature request related to a problem? Please describe.
When running `az ml job create` for a pipeline, folders like `__pycache__` are uploaded into the snapshot from every component in the pipeline. These are excluded in a parent directory .gitignore (the same directory the pipeline YAML is defined in), and yet the CLI does not respect them.
There was an issue previously reported here - https://github.com/Azure/azureml-previews/issues/111 - and .amlignore/.gitignore support is mentioned in the docs: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-machine-learning-pipelines#submit-the-pipeline - but it would appear you have to place a .gitignore in the folder of every component?
Describe the solution you'd like
For the CLI to respect the .gitignore hierarchy.
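Until the hierarchy is respected, the per-component placement mentioned above could be automated. The following is a rough sketch of such a workaround, not an official tool: the helper names `read_patterns` and `write_merged_ignore` are hypothetical, it assumes the CLI honors an `.amlignore` file inside the code folder as the linked docs suggest, and it is only safe for simple name patterns such as `__pycache__`, because path-relative patterns from a parent folder would change meaning when copied downwards.

```python
from pathlib import Path


def read_patterns(ignore_file):
    """Return the non-empty, non-comment lines of an ignore file."""
    lines = Path(ignore_file).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def write_merged_ignore(code_dir, parent_ignore_files, target_name=".amlignore"):
    """Merge the code folder's own .gitignore with parent ignore files.

    The merged, de-duplicated patterns are written to target_name inside
    code_dir (assumption: the CLI reads that file when building the snapshot).
    """
    code_dir = Path(code_dir)
    sources = [code_dir / ".gitignore", *map(Path, parent_ignore_files)]
    merged = []
    for source in sources:
        if not source.is_file():
            continue
        for pattern in read_patterns(source):
            if pattern not in merged:
                merged.append(pattern)
    target = code_dir / target_name
    target.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return target
```

With the layout from this issue, this could be called as `write_merged_ignore("pipelines/components/drop_band", ["pipelines/.gitignore"])` before running `az ml job create`. The generated file would probably be best left out of version control.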