Closed niranjchandrasekaran closed 2 years ago
Also, in https://github.com/cytomining/profiling-recipe/blob/master/README.md#push-the-profiles-to-github the instruction says
git add profiles/${BATCH}/*.dvc profiles/*.gitignore
I didn't find profiles/*.gitignore in my directory. Did I do something wrong?
We should initialize dvc from the data repo and not the recipe repo, right?
Yes, from the data repo. Thanks for catching that.
I didn't find profiles/*.gitignore in my directory.
I think that's because I added the --recursive flag to the command after writing the instructions. Did you happen to note what was output when you ran dvc add profiles/${BATCH} --recursive? Usually when you run dvc add a_single_file, it outputs a statement telling you exactly what you need to add to git. Did you find a .gitignore anywhere?
Did you happen to note what was output when you ran dvc add profiles/${BATCH} --recursive? Usually when you run dvc add a_single_file it outputs a statement telling you exactly what you need to add to git.
I don't remember what the output was.
Did you find a .gitignore anywhere?
There are .gitignore files within each of the /profiles/${BATCH}/<plate>/ folders.
With the --recursive flag, dvc was automatically creating a separate .gitignore for each .dvc file it created. So it makes sense that, rather than one .gitignore in the parent folder, there are many .gitignore files in the subfolders.
So the problem is that we need a glob that will get all files ending in .gitignore within the profiles/${BATCH}/<plate> folders, so I think that means our .gitignore glob should instead be profiles/${BATCH}/*/*.gitignore
We also need to make sure that git add profiles/${BATCH}/*.dvc is getting all the .dvc files. Do we end up with files both within the BATCH folder and within the PLATE subfolders? If so, we can just expand this part to git add profiles/${BATCH}/*.dvc profiles/${BATCH}/*/*.dvc
So if all that's right, then we need our git command to be:
git add profiles/${BATCH}/*.dvc profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore
Does that make sense? Does it grab all the .dvc and .gitignore files?
@niranjchandrasekaran can you test it with git add --dry-run --verbose to make sure it gets everything?
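One way to know what "everything" should be before reading the dry-run output is to list the files independently of any glob. The following is only a minimal sketch on a throwaway directory tree: the batch name, plate names, and file names are made up for illustration, and only the profiles/<BATCH>/<PLATE>/ layout comes from this thread.

```shell
#!/bin/sh
# Hypothetical throwaway layout mirroring profiles/<BATCH>/<PLATE>/ (all names are invented).
BATCH=batch_example
workdir=$(mktemp -d)
for plate in plate_a plate_b; do
    mkdir -p "$workdir/profiles/$BATCH/$plate"
    : > "$workdir/profiles/$BATCH/$plate/$plate.csv.gz.dvc"
    : > "$workdir/profiles/$BATCH/$plate/.gitignore"
done

# List every .dvc and .gitignore file under the batch folder, at any depth,
# without relying on shell glob expansion.
expected=$(cd "$workdir" && find "profiles/$BATCH" -type f \( -name '*.dvc' -o -name '.gitignore' \) | sort)
printf '%s\n' "$expected"

# Count the listed files so the number can be compared with the dry-run output.
expected_count=$(printf '%s\n' "$expected" | wc -l | tr -d ' ')
echo "files that should be staged: $expected_count"
```

On a real data repo, the same find command could be run from the repo root with BATCH set, and its listing compared line by line with what git add --dry-run --verbose reports.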
Thanks Erin! That makes sense.
I haven't tested this again, but when I ran the recipe the last time, I noticed that even though the .dvc files are in profiles/<BATCH>/<PLATE>/, the command in your instructions (git add profiles/${BATCH}/*.dvc) seems to correctly add them. I guess git add profiles/*.gitignore is also able to add all the .gitignore files even though they are in profiles/<BATCH>/<PLATE>.
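A possible explanation, which nobody in this thread confirmed: in a POSIX-style shell with default settings, a glob that matches nothing is passed to the command unchanged, so git receives the pattern itself and applies its own pathspec matching, where * can also match directory separators. The sketch below only demonstrates the shell half of that, on a made-up directory tree; the batch and plate names are hypothetical.

```shell
#!/bin/sh
# Hypothetical throwaway layout: one plate folder holding a .dvc file and a .gitignore.
BATCH=batch_example
workdir=$(mktemp -d)
mkdir -p "$workdir/profiles/$BATCH/plate_a"
: > "$workdir/profiles/$BATCH/plate_a/plate_a.csv.gz.dvc"
: > "$workdir/profiles/$BATCH/plate_a/.gitignore"
cd "$workdir" || exit 1

# No .dvc file sits directly in the batch folder, so this glob matches nothing
# and the shell leaves the pattern as a literal argument.
set -- "profiles/$BATCH"/*.dvc
batch_level_arg=$1

# Nothing in profiles/ matches *.gitignore either, so this also stays literal.
set -- profiles/*.gitignore
gitignore_arg=$1

# For contrast, the plate-level glob does match, so the shell expands it to a real path.
set -- "profiles/$BATCH"/*/*.dvc
plate_level_arg=$1

printf '%s\n' "$batch_level_arg" "$gitignore_arg" "$plate_level_arg"
```

If that explanation holds, the original command works only because the patterns reach git unexpanded; a shell configured to drop or reject unmatched globs would behave differently, which is one argument for spelling out the plate-level paths.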
So if all that's right, then we need our git command to be:
git add profiles/${BATCH}/*.dvc profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore
We won't need the first part (profiles/${BATCH}/*.dvc), right? Or are .dvc files expected in profiles/${BATCH}/?
Do we ever output profiles directly to a profiles/BATCH folder and not a profiles/BATCH/PLATE subfolder?
I had looked at the files generated and got confused about <BATCH>_normalized_feature_select_<LEVEL>.csv.gz and <BATCH>_normalized_feature_select_negcon_<LEVEL>.csv.gz, but they are generated in the gct/BATCH folder, not the profiles/BATCH folder, so they don't need to be caught by this command.
If there are never profiles directly in the BATCH folder to commit and they're always in PLATE subfolders, then you're right:
git add profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore
That's right, there are no profiles in the BATCH folder.
@ErinWeisbart we should initialize dvc from the data repo and not the recipe repo, right? Or does it not matter?