cytomining / profiling-recipe

Image-based Profiling Recipe
BSD 3-Clause "New" or "Revised" License

Updates to instructions #31

Closed niranjchandrasekaran closed 2 years ago

niranjchandrasekaran commented 2 years ago

@ErinWeisbart we should initialize dvc from the data repo and not the recipe repo, right? or does it not matter?

niranjchandrasekaran commented 2 years ago

Also, in https://github.com/cytomining/profiling-recipe/blob/master/README.md#push-the-profiles-to-github the instruction says:

git add profiles/${BATCH}/*.dvc profiles/*.gitignore

I didn't find profiles/*.gitignore in my directory. Did I do something wrong?

ErinWeisbart commented 2 years ago

we should initialize dvc from the data repo and not the recipe repo, right?

Yes, from the data repo. Thanks for catching that.

I didn't find profiles/*.gitignore in my directory.

I think that's because I added the --recursive flag to the command after writing the instructions. Did you happen to note what was output when you ran dvc add profiles/${BATCH} --recursive? Usually when you run dvc add a_single_file it outputs a statement telling you exactly what you need to add to git. Did you find a .gitignore anywhere?

niranjchandrasekaran commented 2 years ago

Did you happen to note what was output when you ran dvc add profiles/${BATCH} --recursive? Usually when you run dvc add a_single_file it outputs a statement telling you exactly what you need to add to git.

I don't remember what the output was.

Did you find a .gitignore anywhere?

There are .gitignore files within each of the /profiles/${BATCH}/<plate>/ folders.

ErinWeisbart commented 2 years ago

With the --recursive flag, dvc add automatically created a separate .gitignore for each .dvc file it created. So it makes sense that, rather than one .gitignore in the parent folder, there are many .gitignore files in the subfolders.

So the problem is that we need a glob that will get all files ending in .gitignore within the profiles/${BATCH}/<plate> folders. I think that means our .gitignore glob should instead be profiles/${BATCH}/*/*.gitignore

We also need to make sure that git add profiles/${BATCH}/*.dvc is getting all the .dvc files. Do we end up with files both within the BATCH folder and within PLATE subfolders? If so, we can just expand this part to git add profiles/${BATCH}/*.dvc profiles/${BATCH}/*/*.dvc

So if all that's right, then we need our git command to be: git add profiles/${BATCH}/*.dvc profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore

Does that make sense? Does it grab all the .dvc and .gitignore files? @niranjchandrasekaran can you test it with git add --dry-run --verbose to make sure it gets everything?
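The dry run suggested above can be sketched as follows. This is a minimal example in a throwaway repository, not output from the real recipe: the batch name example_batch, the plate names, and the file names are all made up, and the file contents are placeholders rather than real DVC output.

```shell
# Sketch: preview which files `git add` would stage, using a throwaway repo.
# BATCH, the plate names, and the file names are hypothetical.
command -v git >/dev/null 2>&1 || { echo "git not found, skipping"; exit 0; }

demo=$(mktemp -d)
cd "$demo" || exit 1
git init --quiet .

BATCH=example_batch
for plate in plateA plateB; do
  mkdir -p "profiles/${BATCH}/${plate}"
  # Placeholder stand-ins for the per-plate .dvc and .gitignore files.
  echo "placeholder" > "profiles/${BATCH}/${plate}/${plate}.csv.gz.dvc"
  echo "/${plate}.csv.gz" > "profiles/${BATCH}/${plate}/.gitignore"
done

# --dry-run stages nothing; it only reports what would be added.
staged=$(git add --dry-run --verbose \
  profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore)
echo "$staged"
```

In a real data repo only the final git add --dry-run --verbose command is needed, run from the repo root with BATCH already set. If the listing contains every .dvc and .gitignore file under the plate folders, the same command without --dry-run should stage them.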

niranjchandrasekaran commented 2 years ago

Thanks Erin! That makes sense.

I haven't tested this again, but when I ran the recipe the last time, I noticed that even though the .dvc files are in profiles/<BATCH>/<PLATE>/, the command in your instructions (git add profiles/${BATCH}/*.dvc) seems to correctly add them. I guess git add profiles/*.gitignore is also able to add all the .gitignore files even though they are in profiles/<BATCH>/<PLATE>.

So if all that's right, then we need our git command to be: git add profiles/${BATCH}/*.dvc profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore

We won't need the first part (profiles/${BATCH}/*.dvc), right? Or are there .dvc files expected in profiles/${BATCH}/?
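One possible explanation for why the shallower patterns still seemed to work, offered as a sketch rather than a confirmed diagnosis: in sh and bash a pattern that matches nothing is passed to the command unchanged, and a leading * does not match a file name's leading dot, so both profiles/${BATCH}/*.dvc and any *.gitignore pattern can reach git as literal text. Git then applies its own pathspec matching, in which (according to the pathspec entry of the git glossary) * can also match directory separators. The snippet below demonstrates only the shell half of this, with a made-up batch name and file names.

```shell
# Sketch of how a POSIX shell treats the patterns discussed above.
# BATCH and the file names are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1
BATCH=example_batch
mkdir -p "profiles/${BATCH}/plateA"
touch "profiles/${BATCH}/plateA/.gitignore" \
      "profiles/${BATCH}/plateA/plateA.csv.gz.dvc"

# `set --` stores the expansion in $1, $2, ...; with default sh/bash options
# an unmatched pattern is left as literal text.
set -- "profiles/${BATCH}"/*.dvc
shallow_dvc=$1
set -- "profiles/${BATCH}"/*/*.dvc
nested_dvc=$1
set -- "profiles/${BATCH}"/*/*.gitignore
nested_ignore=$1

echo "shallow .dvc pattern:      $shallow_dvc"
echo "nested .dvc pattern:       $nested_dvc"
echo "nested .gitignore pattern: $nested_ignore"
```

Under these assumptions only the nested .dvc pattern is expanded by the shell. The other two are printed unchanged, which is the form git would receive. Shells that reject unmatched patterns (zsh with its default settings, for example) would need the patterns quoted.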

ErinWeisbart commented 2 years ago

Do we ever output profiles directly to a profiles/BATCH folder rather than a profiles/BATCH/PLATE subfolder? I looked at the generated files and got confused by <BATCH>_normalized_feature_select_<LEVEL>.csv.gz and <BATCH>_normalized_feature_select_negcon_<LEVEL>.csv.gz. However, they are generated into the gct/BATCH folder, not the profiles/BATCH folder, so they don't need to be caught by this command.

If there's never profiles directly in the BATCH folder to commit and they're always in PLATE subfolders, then you're right: git add profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore

niranjchandrasekaran commented 2 years ago

If there's never profiles directly in the BATCH folder to commit and they're always in PLATE subfolders, then you're right: git add profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore

That's right, there are no profiles in the BATCH folder.
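For anyone who wants to double-check this on their own batch, a quick sketch with find is shown below. The layout is a made-up stand-in (example_batch, plateA); in a real data repo only the find line matters, and -maxdepth is assumed to be available (it is in GNU and BSD find).

```shell
# Sketch: list any .dvc files sitting directly in the BATCH folder.
# BATCH and the file names are hypothetical.
demo=$(mktemp -d)
cd "$demo" || exit 1
BATCH=example_batch
mkdir -p "profiles/${BATCH}/plateA"
touch "profiles/${BATCH}/plateA/plateA.csv.gz.dvc"

# -maxdepth 1 restricts the search to the BATCH folder itself,
# so files inside PLATE subfolders are not reported.
top_level=$(find "profiles/${BATCH}" -maxdepth 1 -type f -name '*.dvc')
if [ -z "$top_level" ]; then
  echo "no .dvc files directly under profiles/${BATCH}"
else
  echo "$top_level"
fi
```

An empty result supports dropping profiles/${BATCH}/*.dvc from the git add command.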