erikrikarddaniel / pf-gtdb-analyses

Analysis tools for Pfitmap/RNRdb/GTDB
MIT License

Make targets for data subsetting scripts #5

Closed GhadaNOUAIRIA closed 4 years ago

GhadaNOUAIRIA commented 4 years ago

Adding make targets that use R scripts to subset GTDB data (to representatives only and/or to RNR-related proteins only).

erikrikarddaniel commented 4 years ago

You have used odd names for pull requests. If you use descriptive names it makes life easier for me.

GhadaNOUAIRIA commented 4 years ago

Isn't the dependency of subset target on the target download_data enough? Should it be explicitly dependent on each of the feather files (the subset scripts use almost all feather files)?

erikrikarddaniel commented 4 years ago

> Isn't the dependency of subset target on the target download_data enough? Should it be explicitly dependent on each of the feather files (the subset scripts use almost all feather files)?

They need to be dependent on all feather files read by the script.
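As a rough sketch of what that could look like: the feather file names, the output path, and the script name below are hypothetical and not taken from the repository. The recipe is written after `;` on the rule line so the example does not rely on tab characters surviving copy and paste.

```make
# Hypothetical names, for illustration only.
# Every feather file the script reads is listed as a prerequisite, so the
# recipe reruns whenever any one of them is newer than the output file.
INPUTS = data/genomes.feather data/taxa.feather data/proteins.feather

reps/genomes.feather: $(INPUTS) ; Rscript subset_reps.R
```

With a rule like this, make skips the script whenever the output is newer than all of the listed inputs.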

GhadaNOUAIRIA commented 4 years ago

subset_reps and subset_rnrs_gtdb now depend on the feather files they use.

subset_rnrs_reps also depends on feather files (the ones that should be built by subset_reps). Building runs the make targets in the order they appear in the Makefile, so by the time it reaches subset_rnrs_reps the required feather files will already have been built. But if we want to run subset_rnrs_reps on its own, it should depend on the target subset_reps.

erikrikarddaniel commented 4 years ago

> subset_reps and subset_rnrs_gtdb now depend on the feather files they use.
>
> subset_rnrs_reps also depends on feather files (the ones that should be built by subset_reps). Building runs the make targets in the order they appear in the Makefile, so by the time it reaches subset_rnrs_reps the required feather files will already have been built. But if we want to run subset_rnrs_reps on its own, it should depend on the target subset_reps.

Yes and no. If one wanted to run make subset_rnrs_reps and have everything line up, one would have to do it the way you describe. But if we want to run all the subsetting directly in the all recipe, which is what I had in mind, it's better to have it depend only on the feather files. The reason is that if the feather files are up to date when one runs make all, none of the subsetting targets should need to run. (And the reason for all of this is that a make target can only be specified to make one file, not a set of files. That is what the touch trick solves, but I think it's better to be explicit here.)
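For comparison, the touch approach is commonly written with a stamp file. This is a minimal sketch only: the stamp name, the input files, and the script name are hypothetical, and it assumes the script writes several feather files in one run.

```make
# Hypothetical names, for illustration only.
# The script produces several output files, so an empty stamp file records
# "the script has run against the current inputs". $@ expands to the target
# name, here subset_reps.done.
INPUTS = data/genomes.feather data/taxa.feather

subset_reps.done: $(INPUTS) ; Rscript subset_reps.R && touch $@
```

The trade-off is that make then tracks the stamp rather than the feather files themselves, which is less explicit than listing the real files.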

What the dependency of subset_rnrs_reps on the output of subset_reps does mean, however, is that make subset_rnrs_reps should be on a separate line after make subset_reps in the all recipe.
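A sketch of such an all recipe, using the target names from this thread; the layout is an assumption, not the repository's actual Makefile. The two invocations are chained with `&&` on the rule line, which has the same effect as two separate recipe lines: the second starts only if the first succeeded.

```make
# Sketch only: the target names come from the discussion, the rest is assumed.
# Each $(MAKE) call starts a new make process that reads the Makefile and
# checks file timestamps afresh, so subset_rnrs_reps is evaluated only after
# subset_reps has finished writing its feather files.
.PHONY: all
all: ; $(MAKE) subset_reps && $(MAKE) subset_rnrs_reps
```

Using `$(MAKE)` rather than a literal `make` lets flags such as `-n` pass through to the sub-makes.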

GhadaNOUAIRIA commented 4 years ago

So when I write [make x y] it does not mean that they will run in the order they come in? I need [make x make y] or [make x; make y] to ensure the order?

Script was changed according to your last comment.

erikrikarddaniel commented 4 years ago

> So when I write [make x y] it does not mean that they will run in the order they come in? I need [make x make y] or [make x; make y] to ensure the order?

It's not the order they're made in, it's when the dependencies are evaluated. If you run make x y and building x updates the dependencies of y, that won't be noticed (I think).

> Script was changed according to your last comment.

Looks good. (But easy to get confused... ;-))