erikrikarddaniel / pf-gtdb-analyses

Analysis tools for Pfitmap/RNRdb/GTDB
MIT License

Makefile with downloads and subsets in the wrong directory! #6

Closed erikrikarddaniel closed 4 years ago

erikrikarddaniel commented 4 years ago

The Makefile that downloads feather files and subsets data should be data/Makefile, not the one in the root directory. The all target in the root directory Makefile should just step down into the data directory and call make all there. (The latter is what the root directory Makefile looked like to begin with, so you could just resurrect that one.)

Note also that data/Makefile already downloads the feather files and some other files. (And check the syntax of the rsync command.)

GhadaNOUAIRIA commented 4 years ago

Yeah, that's right. I corrected the main directory's Makefile on master and added data/Makefile to a new branch (dev). Now we need to make sure that sp_clusters.tsv is in SOURCE_DIR, since the Makefile does not download it and nextflow does not publish it in results/.

erikrikarddaniel commented 4 years ago

Yeah, that's right. I corrected the main directory's Makefile on master and added data/Makefile to a new branch (dev).

Better to have consistent Makefiles in the same branch.

Now we need to make sure that sp_clusters.tsv is in SOURCE_DIR, since the Makefile does not download it and nextflow does not publish it in results/.

  1. Do we need it? There's a field for this in the gtdb_metadata.tsv and, IIRC, this field is included in the taxonomy feather file.

  2. If we do, it should be downloaded from its original source with, e.g., wget.

GhadaNOUAIRIA commented 4 years ago
  1. Do we need it? There's a field for this in the gtdb_metadata.tsv and, IIRC, this field is included in the taxonomy feather file.

If we use gtdb_metadata.tsv to identify representatives, I need to rewrite the R script completely. Alternatively, we can have a make target that extracts the genome_accnos corresponding to the representatives from gtdb_metadata.tsv and writes them to a tsv file, which is then fed to the subset_representatives.R script instead of sp_clusters.tsv.
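A rough sketch of what such a make target could call, written in Python only for illustration. The column names `accession` and `gtdb_representative`, the `t` flag for representatives, the `genome_accno` output header, and the function name are all assumptions, not taken from the repository; check them against the actual header of gtdb_metadata.tsv and against what subset_representatives.R expects.

```python
import csv


def write_representatives(metadata_tsv, out_tsv,
                          accession_col="accession",       # assumed column name
                          rep_col="gtdb_representative",   # assumed column name
                          rep_value="t",                   # assumed flag value
                          out_header="genome_accno"):      # hypothetical header
    """Write the accessions of representative genomes to out_tsv.

    Reads a tab-separated metadata file, keeps the rows whose
    representative column equals rep_value, and writes one accession
    per line under a single header line. Returns the accessions.
    """
    accessions = []
    with open(metadata_tsv, newline="") as handle:
        # QUOTE_NONE: treat quote characters in fields as ordinary text
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row[rep_col] == rep_value:
                accessions.append(row[accession_col])

    with open(out_tsv, "w", newline="") as handle:
        handle.write(out_header + "\n")
        for accession in accessions:
            handle.write(accession + "\n")

    return accessions
```

The resulting file could then be passed to subset_representatives.R in place of sp_clusters.tsv, provided the script is adjusted to read a single column of accessions.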

In the latest version of taxa.feather, I can see the field for representatives. So another alternative is to change subset_representatives.R to subset them directly from the taxa table.

Which alternative do you prefer? (I'll open a new issue for it)

GhadaNOUAIRIA commented 4 years ago

Better to have consistent Makefiles in the same branch

They are consistent in dev. Before I add data/Makefile to master, shall I edit it (on dev) according to what we decide about sp_clusters.tsv? (Right now we are rsync-ing it from SOURCE_DIR, but we may not need it at all.)

erikrikarddaniel commented 4 years ago

I think this field is in the taxa feather file now, so we shouldn't need either of the tsv files. However, we might as well keep the download of the metadata file.

GhadaNOUAIRIA commented 4 years ago

Done.