Closed erikrikarddaniel closed 4 years ago
Yeah, that's right. I corrected the main directory's Makefile on master and added the data/Makefile to a new branch (dev). Now we need to make sure sp_clusters.tsv is in the SOURCE_DIR, since we are neither downloading it (in the Makefile) nor is nextflow publishing it in results/.
Yeah, that's right. I corrected the main directory's Makefile on Master and added the data/Makefile to a new branch (dev)
Better to have consistent Makefiles in the same branch.
Now we need to make sure sp_clusters.tsv is in the SOURCE_DIR, since we are neither downloading it (in the Makefile) nor is nextflow publishing it in results/.
Do we need it? There's a field for this in the gtdb_metadata.tsv and, IIRC, this field is included in the taxonomy feather file.
If we do, it should be downloaded from its original source with, e.g., wget.
- Do we need it? There's a field for this in the gtdb_metadata.tsv and, IIRC, this field is included in the taxonomy feather file.
If we use gtdb_metadata.tsv to identify representatives, then I need to change the R script completely. Alternatively, we can have a make target that extracts the genome_accnos corresponding to the representatives from gtdb_metadata.tsv and writes them to a tsv file that is fed to the subset_representatives.R script instead of sp_clusters.tsv.
In the latest version of taxa.feather, I can see the field for representatives. So another alternative is to change subset_representatives.R to subset them directly from the taxa table.
Which alternative do you prefer? (I'll open a new issue for it)
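For reference, the make-target alternative could look roughly like the sketch below. It is only an illustration: the target name representative_accnos.tsv is hypothetical, and the column names accession and gtdb_representative (with t marking a representative) are assumptions that should be checked against the actual header of gtdb_metadata.tsv.

```make
# Sketch only. Assumptions: gtdb_metadata.tsv is tab-separated, has a header
# row, and contains columns named "accession" and "gtdb_representative",
# where "t" marks a representative genome. The target name is hypothetical.
representative_accnos.tsv: gtdb_metadata.tsv
	awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) { if ($$i == "accession") acc = i; if ($$i == "gtdb_representative") rep = i }; next } rep && acc && $$rep == "t" { print $$acc }' $< > $@
```

The columns are located by header name rather than by position, so the recipe does not depend on the column order. subset_representatives.R would then have to read a one-column file without a header instead of sp_clusters.tsv.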
Better to have consistent Makefiles in the same branch
They are consistent in dev. Before I add data/Makefile to master, shall I edit it (on dev) according to what we decide about sp_clusters.tsv (now we are rsync-ing it from the SOURCE_DIR, but we may not need it at all)?
I think this field is in the taxa feather file now, so we shouldn't need either of the tsv files. However, we might as well keep the download of the metadata file.
Done.
The Makefile that controls downloads of feather files and subsets data should be data/Makefile, not the root directory one. The all target in the root directory Makefile should just step down into the data directory and call make all there. (The latter is what the root directory Makefile looked like to begin with, so you could just resurrect that one.)
Note also that download of feather and some other files is already present in data/Makefile. (And check the syntax of the rsync command.)
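A minimal sketch of the root directory Makefile described above, assuming nothing beyond a data directory that has its own Makefile with an all target (this is not the original file from the repository):

```make
# Root Makefile (sketch): delegate all work to data/Makefile.
.PHONY: all

all:
	$(MAKE) -C data all
```

Using $(MAKE) rather than a literal make passes command-line flags such as -j or -n on to the sub-make, and -C data changes into the data directory before data/Makefile is read.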